RESPONSIBILITIES OF TEST USERS
(RUST STATEMENT) 1989
(AACD/AMECD POLICY STATEMENT)
Copyright, 1989, by the American Counseling Association.  All Rights Reserved.
(American Counseling Association/Association for Assessment in Counseling) 1998

 

I.  Introduction
BACKGROUND:

At the 1976 AACD (then APGA) Convention, the Board of Directors requested the development of a statement on the responsible use of standardized tests to promote proper test use, reflecting the advantages of assessment along with the concerns about negative effects, and to help its members employ safeguards against misuse of tests.  A committee representing all AACD Divisions and Regions spent two years studying the issues and developed a statement, published in the October, 1978, issue of Guidepost, titled "Responsibilities of Users of Standardized Tests".  The Association for Measurement and Evaluation in Counseling and Development was charged with maintaining ongoing review of the so-called RUST Statement.  The present statement has grown out of that review.

TARGET AUDIENCE:  The statement is intended to address the needs of the members of AACD and its Divisions, Branches, and Regions, including counselors and other human service workers.  Although it may interest test developers, teachers, administrators, parents, the press, or the general public, it is not specifically designed for these audiences.

ORGANIZATION AND FOCUS:  The statement is organized into eight sections:  Introduction, Test Decisions, Qualifications of Test Users, Test Selection, Test Administration, Test Scoring, Test Interpretation, and Communicating Test Results.  Basic to the statement is the assumption that test data are merely numbers and that guidelines can help promote their constructive use.  The statement specifies general principles and activities that constitute responsible practice; these are grouped by related issues and indexed for ease of reference.
 
 

II.  Test Decisions:

Decisions should be based on data.  In general, test data improve the quality of decisions.  However, deciding whether or not to test creates the possibility of three kinds of errors.  First, a decision not to test can result in misjudgments that stem from inadequate or subjective data.  Second, tests may produce data which could improve accuracy in decisions affecting the client, but which are not used in counseling.  Third, tests may be misused.  The responsible practitioner will determine, in advance, the purpose for administering a given test, considering protections and benefits for the client, practitioner, and agency.

A.  Define purposes for testing by developing specific objectives and limits for the use
      of test data in relation to the particular assessment purpose:

    1.  Placement:  If the purpose is selection or placement, the test user should understand
         the programs or institutions into which the client may be placed and be able to judge
         the consequences of inclusion or exclusion decisions for the client.

    2.  Prediction:  If the purpose is prediction, the test user should understand the need for
         predictive data as well as possible negative consequences (e.g., stereotyping).

    3.  Description:  If the purpose is diagnosis or description, the test user should understand
         the general domain being measured and be able to identify those aspects which are
        adequately measured and those which are not.

    4.  Growth:  If the purpose is to examine growth or change, the test user should understand
         the practical and theoretical difficulties associated with such measurement.

    5.  Program Evaluation:  If the purpose of assessment is the evaluation of an agency's
         programs, the test user should be aware of the various information needs for the
         evaluation and of the limitations of each instrument used to assess those needs,
         as well as how the evaluation will be used.
 

B.  Determine Information Needs and Assessment Needs:

    1.  Determine whether testing is intended to assess individuals, groups, or both.

    2.  Identify the particular individual and/or group to be tested with regard to the agency's
         purposes and capabilities.

    3.  Determine the limitations to testing created by an individual's age; racial, ethnic,
         and cultural background; sex; or other characteristics.

    4.  Avoid unnecessary testing by identifying decisions which can be made with existing
         information.

    5.  Assess the consequences for clients of deciding either to test or not to test.

    6.  Limit data gathering to the variables that are needed for the particular purpose.

    7.  Cross validate test data using other available information whenever possible.
 

III. Qualifications of Test Users:

While all professional counselors and personnel workers should have formal training in psychological and educational measurement and testing, this training does not necessarily make one an expert, and even an expert does not have all the knowledge and skills appropriate to some particular situations or instruments.  Questions of user qualifications should always be addressed when testing is being considered.

Lack of proper qualifications can lead to errors and subsequent harm to clients.  Each professional is responsible for making these judgments in each situation and cannot leave that responsibility either to clients or to others in authority.  It is incumbent upon the individual test user to obtain appropriate training or to arrange for proper supervision and assistance when engaged in testing.  Qualifications for test users depend on four factors:

A.  Purposes of Testing:  Technically proper testing for ill-understood purposes may constitute
      misuse.  Because the purposes of testing dictate how the results are used, test users need
      qualifications beyond general testing competencies in order to interpret and apply the data.

B.  Characteristics of Tests:  Understanding the nature and limitations of each instrument
      used is needed by test users.

C.  Settings and Conditions of Test Use:  Assessment of the quality and relevance of the test
      user's knowledge and skill to the situation is needed before deciding to test or to participate
      in a testing program.

D.  Roles of Test Selectors, Administrators, Scorers, and Interpreters:  Test users must
      engage in only those testing activities for which their training and experience
      qualify them.

IV.  Test Selection
 

The selection of tests should be guided by information obtained from a careful analysis of the characteristics of the population to be tested;  the knowledge, skills, abilities, or attitudes to be assessed;  the purposes for testing;  and the eventual use and interpretation of the test scores.  Use of tests should also be guided by criteria for technical quality recommended by measurement professionals (i.e., the APA/AERA/NCME "Standards for Educational and Psychological Tests" and the APA/AERA/NCME/AACD/ASHA "Code of Fair Testing Practices in Education").

A.  Relate Validity to Usage:

    1.  Determine the validity of a test (whether the test measures what it is intended to
         measure) through evidence of the constructs used in developing the test, the correlation
         of test performance with other appraisals of the characteristics being measured, and/or
         the prediction of specified behaviors from test performance (a computational sketch
         follows this list).

    2.  Determine whether a test is congruent with the user's definition of the characteristics of
         human performance to be appraised.

    3.  Use tests for selection purposes only when they show predictive validity for the specific
         tasks or competencies needed in an educational or employment experience and when they
         comply with legal and ethical prescriptions for non-discriminatory practices in program
         selection, employment, or placement.
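The following Python sketch illustrates the kind of criterion-related validity evidence described in item 1 above:  the correlation of test performance with an independent appraisal of the same characteristic.  All scores and values in it are hypothetical, and it is a sketch of the general technique rather than a prescribed procedure.

    # Hypothetical sketch: criterion-related validity as the Pearson
    # correlation between test scores and an independent criterion.
    import statistics

    def pearson_r(xs, ys):
        """Pearson correlation between two equal-length score lists."""
        mx, my = statistics.fmean(xs), statistics.fmean(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    test_scores = [52, 61, 47, 70, 58, 65, 49, 73]          # admission test
    criterion = [2.1, 2.8, 1.9, 3.4, 2.5, 3.0, 2.2, 3.6]    # later grade average

    print(f"validity coefficient r = {pearson_r(test_scores, criterion):.2f}")

A coefficient computed this way speaks only to the specific criterion and population sampled; as items 2 and 3 note, it does not by itself establish that the test suits a different purpose or group.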

B.  Use Appropriate Tests:

    1.  Document that tests are appropriate for the characteristics of the population to be tested.

    2.  Use only tests that are within the practitioner's level of skill in administration and
         interpretation.

    3.  Use tests consistent with local needs:
          a.  Give attention to how the test is designed to handle variation in motivation, working
               speed, language facility, and experiential background among persons taking it; bias
               in response to its content; and effects of guessing in response to its questions.
          b.  Determine whether a common test or different tests are required for accurate
               measurement of groups with special characteristics.
               i.   Recognize that the use of different tests for cultural, ethnic, and racial groups
                    may be an ineffective means of correcting for differences.
               ii.  Determine whether persons or groups that use different languages should be tested
                      in either or both languages and, in some instances, tested first for bilingualism
                      or language dominance.

C.  Consider Technical Characteristics:

    1.  Select only tests that have documented evidence of reliability or consistency.

    2.  Select only tests that have adequate documented evidence of the effectiveness of the
         measure for the purpose to be served and justification of the inferences based
         on the results.

    3.  Scrutinize standardization and norming procedures for relevance to the local population
         and use of the data.

    4.  Use separate norms for men and women or other subgroups when empirical evidence
         indicates they are appropriate.

    5.  Determine the degree of technical quality demanded of a test on the basis of the nature
         of the decisions to be made.

    6.  Include ease and accuracy of the procedures for scoring, summarizing, and communicating
         test performance among the criteria for selecting a test.

    7.  Consider practical constraints of cost, conditions, and time for testing as secondary
          test-selection criteria.

D.  Employ User Participation in Test Selection:  Actively involve everyone who will be using
      the assessments (administering, scoring, summarizing, interpreting, making decisions) as
      appropriate in the selection of tests so that they are congruent with local purposes,
      conditions, and uses.
 

V. Test Administration:

Test administration includes procedures to ensure that the test is used in the manner specified by the test developers and that the individuals being tested are working within conditions which maximize opportunity for optimum, comparable performance.

A.  Provide Proper Orientation:

    1.  Inform testing candidates, parents, and institutions or agencies in the community as
         appropriate about testing procedures.

    2.  Provide persons being tested sufficient practice exercises prior to the test.

    3.  Prior to testing, check all takers' ability to record their responses adequately (e.g., in
         the use of machine-scorable answer sheets).

    4.  Provide periodic training by qualified personnel for test administrators within agencies
         or institutions using tests.

    5.  Review test materials and administration sites and procedures prior to the time for
         testing to ensure standardized conditions and appropriate response to any irregularities
         which may occur.

B.  Use Qualified Test Administrators:

    1.  Acquire any training required to administer the test.

    2.  Ensure that individuals taking self-administered or self-scored instruments have the
         necessary understanding and competencies.

C.  Provide Appropriate Testing Conditions:

    1.  Ensure that the testing environment (seating, work surfaces, lighting, heating, freedom
         from distractions, etc.) and psychological climate are conducive to the best possible
         performance of the test-takers.

    2.  Carefully observe, record, and attach to the test record any deviation from prescribed
         test administration procedure.

    3.  Use a systematic and objective procedure for observing and recording environmental,
         health, or emotional factors, or other elements which may invalidate test performance.
         This record should be attached to the test scores of the person tested.

    4.  Use sufficiently trained personnel to provide uniform conditions and to observe the
         conduct of the examinees when large groups of individuals are tested.

D.  Give Proper Directions:

    1.  Present each test in the manner prescribed in the test manual to ensure that it is fair to
         each test taker.

    2.  Administer standardized tests with the verbatim instructions, exact sequence and timing,
         and identical materials that were used in the test standardization.

    3.  Demonstrate verbal clarity, calmness, empathy for the examinees, and impartiality toward
         all being tested.  Because taking a test may be a new and frightening experience or may
         stimulate anxiety or frustration for some individuals, the administrator should help the
         examinees attempt each task with positive application of their skills and knowledge and
         the expectation that they will do their best.

E.  Coordinate Professional Collaboration:  In settings where skill and knowledge are pooled
      and responsibility is shared, consider the qualifications of the testing team as a whole as
      more important than those of individuals.  However, coordination and consistency of
      responsibilities with expertise must be maintained.

VI.  Test Scoring:

Accurate measurement of human performance necessitates adequate procedures for scoring the responses of examinees.  These procedures must be audited as necessary to ensure consistency and accuracy of application.

A.  Consider Accuracy and Interpretability:  Select a test scoring process that maximizes
      accuracy and interpretability.

B.  Rescore Samples:  Routinely rescore samples of examinee responses to monitor the
     accuracy of the scoring process.

C.  Screen Test Results:  Screen reports of test results using personnel competent to recognize
      unreasonable or impossible scores.

D.  Verify Scores and Norms:  Verify the accuracy of computation of raw scores and conversion
      to normative scales prior to release of such information to examinees or users of test results.

E.  Communicate Deviations:  Report as part of the official record any deviation from normal
      conditions and examinee behaviors.

F.  Label Results:  Clearly label the date of test administration along with the scores.
 

VII.  Test Interpretation

Test interpretation encompasses all the ways that meaning is assigned to the scores.  Proper interpretation requires knowledge about the test which can be obtained by studying its manual and other materials along with current research literature with respect to its use;  no one should undertake the interpretation of scores on any test without such study.

A.  Consider Reliability:  Reliability is important because it is a prerequisite to validity and
      because the degree to which a score may vary due to measurement error is an important
      factor in its interpretation.

    1.  Estimate test stability using a reliability (or other appropriate) coefficient.

    2.  Use the standard error of measurement to estimate the amount of variation due to random
         error in individual scores and to evaluate the precision of cut-scores in selection
         decisions (see the sketch following this list).

    3.  Consider, in relation to the uses being made of the scores, the variance components
         attributed to error in the reliability index.

    4.  Evaluate reliability estimates with regard to factors that may have artificially raised or
         lowered them (e.g., test speededness, biases in population sampling).

    5.  Distinguish indices of objectivity (i.e., scorer reliability) from test reliability.
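As a concrete illustration of item 2 above, the following Python sketch computes the standard error of measurement from a test's standard deviation and reliability coefficient and reports an individual score as an interval rather than a point.  The standard deviation, reliability, and observed score are hypothetical.

    # Hypothetical sketch: standard error of measurement (SEM) and the
    # score interval it implies, per classical test theory.
    import math

    def standard_error_of_measurement(sd, reliability):
        """SEM = SD * sqrt(1 - reliability)."""
        return sd * math.sqrt(1.0 - reliability)

    sd, rxx = 15.0, 0.91          # hypothetical norm-group SD and reliability
    sem = standard_error_of_measurement(sd, rxx)

    observed = 108                # hypothetical observed score
    low, high = observed - 1.96 * sem, observed + 1.96 * sem
    print(f"SEM = {sem:.1f}; 95% band: {low:.0f} to {high:.0f}")

Reporting the band (here roughly 99 to 117) rather than the single number 108 makes the imprecision of the score visible to anyone acting on it.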

B.  Consider Validity:  Proper test interpretation requires knowledge of the validity evidence
      available for the intended use of the test.  Its validity for other uses is not relevant.  Indeed,
      use of a measurement for a purpose for which it was not designed may constitute misuse.
      The nature of the validity evidence required for a test depends upon its use.

    1.  Use for Placement:  Predictive validity is the usual basis for valid placement.

        a.  Obtain adequate information about the programs or institutions in which the client may
             be placed to judge the consequences of placement.

        b.  Use all available evidence to infer the validity of an individual's score.  A single test
             score should not be the sole basis for a placement or selection recommendation.  Other
             items of information about an individual (e.g., teacher report, counselor opinion)
             frequently improve the likelihood that proper judgments and decisions will be made.

        c.  Consider validity for each alternative (i.e., each placement option) when interpreting
             test scores and other evidence.

        d.  Examine the possibility that a client's group membership (socioeconomic status, gender,
             subculture, etc.) may affect test performance and, consequently, validity.

        e.  Estimate the probability of favorable outcomes for each possible placement before
             making recommendations.

        f.  Consider the possibility that outcomes favorable from an institutional point of view
            may differ from those that are favorable from the individual's point of view.

    2.  Use for Prediction:  The relationship of the test scores to an independently developed
         criterion measure is the basis for predictive validity.

        a.  Consider the reliability and validity of the criterion measure(s) used.

        b.  Consider the validity of a measure in the context of other predictors available (i.e., does
             the test make a valid contribution to prediction beyond that provided by other
             measures?).

        c.  Use cross validation to judge the validity of prediction processes.

        d.  Consider the effects of labeling, stereotyping, and prejudging people (e.g., self-
             fulfilling prophecies that may result from labeling are usually undesirable).

        e.  If a statistically valid predictor lacks both construct and content validity, analyze the
            mechanism by which it operates to determine whether or not its predictive validity is
            spurious.

    3.  Use for Description:  Comprehensiveness of information is fundamental to effective
         description, since no set of test scores completely describes an individual.

        a.  Clearly identify the domain assessed by any measure and the adequacy of the content
             sampling procedures used in developing items.

        b.  Clarify the dimensions being measured when multiple scores from a battery or inventory
             are used for description.

            i.  Examine the content and/or construct validity of each score separately.

           ii.  Consider the relative importance of each of the separate elements for interpretation.

          iii.  Give appropriate weight to reflect the variabilities (e.g., standard deviations) and
                 relationships (e.g., correlations) of scores which are to be combined (see the sketch
                 following this item).

        c.  Distinguish characteristics that can be validated only empirically from those for which
             content specifications exist.
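The weighting concern in item b.iii above can be made concrete.  The Python sketch below standardizes two subscores before combining them, so that the subscore with the larger spread does not silently dominate, and shows how the correlation between the parts enters the composite's variability.  All means, standard deviations, weights, and the correlation are hypothetical.

    # Hypothetical sketch: a weighted composite of z-standardized subscores.
    import math

    m1, s1 = 50.0, 10.0     # subscore 1: norm mean and SD
    m2, s2 = 200.0, 40.0    # subscore 2: norm mean and SD
    r12 = 0.6               # correlation between the two subscores
    w1, w2 = 0.5, 0.5       # intended weights

    def composite(x1, x2):
        """Weighted sum of standardized subscores."""
        z1 = (x1 - m1) / s1
        z2 = (x2 - m2) / s2
        return w1 * z1 + w2 * z2

    # Variance of the composite follows from the weights and correlation:
    var_c = w1**2 + w2**2 + 2 * w1 * w2 * r12
    print(f"composite(62, 210) = {composite(62, 210):.2f}")
    print(f"composite SD = {math.sqrt(var_c):.2f}")

Had the raw scores been summed without standardizing, subscore 2's four-times-larger standard deviation would have given it roughly four times the intended influence.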

    4.  Use for Assessment of Growth:  Assessment of growth or change requires valid tests as
          well as a valid procedure for combining them.

        a.  Specifically evaluate the reliability of differences between scores used as measures of
             change (see the sketch following this item).

        b.  Establish the validities of the measures used to establish change in relation to one
             another as well as individually.

        c.  Consider comparability of intervals in scales used to assess change.

            i.  Evaluate derived or extrapolated scores (e.g., grade equivalents) for possible lack of
                comparability at different score levels.

           ii.  Consider problems in interpretation and comparability of tests (e.g., floor or ceiling
                 effects, content changes from level to level, poor articulation in multilevel tests,
                 lack of comparability of alternate forms, inadequacy of score-equating across forms,
                 and differences in administration and timing of tests from that of their norming).

        d.  Assess potential for undesirable correlations of difference scores with the measures
             entering into their calculations (e.g., regression toward the mean).

        e.  Recognize the potential lack of comparability between norms for differences derived
              from norms and norms for differences derived from differences (i.e., mathematically
             derived norms for differences are not necessarily equivalent to norms based on
             distributions of actual differences).
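Item a above warrants a numerical illustration, since difference scores are often far less reliable than the tests they come from.  The classical formula in the Python sketch below assumes the two measures have equal variances; the reliability and correlation values are hypothetical.

    # Hypothetical sketch: reliability of a difference (gain) score under
    # classical test theory, assuming equal variances for the two measures.
    def difference_reliability(rxx, ryy, rxy):
        """Reliability of X - Y: (mean of rxx and ryy, minus rxy) / (1 - rxy)."""
        return (0.5 * (rxx + ryy) - rxy) / (1.0 - rxy)

    # Two reasonably reliable tests that correlate substantially with each
    # other yield a much less reliable difference score:
    print(difference_reliability(rxx=0.85, ryy=0.85, rxy=0.70))   # 0.50

Here two tests with reliabilities of .85 produce a gain score with a reliability of only .50, one of the practical difficulties with measuring change noted in Section II.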

    5.  Use for Program Evaluation:  Assessments of group differences (between groups or
         within groups over time) are based on research designs which to varying degrees admit
         competing interpretations of results.

        a.  Use procedures in the evaluation which ensure that no factors other than those being
             studied have major influence on the results (i.e., internal validity).

        b.  Use statistical procedures which are appropriate and have all assumptions met by the
             data being analyzed.

        c.  Evaluate the generalizability (external validity) of the results for different individuals,
             settings, tests, and variables.

C.  Scores, Norms, and Related Technical Features:  The result of scoring a test or subtest is
      usually a number called a raw score, which by itself is not interpretable.  Additional steps
      are needed to translate the number either into a verbal description (e.g., pass or
      fail) or into a derived score (e.g., a standard score).  Less than full understanding of these
      procedures is likely to produce errors in interpretation and, ultimately, in counseling or
      other uses.

    1.  Examine appropriate test material (e.g., manuals, handbooks, user's guides, and technical
         reports) to identify the descriptions or derived scores produced and their unique
        characteristics.

        a.  Know the operational procedures for translating raw scores into descriptions or
             derived scores.

        b.  Know specific psychological or educational concepts or theories before interpreting
             the scores of tests based on them.

        c.  Consider differential validity along with equating error when different tests,
             different test forms, or scores on the same test administered at different times are
             compared.

    2.  Clarify arbitrary standards used in interpretation (e.g., mastery or nonmastery for
         criterion-referenced tests).

        a.  Recognize that when a score is interpreted based on a proportion score (e.g., percent
             correct), its elements are being given arbitrary weights.

        b.  Recognize that the difficulty of a fixed standard varies widely and thus does not have
             the same meaning for different content areas and for different assessment methods.

        c.  Report the number (or percentage) of items right in addition to the interpretation when
             it will help others understand the quality of the examinee's performance.

    3.  Employ derived scores based on norms which fit the needs of the current use of the test
         (a sketch follows this item).

        a.  Evaluate whether available norm groups are appropriate as part of the process of
             interpreting the scores of the clients.

            i.  Use norms for the group to which the client belongs.

           ii.  Recognize that derived scores based on different norm groups may not be
                comparable.

          iii.  Use local norms and derived scores based on them whenever possible.

        b.  Choose a score based on its intended use.

            i.  Consider relative standing scores for comparison of individuals to the norm or
                 reference group.

           ii.  Consider standard or scaled scores whenever means and variances or other
                 arithmetic operations are appropriate.

          iii.  When using a statistical technique, use the test's derived score which best meets the
                 assumptions of the analysis.
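As a concrete companion to item 3 above, the Python sketch below translates a hypothetical raw score into a z score, a T score, and an approximate percentile rank using norm-group statistics.  The norm mean and standard deviation are invented, and the percentile conversion assumes the norm distribution is approximately normal; published norms tables should be used when available.

    # Hypothetical sketch: deriving z, T, and percentile scores from
    # norm-group statistics (normal approximation for the percentile).
    from statistics import NormalDist

    norm_mean, norm_sd = 31.4, 6.2    # hypothetical norm-group statistics

    def derived_scores(raw):
        z = (raw - norm_mean) / norm_sd
        t = 50 + 10 * z                      # T score: mean 50, SD 10
        pct = NormalDist().cdf(z) * 100      # percentile rank (approximate)
        return z, t, pct

    z, t, pct = derived_scores(38)
    print(f"z = {z:.2f}, T = {t:.0f}, percentile = {pct:.0f}")

As items a.i and a.ii caution, the same raw score of 38 would yield different derived scores against a different norm group, so the choice of norms drives the interpretation.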

D.  Administration and Scoring Variation:  Stated criteria for score interpretation assume
      standard procedures for administering and scoring the test.  Departures from standard
      conditions and procedures modify and often invalidate these criteria.

    1.  Evaluate unusual circumstances peculiar to the administration and scoring of the test.

        a.  Examine reports from administrators, proctors, and scorers concerning irregularities
             or unusual conditions for possible effects on test performance.

        b.  Consider potential effects of examiner-examinee differences in ethnic and cultural
             background, attitudes, and values based on available relevant research.

        c.  Consider any reports of examinee behavior indicating that responses were made on
             some basis other than that intended.

        d.  Consider differences among clients in their reaction to instructions about guessing
             and scoring.

    2.  Evaluate scoring irregularities and bias and judgment effects when subjective elements
         enter into scoring.
 

VIII.  Communicating Test Results:

The responsible counselor or other practitioner reports test data with a concern for the individual's need for information and the purposes of the information.  The rights of the person tested must also be protected:  examinees are entitled to know how the results will be used, what safeguards exist to prevent misuse (the right to information), and who will have access to the results.

A.  Decisions About Individuals:  Where test data are used to enhance decisions about an
      individual, the practitioner's responsibilities include:

    1.  Limitations on Communication:

        a.  Inform the examinee of possible actions that may be taken by any person or agency who
             will be using the results.

        b.  Limit access to users specifically authorized by law or by the client.

        c.  Obtain the consent of the examinee before using test results for any purposes other than
             those advanced prior to testing.

    2.  Practitioner Communication Skills:

        a.  Develop the ability to interpret test results accurately before attempting to communicate
             them.

        b.  Develop appropriate communication skills, particularly with respect to concepts that
             are commonly misunderstood by the intended audience, before attempting to explain
             test results to clients, the public, or other recipients of the information.

    3.  Communication of Limitations of the Assessment:

        a.  Inform persons receiving test information that scores are not perfectly accurate and
             indicate the degree of inaccuracy in some way, such as reporting score intervals.

        b.  Inform persons receiving test information of any circumstances that could have affected
             the validity or reliability of the results.

        c.  Inform persons receiving test information of any factors necessary to understand
             potential sources of bias for a given test result.

        d.  Communicate clearly that test data represent just one source of information and should
             rarely, if ever, be used alone for decision making.

    4.  Communication of Clients' Rights:

        a.  Provide test takers or their parents/guardians with information about any rights they
             may have to obtain copies of tests and/or their completed answer sheets, to retake tests,
             to have tests rescored, or to cancel test scores.

B.  Decisions About Groups:  Where standardized test data are being used to describe groups
      for the purpose of evaluation, the practitioner's responsibilities include:

    1.  Background Information:

        a.  Identify the purposes for which the reported data are appropriate.

        b.  Include additional information if it can improve accuracy of understanding.

    2.  Averages and Norms:

        a.  Clarify the amount of meaning that can be attached to differences between groups
             (e.g., statistical significance should not be taken as a judgment of importance).

        b.  Qualify norms based on their appropriateness for the group being tested.

    3.  Use obsolescence schedules so that stored data are systematically relocated to historical
         files or destroyed.

    4.  Process data used for research or program evaluation to assure individual anonymity (e.g.,
         released only in aggregated form).

    5.  Political Usage:

        a.  Emphasize that test data should be used only for the test's stated purposes.

        b.  Public release of test information provides data for many purposes.  Take steps to
             minimize uses that may be adverse to the interests of those tested.

    6.  Agency Policies:

        a.  Advocate agency test-reporting policies designed to benefit the groups being measured.

        b.  Advocate the establishment of procedures for periodic review of test use.

IX.  Extensions of These Principles:

This statement is intended to address current and emerging problems and concerns that are generic to all AACD divisions, branches, and regions by formulating principles that are specific enough to serve as a template for more closely focused statements addressed to specific situations.  Individual divisions, branches, and regions are encouraged to elaborate upon this statement to reflect principles, procedures, and examples appropriate to their members.
__________________________

This revision of the 1978 RUST Statement was prepared by a standing committee of AMECD chaired by William D. Schafer.  Participating in the revision were Esther E. Diamond, Charles G. Eberly, Patricia B. Elmore, Jo-Ida C. Hansen, William A. Mehrens, Jane E. Myers, Larry Rawlins, and Alan G. Robertson.

Additional copies of the RUST Statement may be obtained from the American Association for Counseling and Development, 5999 Stevenson Avenue, Alexandria, VA 22304.  Single copies are free.

1998 Addendum:  This statement is reproduced for educational purposes only.  Go to the American Counseling Association homesite for reprint and subscription information:  http://www.counseling.org/