At the 1976 AACD (then APGA) Convention, the Board of Directors requested the development of a statement on the responsible use of standardized tests to promote proper test use, reflecting the advantages of assessment along with the concerns about negative effects, and to help its members employ safeguards against misuse of tests. A committee representing all AACD Divisions and Regions spent two years studying the issues and developed a statement, published in the October, 1978, issue of Guidepost, titled "Responsibilities of Users of Standardized Tests". The Association for Measurement and Evaluation in Counseling and Development was charged with maintaining ongoing review of the so-called RUST Statement. The present statement has grown out of that review.
TARGET AUDIENCE: The statement is intended to address the needs of the members of AACD and its Divisions, Branches, and Regions, including counselors and other human service workers. Although it may interest test developers, teachers, administrators, parents, the press, or the general public, it is not specifically designed for these audiences.
ORGANIZATION AND FOCUS: The statement is organized
into eight sections: Introduction, Test Decisions, Qualifications
of Test Users, Test Selection, Test Administration, Test Scoring, Test
Interpretation, and Communicating Test Results. Basic to the statement
is the assumption that test data are merely numbers and that guidelines
can help promote their constructive use. The statement specifies
general principles and activities which constitute responsible practice.
These are grouped among similar issues and are indexed for ease of reference.
II. Test Decisions:
Decisions should be based on data. In general, test data improve the quality of decisions. However, deciding whether or not to test creates the possibility of three kinds of errors. First, a decision not to test can result in misjudgments that stem from inadequate or subjective data. Second, tests may produce data which could improve accuracy in decisions affecting the client, but which are not used in counseling. Third, tests may be misused. The responsible practitioner will determine, in advance, the purpose for administering a given test, considering protections and benefits for the client, practitioner, and agency.
A. Define purposes for testing by developing
specific objectives and limits for the use
of test data in relation to the particular assessment purpose:
1. Placement: If the purpose is selection or placement, the test user should understand the programs or institutions into which the client may be placed and be able to judge the consequences of inclusion or exclusion decisions for the client.
2. Prediction: If the purpose is prediction, the test user should understand the need for predictive data as well as possible negative consequences (e.g., stereotyping).
3. Description: If the purpose is diagnosis or description, the test user should understand the general domain being measured and be able to identify those aspects which are adequately measured and those which are not.
4. Growth: If the
purpose is to examine growth or change, the test user should understand
the practical and theoretical difficulties associated with such measurement.
5. Program Evaluation:
If the purpose of assessment is the evaluation of an agency's programs, the test user should be aware of the various information needs of the evaluation and of the limitations of each instrument used to meet those needs, as well as how the evaluation will be used.
B. Determine Information Needs and Assessment Needs:
1. Determine whether testing is intended to assess individuals, groups, or both.
2. Identify the particular
individual and/or group to be tested with regard to the agency's
purposes and capabilities.
3. Determine the limitations to testing created by an individual's age, sex, racial, ethnic, or cultural background, or other characteristics.
4. Avoid unnecessary testing by identifying decisions which can be made with existing information.
5. Assess the consequences for clients of deciding either to test or not to test.
6. Limit data gathering to the variables that are needed for the particular purpose.
7. Cross validate test
data using other available information whenever possible.
III. Qualifications of Test Users:
While all professional counselors and personnel workers should have formal training in psychological and educational measurement and testing, this training does not necessarily make one an expert and even an expert does not have all the knowledge and skills appropriate to some particular situations or instruments. Questions of user qualifications should always be addressed when testing is being considered.
Lack of proper qualifications can lead to errors and subsequent harm to clients. Each professional is responsible for making judgments on this in each situation and cannot leave that responsibility either to clients or to others in authority. It is incumbent upon the individual test user to obtain appropriate training or arrange for proper supervision and assistance when engaged in testing. Qualifications for test users depend on four factors:
A. Purposes of Testing: Technically proper testing for ill-understood purposes may constitute misuse. Because the purposes of testing dictate how the results are used, test users need qualifications beyond general testing competencies in order to interpret and apply the data.
B. Characteristics of Tests: Test users must understand the nature and limitations of each instrument they use.
C. Settings & Conditions of Test Use: Before deciding to test or to participate in a testing program, test users must assess the quality and relevance of their knowledge and skill to the situation at hand.
D. Roles of Test Selectors, Administrators, Scorers, & Interpreters:
Test users must engage in only those testing activities for which their training and experience qualify them.
IV. Test Selection:
The selection of tests should be guided by information obtained from a careful analysis of the characteristics of the population to be tested; the knowledge, skills, abilities, or attitudes to be assessed; the purposes for testing; and the eventual use and interpretation of the test scores. Use of tests should also be guided by criteria for technical quality recommended by measurement professionals (i.e., the APA/AERA/NCME "Standards for Educational and Psychological Tests" and the APA/AERA/NCME/AACD/ASHA "Code of Fair Testing Practices in Education").
A. Relate Validity to Usage:
1. Determine the validity of a test (whether the test measures what it is meant to measure)
through evidence of the constructs used in developing the test, the correlation of the test
performance with other appraisals of the characteristics being measured, and/or the
predictions of specified behaviors from the test performance.
2. Determine whether a test is congruent with
the users' definition of the characteristics of
human performance to be appraised.
3. Use tests for selection purposes only when
they show predictive validity for the specific
tasks or competencies needed in an educational or employment experience and when they
maintain legal and ethical prescriptions for non-discriminatory practices in program
selection, employment, or placement.
B. Use Appropriate Tests:
1. Document that the tests are appropriate for the characteristics of the population to be tested.
2. Only use tests within the level of skills
of administration and interpretation possessed
by the practitioner.
3. Use tests consistent with local needs:
a. Give attention to how the test is designed to handle variation of motivation, working speed, language facility, and experiential background among persons taking it; bias in response to its content; and effects of guessing in response to its questions.
b. Determine whether a common test or different tests are required for accurate
measurement of groups with special characteristics.
i. Recognize that the use of different tests for cultural, ethnic and racial groups may
constitute ineffective means for making corrections for differences.
ii. Determine whether persons or groups that use different languages should be tested
in either or both languages and in some instances, tested first for bilingualism
or language dominance.
C. Consider Technical Characteristics:
1. Select only tests that have documented evidence of reliability or consistency.
2. Select only tests that have adequate documented evidence of the effectiveness of the measure for the purpose to be served and justification of the inferences based on the results.
3. Scrutinize standardization and norming procedures
for relevance to the local population
and use of the data.
4. Use separate norms for men & women or
other subgroups when empirical evidence
indicates they are appropriate.
5. Determine the degree of technical quality
demanded of a test on the basis of nature of
the decisions to be made.
6. Include ease and accuracy of the procedures
for scoring, summarizing, and communicating
test performance among the criteria for selecting a test.
7. Consider practical constraints of cost, conditions, and time for testing as secondary test selection criteria.
D. Employ User Participation in Test Selection: Actively
involve everyone who will be using
the assessments (administering, scoring, summarizing, interpreting, making decisions) as
appropriate in the selection of tests so that they are congruent with local purposes,
conditions, and uses.
V. Test Administration:
Test administration includes procedures to ensure that the test is used in the manner specified by the test developers and that the individuals being tested are working within conditions which maximize opportunity for optimum, comparable performance.
A. Provide Proper Orientation:
1. Inform testing candidates,
parents, and institutions or agencies in the community as
appropriate about testing procedures.
2. Provide persons being tested sufficient practice exercises prior to the test.
3. Prior to testing,
check all takers' ability to record their responses adequately (e.g., in
the use of machine-scorable answer sheets).
4. Provide periodic training
by qualified personnel for test administrators within agencies
or institutions using tests.
5. Review test materials
and administration sites and procedures prior to the time for
testing to ensure standardized conditions and appropriate response to any irregularities
which may occur.
B. Use Qualified Test Administrators:
1. Acquire any training required to administer the test.
2. Ensure that individuals
taking self-administered or self-scored instruments have the
necessary understanding and competencies.
C. Provide Appropriate Testing Conditions:
1. Ensure that the testing environment (seating, work surfaces, lighting, heating, freedom from distractions, etc.) and psychological climate are conducive to the best possible performance of the test-takers.
2. Carefully observe,
record, and attach to the test record any deviation from prescribed
test administration procedure.
3. Use a systematic and objective procedure for observing and recording environmental, health, or emotional factors, or other elements which may invalidate test performance. This record should be attached to the test scores of the person tested.
4. Use sufficiently trained
personnel to provide uniform conditions and to observe the
conduct of the examinees when large groups of individuals are tested.
D. Give Proper Directions:
1. Present each test
in the manner prescribed in the test manual to ensure that it is fair to
each test taker.
2. Administer standardized
tests with the verbatim instructions, exact sequence & timing,
and identical materials that were used in the test standardization.
3. Demonstrate verbal clarity, calmness, empathy for the examinees, and impartiality toward all being tested. Because taking a test may be a new and frightening experience, or may stimulate anxiety or frustration in some individuals, help examinees attempt each task with positive application of their skills and knowledge and the expectation that they will do their best.
E. Coordinate Professional Collaboration:
In settings where skill and knowledge are pooled and responsibility is shared, consider the qualifications of the testing team as a whole as more important than those of individuals. However, coordination and consistency of responsibilities with expertise must be maintained.
VI. Test Scoring:
Accurate measurement of human performance necessitates adequate procedures for scoring the responses of examinees. These procedures must be audited as necessary to ensure consistency and accuracy of application.
A. Consider Accuracy and Interpretability: Select a test
scoring process that maximizes
accuracy and interpretability.
B. Rescore Samples: Routinely rescore samples of examinee responses to monitor the accuracy of the scoring process.
C. Screen Test Results: Screen reports of test results using
personnel competent to recognize
unreasonable or impossible scores.
D. Verify Scores and Norms: Verify the accuracy of computation
of raw scores and conversion
to normative scales prior to release of such information to examinees or users of test results.
E. Communicate Deviations: Report as part of the official
record any deviation from normal
conditions and examinee behaviors.
F. Label Results: Clearly label the date of test administration
along with the scores.
VII. Test Interpretation:
Test interpretation encompasses all the ways that meaning is assigned to the scores. Proper interpretation requires knowledge about the test which can be obtained by studying its manual and other materials along with current research literature with respect to its use; no one should undertake the interpretation of scores on any test without such study.
A. Consider Reliability: Reliability is important because
it is a prerequisite to validity and
because the degree to which a score may vary due to measurement error is an important
factor in its interpretation.
1. Estimate test stability using a reliability (or other appropriate) coefficient.
2. Use the standard error of measurement to estimate the amount of variation due to random error in individual scores and to evaluate the precision of cut-scores in selection decisions.
3. Consider the variance components attributed to error in the reliability index in relation to the uses being made of the scores.
4. Evaluate reliability estimates with regard
to factors that may have artificially raised or
lowered them (e.g., test speededness, biases in population sampling).
5. Distinguish indices of objectivity (i.e., scorer reliability) from test reliability.
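The role of the standard error of measurement described in item 2 can be illustrated with a short sketch. It assumes the classical-test-theory formula SEM = SD·√(1 − r); the scale parameters and the observed score below are hypothetical.

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    # Classical test theory: SEM = SD * sqrt(1 - r)
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical scale: standard deviation 15, reliability coefficient 0.91.
sem = standard_error_of_measurement(sd=15.0, reliability=0.91)
print(round(sem, 2))  # 4.5

# Approximate 95% band around a hypothetical observed score of 110:
observed = 110.0
print(round(observed - 1.96 * sem, 1), round(observed + 1.96 * sem, 1))  # 101.2 118.8
```

Note that even a highly reliable instrument (r = 0.91) leaves an uncertainty band of roughly ±9 points around an individual score, which bears directly on the precision of any cut-score.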
B. Consider Validity: Proper test interpretation requires
knowledge of the validity evidence
available for the intended use of the test. Its validity for other uses is not relevant. Indeed,
use of a measurement for a purpose for which it was not designed may constitute misuse.
The nature of the validity evidence required for a test depends upon its use.
1. Use for Placement: Predictive validity is the usual basis for valid placement.
a. Obtain adequate information about the programs or institutions in which the client may be placed in order to judge the consequences of placement.
b. Use all available
evidence to infer the validity of an individual's score. A single
score should not be the sole basis for a placement or selection recommendation. Other
items of information about an individual (e.g., teacher report, counselor opinion)
frequently improve the likelihood that proper judgments and decisions will be made.
c. Consider validity
for each alternative (i.e., each placement option) when interpreting
test scores and other evidence.
d. Examine the possibility that a client's group membership (socioeconomic status, gender, subculture, etc.) may affect test performance and, consequently, validity.
e. Estimate the probability of favorable outcomes for each possible placement before making a recommendation.
f. Consider the possibility
that outcomes favorable from an institutional point of view
may differ from those that are favorable from the individual's point of view.
2. Use for Prediction: The relationship
of the test scores to an independently developed
criterion measure is the basis for predictive validity.
a. Consider the reliability and validity of the criterion measure(s) used.
b. Consider the validity of a measure in the context of the other predictors available (i.e., does the test make a valid contribution to prediction beyond that provided by the other measures?).
c. Use cross validation to judge the validity of prediction processes.
d. Consider the effects
of labeling, stereotyping, and prejudging people (e.g., self-
fulfilling prophecies that may result from labeling are usually undesirable).
e. If a statistically valid predictor lacks both construct and content validity, analyze the mechanism by which it operates to determine whether or not its predictive validity is spurious.
3. Use for Description: Comprehensiveness
of information is fundamental to effective
description, since no set of test scores completely describes an individual.
a. Clearly identify
the domain assessed by any measure and the adequacy of the content
sampling procedures used in developing items.
b. Clarify the dimensions
being measured when multiple scores from a battery or inventory
are used for description.
i. Examine the content and/or construct validity of each score separately.
ii. Consider the relative importance of each of the separate elements for interpretation.
iii. Give appropriate weight to reflect the variabilities (e.g., standard deviations) and relationships (e.g., correlations) of scores which are to be combined.
c. Distinguish characteristics
that can be validated only empirically and those for which
content specifications exist.
4. Use for Assessment of Growth: Assessment
of growth or change requires valid tests as
well as a valid procedure for combining them.
a. Specifically evaluate the reliability of differences between scores used as measures of change.
b. Establish the validities
of the measures used to establish change in relation to one
another as well as individually.
c. Consider the comparability of intervals in the scales used to assess change.
i. Evaluate derived or extrapolated scores (e.g., grade equivalents) for possible unequal intervals at different score levels.
ii. Consider problems in the interpretation and comparability of tests (e.g., floor effects, content changes from level to level, poor articulation in multilevel tests, lack of comparability of alternate forms, inadequacy of score-equating across forms, and differences in the administration and timing of tests from that of their norming).
d. Assess potential
for undesirable correlations of difference scores with the measures
entering into their calculations (e.g., regression toward the mean).
e. Recognize the potential
lack of comparability between norms for differences derived
from norms and norms for differences derived from differences (i.e. mathematically
derived norms for differences are not necessarily equivalent to norms based on
distributions of actual differences).
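The reliability of a difference score (item a above) can be sketched numerically. The function below uses the classical formula for the reliability of a difference between two measures, assuming equal variances; the coefficients are hypothetical.

```python
def difference_score_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    # Classical formula, equal variances assumed:
    # r_D = (mean of the two reliabilities - intercorrelation) / (1 - intercorrelation)
    return (0.5 * (r_xx + r_yy) - r_xy) / (1.0 - r_xy)

# Two hypothetical tests, each with reliability 0.90, intercorrelated 0.70:
print(round(difference_score_reliability(0.90, 0.90, 0.70), 2))  # 0.67
```

Even when both measures are individually quite reliable (0.90), the difference between them is markedly less reliable (about 0.67 in this sketch), which is one of the practical difficulties of measuring growth or change noted above.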
5. Use for Program Evaluation: Assessments
of group differences (between groups or
within groups over time) are based on research designs which to varying degrees admit
competing interpretations of results.
a. Use procedures in the evaluation which ensure that no factors other than those being studied have major influence on the results (i.e., internal validity).
b. Use statistical
procedures which are appropriate and have all assumptions met by the
data being analyzed.
c. Evaluate the generalizability
(external validity) of the results for different individuals,
settings, tests, and variables.
C. Scores, Norms, and Related Technical Features: The result of scoring a test or subtest is usually a number called a raw score, which by itself is not interpretable. Additional steps are needed to translate the number into either a verbal description (e.g., pass or fail) or a derived score (e.g., a standard score). Less than full understanding of these procedures is likely to produce errors in interpretation and, ultimately, in counseling or decision making.
1. Examine appropriate test materials (e.g., manuals, handbooks, user's guides, and technical reports) to identify the descriptions or derived scores produced and their unique characteristics.
a. Know the operational procedures for translating raw scores into descriptions or derived scores.
b. Know specific psychological
or educational concepts or theories before interpreting
the scores of tests based on them.
c. Consider differential validity along with equating error when different tests, different test forms, or scores on the same test administered at different times are compared.
2. Clarify arbitrary standards used in interpretation (e.g., mastery or nonmastery for criterion-referenced interpretations).
a. Recognize that when
a score is interpreted based on a proportion score (e.g., percent
correct), its elements are being given arbitrary weights.
b. Recognize that the
difficulty of a fixed standard varies widely and thus does not have
the same meaning for different content areas and for different assessment methods.
c. Report the number
(or percentage) of items right in addition to the interpretation when
it will help others understand the quality of the examinee's performance.
3. Employ derived scores based on norms which fit the needs of the current use of the test.
a. Evaluate whether
available norm groups are appropriate as part of the process of
interpreting the scores of the clients.
i. Use norms for the group to which the client belongs.
ii. Recognize that derived scores based on different norm groups may not be comparable.
iii. Use local norms and derived scores based on them whenever possible.
b. Choose a score based on its intended use:
i. Consider relative standing scores for comparison of individuals to the norm group.
ii. Consider standard or scaled scores whenever means and variances or other arithmetic operations are appropriate.
iii. When using a statistical technique, use the test's derived score which best meets the assumptions of the analysis.
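The translation from raw score to derived score outlined in this section can be sketched as follows. The norm-group mean and standard deviation are hypothetical, and a normal distribution is assumed for the percentile conversion.

```python
from statistics import NormalDist

def z_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    # Express a raw score relative to the norm group's mean and SD.
    return (raw - norm_mean) / norm_sd

def t_score(z: float) -> float:
    # A common standard scale: mean 50, standard deviation 10.
    return 50.0 + 10.0 * z

# Hypothetical norm group: mean raw score 40, standard deviation 8.
z = z_score(raw=52.0, norm_mean=40.0, norm_sd=8.0)
print(z)                                 # 1.5
print(t_score(z))                        # 65.0
print(round(NormalDist().cdf(z) * 100))  # 93 (approximate percentile rank)
```

The same raw score of 52 would yield a different derived score against a different norm group, which is why the appropriateness of the norms governs the interpretation.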
D. Administration and Scoring Variation: Stated criteria
for score interpretation assume
standard procedures for administering and scoring the test. Departures from standard
conditions and procedures modify and often invalidate these criteria.
1. Evaluate unusual circumstances peculiar to the administration and scoring of the test.
a. Examine reports
from administrators, proctors, and scorers concerning irregularities
or unusual conditions for possible effects on test performance.
b. Consider potential
effects of examiner-examinee differences in ethnic and cultural
background, attitudes, and values based on available relevant research.
c. Consider any reports
of examinee behavior indicating that responses were made on
some basis other than that intended.
d. Consider differences among clients in their reaction to instructions about guessing.
2. Evaluate scoring irregularities and bias
and judgment effects when subjective elements
enter into scoring.
VIII. Communicating Test Results:
The responsible counselor or other practitioner reports test data with a concern for the individual's need for information and the purposes of the information. There must also be protection of the right of the person tested to be informed about how the results will be used and what safeguards exist to prevent misuse (right to information) and about who will have access to the results.
A. Decisions About Individuals: Where
test data are used to enhance decisions about an
individual, the practitioner's responsibilities include:
1. Limitations on Communication:
a. Inform the examinee of possible actions that may be taken by any person or agency who will be using the results.
b. Limit access to users specifically authorized by law or by the client.
c. Obtain the consent of the examinee before using test results for any purpose other than those advanced prior to testing.
2. Practitioner Communication Skills:
a. Develop the ability to interpret test results accurately before attempting to communicate them.
b. Develop appropriate communication skills, particularly with respect to concepts that are commonly misunderstood by the intended audience, before attempting to explain test results to clients, the public, or other recipients of the information.
3. Communication of Limitations of the Assessment:
a. Inform persons receiving test information that scores are not perfectly accurate, and indicate the degree of inaccuracy in some way, such as reporting score intervals.
b. Inform persons receiving test information of any circumstances that could have affected the validity or reliability of the results.
c. Inform persons receiving test information of any factors necessary to understand potential sources of bias for a given test result.
d. Communicate clearly that test data represent just one source of information and should rarely, if ever, be used alone for decision making.
4. Communication of Clients' Rights:
a. Provide test takers or their parents/guardians with information about any rights they may have to obtain test copies and/or their completed answer sheets, to retake tests, to have tests rescored, or to cancel test scores.
B. Decisions about Groups: Where standardized
test data are being used to describe groups
for the purpose of evaluation, the practitioner's responsibilities include:
1. Background Information:
a. Identify the purposes for which the reported data are appropriate.
b. Include additional information if it can improve accuracy of understanding.
2. Averages and Norms:
a. Clarify the amount of meaning that can be attached to differences between averages (e.g., statistical significance should not be taken as a judgment of importance).
b. Qualify norms based on their appropriateness for the group being tested.
3. Use obsolescence schedules
so that stored data are systematically relocated to historical
files or destroyed.
4. Process data used
for research or program evaluation to assure individual anonymity (e.g.,
released only in aggregated form).
5. Political Usage:
a. Emphasize that test data should be used only for the test's stated purposes.
b. Recognize that public release of test information provides data for many purposes; take steps to minimize uses that may be adverse to the interests of those tested.
6. Agency Policies:
a. Advocate agency test-reporting policies designed to benefit the groups being measured.
b. Advocate the establishment of procedures for periodic review of test use.
IX. Extensions of These Principles:
This statement is intended to address current
and emerging problems and concerns that are generic to all AACD divisions,
branches, and regions by formulating principles that are specific enough
to serve as a template for more closely focused statements addressed to
specific situations. Individual divisions, branches, and regions
are encouraged to elaborate upon this statement to reflect principles,
procedures, and examples appropriate to their members.
This revision of the 1978 RUST Statement was prepared by a standing committee of AMECD chaired by William D. Schafer. Participating in the revision were Esther E. Diamond, Charles G. Eberly, Patricia B. Elmore, Jo-Ida C. Hansen, William A. Mehrens, Jane E. Myers, Larry Rawlins, and Alan G. Robertson.
Additional copies of RUST Statement may be obtained from the American Association for Counseling and Development, 5999 Stevenson Avenue, Alexandria, VA 22304. Single copies are free.
1998 Addendum: This statement is reproduced for educational purposes only. See the American Counseling Association website for reprint and subscription information: http://www.counseling.org/