
APPENDIX A-5
THE RANGE OF ASSESSMENT STRATEGIES
Brian Stecher, RAND


During the past few years, concerns about the U.S. economy have prompted heated debates about vocational education and employment training programs. Experts have called for a variety of reforms as a result: changes in curriculum content, in teaching methods, and in methods of assessment. Reports on the inadequate skills of high school graduates, the rapidly changing demands of many employers, and the declining competitiveness of U.S. firms in the international marketplace have also stimulated a variety of proposals to change the organization and structure of employment preparation programs. Despite different approaches to reform, all sides seem to agree on the need for valid, reliable, and affordable methods for assessing students' skills.

This section is designed to help you choose the best assessment method(s) for your particular needs by moving through the following steps: identifying the purposes of the assessment, reviewing the types of assessment available, and weighing considerations of quality and feasibility.

Different types of assessment strategies are used for different purposes; your purposes should determine how you measure knowledge and skills. For example, a pop quiz may suit a teacher measuring day-to-day student learning, a standardized occupational test may suit an industry certifying job applicants, and aggregated results may suit a district reporting on program performance.

One form of assessment is not necessarily better than another in all contexts. Rather, one form might be more appropriate than another, depending on your purpose, the knowledge and skills to be assessed, and quality and feasibility considerations.

The Purposes of the Assessment

Measuring Individual Student Learning

Assessments designed to measure student learning in a course or program often become an integral component of the curriculum. Assessments with this purpose can be administered on-demand: they can be given often and graded quickly, and performance can be assessed cumulatively over longer periods of time.

Many teachers use assessments to measure individual student learning and progress. Some traditional strategies for this purpose include pop quizzes, end-of-chapter exams, and in-class essay exams. Alternative forms include portfolios, oral presentations, and senior projects. Assessments with this purpose can provide a teacher with insight into students' academic and technical progress and, if used over time, an impression of students' cognitive development as well. Assessments designed to measure student learning can also provide teachers with information about which instructional approaches work best, and where changes need to be made or additional attention focused.

Certification of Individual Students

An assessment used to certify student mastery of occupationally oriented material provides an effective means of signaling to employers that a student has a particular set of skills and knowledge. Since students generally seek employment (or a higher-level job) in an occupational area after being certified, employers frequently give input into the design and implementation of the system. In a true employee certification system, employers require certification for hiring and career advancement. In some industries, such as health, students cannot apply for particular jobs without the certification.

Assessments used to certify students may document general abilities or accumulated knowledge (e.g., tests for college admission) or specific ones (e.g., tests for professional licensing). Individuals who pass tests of specific, job-oriented skills often receive a certificate that can be used statewide, or even nationally.

Program Performance Information

Assessment is also used to provide information on the quality of the programs, schools, and districts that are providing education and training. This information may be used for monitoring progress in making program improvements, maintaining quality, making comparisons across different programs, or holding administrators of programs accountable. Trend data for programs, schools, or districts can also be used to monitor the performance of these entities over time.

Assessments used to provide program performance information often aggregate scores from individual performances to describe the achievement of particular groups (e.g., a graduating class or a whole school, or males and females separately). Assessments with this purpose are often administered infrequently, for example, annually or even less often.
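
To make aggregation concrete, here is a minimal Python sketch of how individual results might be rolled up into group-level summaries. The schools and scores are entirely hypothetical:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical individual results as (school, score) pairs.
scores = [
    ("Greenwood High", 78), ("Greenwood High", 85), ("Greenwood High", 92),
    ("Lifton High", 91), ("Lifton High", 74), ("Lifton High", 88),
]

# Group individual performances by school.
by_school = defaultdict(list)
for school, score in scores:
    by_school[school].append(score)

# Report a group-level summary for each school.
for school, group in sorted(by_school.items()):
    print(f"{school}: n={len(group)}, mean={mean(group):.1f}")
```

A real reporting system would also disaggregate results (by program, gender, or graduating class, say) and track them across years to support the trend comparisons described above.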

Individual Exercise

Directions: List assessments that are used for one (or more) of the above purposes in your district or school, along with the purposes. What problems do you think you might encounter by attempting to accomplish more than one purpose with the same assessment?

Types of Assessment

Written Assessment

Multiple-Choice and Open-Ended Written Items

This type of test is highly efficient for testing a student's knowledge of specific facts or skills and includes multiple-choice, true-false, and fill-in-the-blank questions. There are a limited number of predetermined "right" answers (sometimes only one); the student selects or produces a limited response to a stimulus or prompt. Open-ended questions are distinguished from the others in this category by requiring the student to generate a response rather than choose from among those presented. Developing and administering a multiple-choice or open-ended item test costs less, in time and funds, than other forms of assessment.
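
The efficiency of selected-response scoring comes from the fact that each answer can be compared mechanically against a fixed key, as in this minimal Python sketch (the five-item key and the responses are invented for illustration):

```python
# Hypothetical answer key for a five-item multiple-choice test.
ANSWER_KEY = ["B", "D", "A", "A", "C"]

def score_test(responses):
    """Count the items where a student's response matches the keyed answer."""
    return sum(r == k for r, k in zip(responses, ANSWER_KEY))

print(score_test(["B", "D", "C", "A", "C"]))  # -> 4 (of 5 correct)
```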

Essays and Problem- or Scenario-Based Items

These types of written assessment require students to demonstrate their knowledge of a topic in writing, whether in a short answer, a brief explanation, or a long essay. The stimulus is often printed material, but it can also be an object, an event, or an experience. Written assessments of this kind include essays written in response to a question as well as problem- or scenario-based responses. Such questions may challenge students to think about issues and problems related to the industry they are studying, and they often require students to integrate knowledge from several disciplines. Scoring is usually more complex and time-consuming than for multiple-choice or open-ended items.

Performance Task

This type of assessment may consist of a single physical task or a set of them, such as changing the oil in a car engine or drafting a floor plan for a building. Performance tasks can be designed to test a student's specific abilities in a skill area, his or her decision-making or problem-solving skills, or some combination of these. Performance tasks can be structured with one evaluator using a checklist of items and scoring criteria, or with observation divided among several evaluators who apply a common scoring rubric.
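
As an illustration of the checklist-and-rubric approach, the Python sketch below averages two evaluators' ratings over a shared rubric. The criteria (drawn loosely from the oil-change example) and the ratings are hypothetical:

```python
from statistics import mean

# Hypothetical rubric: each criterion is rated 0-4 by each evaluator.
CRITERIA = ["follows safety procedures", "selects correct oil grade", "torques drain plug"]

ratings = {
    "evaluator_1": {"follows safety procedures": 4, "selects correct oil grade": 3, "torques drain plug": 4},
    "evaluator_2": {"follows safety procedures": 4, "selects correct oil grade": 2, "torques drain plug": 4},
}

# Average the evaluators' ratings for each criterion, then total them.
per_criterion = {c: mean(r[c] for r in ratings.values()) for c in CRITERIA}
total = sum(per_criterion.values())
print(per_criterion)
print(f"total: {total} / {4 * len(CRITERIA)}")
```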

Senior Project

Senior projects include at least three discrete activities to measure a student's achievement over the course of the senior year of high school (though the concept can be adapted for a different year or time period): (1) a research paper; (2) a project, whose product is usually not written and is often an artifact of some kind: a videotape, a performance, or a physical model, for example; and (3) an oral presentation. Each component has its own criteria for evaluation, and evaluators are trained to assess senior projects properly. The time and cost required to evaluate senior projects are higher than for a single written assessment or performance task.

Portfolio

A portfolio is a collection of student work covering multiple outcomes and activities. It is usually implemented with a focus on one of three purposes: (1) to improve curriculum and instruction, (2) to help students get jobs and to improve employability skills, or (3) to challenge students to take an active role in setting and meeting goals and in shaping their own tasks. The portfolio might represent work samples collected over one or more years. In some instances, a portfolio might include results from a standardized test or another written assessment instrument.

Individual/Group Exercise

Directions: Read the following scenarios and answer the questions that follow:

(1) The assistant superintendent in Greenwood School District is asked to design a program improvement system for the district. The performance of each school not only will be reported to the community, but will also be used to award bonuses to teachers at the best schools.

What types of assessments would you use? Why?

(2) Within a particular state, biotechnology firms are interested in hiring large numbers of skilled workers over the next few years. A group of CEOs contacts the community college system to design a certification system for two job categories that require an associate's degree.

What types of assessments would you recommend? Why?

(3) Teachers at Lifton High School have been disappointed in the scores of their juniors and seniors on statewide standardized tests. They don't think the scores adequately reflect students' achievement or abilities. These same teachers are also aware that many students are disengaged from the high school curriculum by junior year, which may be contributing to declining scores and low attendance rates. These teachers would like to design assessment strategies that both engage the students and help demonstrate their achievement.

What types of assessments would you recommend? Why?

Quality

To ensure that an assessment strategy will provide accurate information, the technical quality of the measures should be considered. Three aspects of assessment quality are of special concern:

  1. Reliability: How accurate is the information?
  2. Validity: Does the assessment measure what it is intended to measure?
  3. Fairness: Is the assessment free of biases against any group of students?

Reliability

There are no perfect measuring tools, whether in science, in the kitchen, or in education, so people who use tools to measure things need to know how much error there is likely to be in the information they receive. When we talk about the reliability of an assessment measure, we mean the degree to which the score on the measure (or on the test as a whole) is accurate. If, for example, a student took the same test again, would she or he get the same score? If students took a comparable test, would a similar result be obtained?

On commercial tests, the reliability coefficient is usually around .80, which is considered high. Roughly speaking, this means that 80% of the variance in test scores reflects "true" differences in performance and 20% reflects measurement error. High reliability comes partly from the fact that commercial tests gather many separate bits of information about what students know; for example, students might answer 30 multiple-choice questions per half-hour.
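
For readers who want to see the computation, the Python sketch below estimates test-retest reliability as the correlation between two administrations of the same test. The six students' scores are invented for illustration:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for six students on two administrations of one test.
first_administration  = [72, 85, 90, 65, 78, 88]
second_administration = [70, 88, 87, 68, 75, 90]

r = pearson(first_administration, second_administration)
print(f"test-retest reliability ~= {r:.2f}")
# A coefficient of .80 would mean roughly 80% of the score variance
# reflects "true" performance and 20% reflects measurement error.
```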

Achieving high reliability with alternative assessment methods is more difficult. The longer and more complex responses supply fewer independent pieces of information about performance: a teacher evaluating a portfolio, for example, is likely to review only a handful of student products, giving limited evidence. Moreover, scoring requires judgment, which inevitably introduces a degree of subjective opinion. Subjective judgments may show up as inconsistencies between raters. Interrater reliability asks: Would two raters score the assessment the same way? Would the same rater, repeating the scoring at a different time, assign the same score?
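
A simple way to quantify interrater reliability is the exact-agreement rate between two raters, as in this sketch with hypothetical ratings of eight portfolios (more rigorous statistics, such as Cohen's kappa, also correct for agreement expected by chance):

```python
# Hypothetical 1-4 rubric scores assigned by two raters to eight portfolios.
rater_a = [3, 4, 2, 3, 1, 4, 3, 2]
rater_b = [3, 3, 2, 3, 2, 4, 3, 2]

# Exact agreement: how often the two raters assign the same score.
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(f"exact agreement: {agreement:.0%}")  # -> 75%
```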

In selecting an assessment measure, then, we need to consider whether it would give the same result if repeated, how well its scores correlate with those of other assessments measuring comparable knowledge, and how consistent its scores are across raters.

Validity

The validity of an assessment tells us whether it is measuring what we think it's measuring. If we want to know how well a student can write, a multiple-choice test of spelling and grammar may not be a valid indicator of how successfully he or she can write an essay, though it may indicate how well he or she can identify errors of those types in text.

There are several ways to establish or measure a test's validity. A panel of experts in the field can review the contents of the measure; performance on the test measure can be compared with actual performance on similar tasks in, for example, a work setting; or we can study the pattern of responses among several tasks measuring the same thing.
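
The second of these approaches, comparing test scores with actual performance, can be summarized as a criterion-related validity coefficient. A minimal sketch with invented data follows (statistics.correlation requires Python 3.10 or later):

```python
from statistics import correlation  # Python 3.10+

# Hypothetical data: written test scores and supervisors' 1-5 ratings of
# the same six students' performance in a work setting.
test_scores = [62, 71, 80, 85, 90, 94]
job_ratings = [2, 3, 3, 4, 4, 5]

print(f"criterion-related validity ~= {correlation(test_scores, job_ratings):.2f}")
```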

One of the primary motivations for adopting alternative assessments is to increase validity by making the assessment tasks more like the real-world activities the tests are designed to simulate. Because alternative assessments pose more "authentic" tasks, it is hoped that the assessment scores will more accurately reflect students' ability and knowledge in a given area.

One problem with interpreting the results of some types of alternative assessments, such as senior projects or portfolios, is that they are inherently nonstandardized. The content of each individual's submissions will be different, and the resources available to students may vary. It is difficult to assign scores fairly to such different products, particularly when factors such as access to resources (computers or experts, for example) must be taken into account.

Fairness

If students who otherwise have equal ability score differently on an assessment because of background knowledge or experience that is irrelevant to the assessed skill or knowledge, then the measure is unfair, or "biased." For example, a task that assumes familiarity with different snow conditions may be biased against students who live in a climate where it never snows.

The fairness of an assessment is usually established by expert committees trained to analyze factors that might disadvantage or benefit particular groups of students. Many advocates believe that alternative assessments are more equitable to all groups because they pose more complete tasks and permit students to address those tasks in ways that are meaningful to them. However, all vocational educators selecting and constructing assessments need to be sensitive to the diverse backgrounds of their students.

Individual/Group Exercise

Directions: By yourself or in a group, answer the following questions:

(1) Think of a test being used in your state or school but not in your classroom. What do you know about its reliability, validity, and fairness? What would you have to do to find out?
(2) Think of a test being used in your classroom. What do you know about its reliability, validity, and fairness? What would you have to do to find out?

Feasibility

Practical issues of cost, the time required to administer and score, complexity, and acceptability are legitimate concerns in selecting from among alternative assessments. It should come as no surprise that selected-response tests make the most efficient use of time and budgets.

Cost

In general, alternative assessments are more expensive to develop, administer, and score than selected-response tests. Scoring is the greatest added expense of using alternative assessments. Multiple-choice tests can be scored quickly for only pennies per student. Because of their complexity, alternative assessments are time-consuming to score. Essays, for example, can cost several dollars per student to score. Often there are additional costs in training the people who will do the scoring.
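
A back-of-the-envelope comparison using figures like those quoted above follows; the unit costs, training cost, and enrollment are illustrative assumptions, not actual prices:

```python
# Illustrative assumptions: 2,000 students, "pennies" vs. "several dollars"
# per student to score, plus a one-time rater-training cost for essays.
students = 2000
multiple_choice_total = 0.05 * students        # $0.05 per student
essay_scoring_total   = 3.00 * students        # $3.00 per student
rater_training        = 1500                   # one-time training cost

print(f"multiple-choice scoring: ${multiple_choice_total:,.0f}")
print(f"essay scoring:           ${essay_scoring_total + rater_training:,.0f}")
```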

Offsetting the additional costs of alternative assessments may be two benefits: (1) substantial staff development and (2) greater test validity. Teachers report that scoring alternative assessments improves their understanding of student learning, including their misconceptions and problems, and it is useful for instructional planning. Alternative assessments are also likely to provide more valid information about students' abilities to perform occupationally relevant tasks.

Time

Alternative assessments place greater time demands on administrators, teachers, and students. Alternative assessments frequently require more class time to administer (which may cut into instructional time), and certainly require more time for scoring, which may reduce teacher planning time. On the positive side, teachers learn more about student performance by scoring this type of task. Moreover, when assessments are closely linked to classroom instructional activities, such as senior projects and portfolios, the distinction between assessment time and learning time is blurred, and the time problem may be less troublesome.

Complexity

Alternative assessments are usually more complex than traditional tests. Students respond to more complicated questions or situations that may cover a broad range of course content; the methods students use to respond are more elaborate; students may use manipulatives and may produce objects or artifacts in response to tasks; higher-order thinking skills are often required; and the scoring procedures are more complicated.

Making the arrangements to conduct an alternative assessment is also more complicated than passing out pencils and paper for a multiple-choice test. Training may be necessary to learn to administer or score alternative assessments (and sometimes to develop them as well), and additional equipment and facilities may be needed.

Acceptability

People familiar with traditional types of tests may be reluctant to implement alternative assessments or to accept alternative assessment results as credible. If the measures fail to meet reasonable technical standards or to address accepted curricular material, they may in fact be less credible. On the other hand, one of the advantages of alternative assessments is that employers and other stakeholders may give greater credibility to scores based on authentic performance tasks than to traditional test results.

Exercise

Directions: Using the tests you thought of in the last section, answer the following questions:

(1) What do you know about the cost, complexity, and acceptability of the test used in your state or school?
(2) What do you know about the cost, complexity, and acceptability of the test used in your classroom?

