
IV.2 Recent Studies

In spite of the several difficulties listed above, a number of STW program evaluations have been carried out. This section summarizes some published after 1993. All of these evaluations contain considerably more results and methodological detail than can be presented here, and interested readers should consult them directly. Where appropriate, this section also comments on methodological aspects in the context of the criteria introduced above.

Wisconsin Youth Apprenticeship Program in Printing: Evaluation 1993-1995: M. Orr (1996)

This evaluation considers a youth apprenticeship program comprising five sites in Wisconsin. The program's design includes a competency-based curriculum and assessment system; required two-year (part-time) paid training and work experience at a printing company; a work-based mentor; technical college instruction in printing technology and some academic subjects; integrated vocational and academic instruction; and collaborative school and industry oversight. This evaluation carefully considers the effectiveness with which each of the sites implemented these aspects--an essential part of components-based outcome evaluation, as described above.

In order to identify the program's effect on students, the evaluation created a comparison group of students in conventional vocational printing programs and regular classes. This is valid as long as these groups are similar, as they appear to be on a number of dimensions. Nevertheless, the report itself suggests potential problems with this control group selection, as it mentions that "the program does not serve very poorly performing students or those who are educationally at risk, primarily because of the perception that employers will not 'hire' them . . ." (p. vii). Orr also reports that the youth apprenticeship program enrolls relatively few females and non-white students. These aspects are important as they bear on how effective the program can be expected to be if it is expanded to populations unlike the one currently receiving the treatment.

Given this comparison group, the evaluation does find significant positive effects stemming from the Youth Apprenticeship Program (YAP). The first set of goals it examines refers to participants' in-school performance, and it quantifies these by examining changes between the sophomore and senior years. Table 8 presents some of the resulting comparisons.

Table 8
School Performance of Sophomores and Seniors
in Youth Apprenticeship and Comparison Groups


Program Type               Number of     N    Grade Point    N    Number of        N
                           Days Absent        Average             Disciplinary
                                                                  Referrals
Sophomore
  YAP                          4.7       19       2.6        27        1           19
  Printing classes only        7.5       15       2.5        16        5            5
  General classes              6.4       31       3.1        31        0           31
Senior
  YAP                          4.8       29       3.0        18        0           29
  Printing classes only       12.8       12       2.7        15        4            5
  General classes             12.6       31       3.1        31        1           31
Percentage Change
  YAP                          0.02      18       0.15       17    -1.00            6
  Printing classes only        0.71      12       0.08       12    -0.20            4
  General classes              0.97      29       0.00       31        -            6

The first column indicates that absence rates during the sophomore year were similar across all three groups in the sense that they did not display statistically significant differences. These essentially did not change for the YAP students; however, the two comparison groups both had statistically significant increases, suggesting the apprenticeship experience was successful at keeping absences down. The YAP students also evolved more positively in terms of GPA, with a statistically significant increase. As Orr mentions, a potential weakness in these estimates stems from the small sample sizes due to the early stages of implementation and the program's size itself. Nevertheless, the results are promising.
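
The comparisons underlying Table 8 can be illustrated with a short computation. The Python sketch below runs a two-sample (Welch) t-test on the change in days absent, using means and sample sizes from Table 8; the standard deviations are not reported by Orr (1996) and are invented here purely for illustration.

    from scipy.stats import ttest_ind_from_stats

    # Change in days absent, sophomore to senior: YAP vs. general classes.
    # Means and Ns are from Table 8; the standard deviations are ASSUMED,
    # since Orr (1996) does not report them.
    yap_mean, yap_sd, yap_n = 0.02, 0.50, 18   # sd assumed
    gen_mean, gen_sd, gen_n = 0.97, 0.80, 29   # sd assumed

    t_stat, p_value = ttest_ind_from_stats(
        yap_mean, yap_sd, yap_n,
        gen_mean, gen_sd, gen_n,
        equal_var=False,  # Welch's test: no equal-variance assumption
    )
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")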

Table 9 moves further ahead in participants' experiences, presenting a comparison of post-educational outcomes for YAP graduates and two comparison groups (co-op students and those from printing classes). The data come from interviews carried out six to eight months after graduation.

Table 9
Labor Market Performance of Graduates from Youth Apprenticeship
and Comparison Groups (Percentages)



Program Type                                     YAP    Co-op    Printing
                                                                 Classes
Current Employment Status
  Working                                         94      60        75
  Working in Printing                             94      60        13
  Not working, not looking for a job               6      40        25
Portion of Time Employed Since Graduation
  Only a little                                    0      20        38
  Some of the time                                 0       0        13
  Most of the time                                 0      20         0
  All of the time                                100      60        50
Continued working at work experience job          81      20         -
Continued working for apprenticeship employer     75      20         -
Would like a career in the printing industry      75      80        38
N =                                               16       5         8
Of Those Currently Working, Hours Worked per Week:
  Less than 15 hours                               0       0        17
  15 to 30 hours                                  33      67        33
  31 to 40 hours                                  27       0         0
  More than 40 hours                              40      33        50
Hourly Earnings
  $4.25 to $4.99                                   7       0        17
  $5.00 to $6.99                                  33      67        50
  $7.00 to $9.99                                  60      33        17
  $10.00 to $12.99                                 0       0        17
  $13.00 or more                                   0       0         0
N =                                               15       3         6

The table suggests better experiences for the YAP graduates essentially across the board. A higher percentage of these students have been consistently employed, and more are in full-time, higher-paying jobs. A greater percentage continued working for their training or apprenticeship employer (as compared to co-op participants), and a higher proportion remained in the printing industry as well. These facts suggest participants may have benefited from greater firm- and industry-specific human capital investment and had better overall labor market experiences.

The report complements this information with employers' comparisons of YAP graduates and other entry-level workers, with YAP garnering generally positive results. Additionally, it presents extensive evidence on YAP's impact on students' self-described performances and expectations.

In sum, this evaluation's strengths are its clear statements of YAP's goals and its explicit rationale for control group construction. Its weaknesses arise from possible dissimilarities between treatment and control groups, small sample sizes, and little evidence on YAP's longer-term effects. Overall, the report does suggest that youth apprenticeship programs with some or all of the components listed make a positive contribution to students' performance. It also provides a foundation for future research once the program is extended and larger sample sizes become available.

An Evaluation of the Manufacturing Technology Partnership Program: Hollenbeck (1996)

This report presents a net impact evaluation of the Manufacturing Technology Partnership (MTP) program. The initiative behind this program came from a United Auto Workers (UAW) local at a GM Truck and Bus plant in Flint, Michigan. The program's stated aims are to attract female and minority students and to provide training that prepares high school students for formal GM/UAW skill apprenticeships, which involve thousands of hours of work experience and rigorous formal education.

The program operates from an area vocational school that serves high schools in Genesee County. Students are transported to the school for a two- or three-hour block each day. Staff members include math, science, language, and physics teachers whose mission is to integrate these subjects into the applied curriculum. In the 11th grade, students are divided into three groups and rotated through three 12-week classes: Principles of Manufacturing, Electronic Industry, and Machining. In the second year, the class is split and rotated between computer-aided design and computer-aided manufacturing (CAD/CAM). In addition, the majority of students work for pay at the truck and bus plant after school and in the summers. Paid employment is also available with other local businesses.

The program also potentially includes a postsecondary education phase. If students pass the skilled trades apprentice test at a level high enough to qualify for an apprenticeship, they receive full on-the-job training and company-paid coursework at Mott Community College. Funding for instruction at the Skill Center comes from the center's own budget. Funding for student wages in general comes from JTPA and other sources. In addition, GM trained seven full-time Truck and Bus plant mentors and supplemented the students' center instruction with pre-apprenticeship examination tutoring.

The evaluation focuses on net impacts for the classes of 1992 and 1993, selecting different comparison groups for each class. For 1992, students who applied to the program but were not selected and students who dropped out of the program before completing one year make up the comparison group. For 1993, nonparticipating and non-applying students from Genesee and an adjacent county were selected, with an attempt made to match them for comparability with the treatment students.[35]

The construction of these comparison groups (particularly those for the class of 1992) leaves open the possibility that significant biases are introduced. Students who were not admitted or who dropped out of the program clearly differ from those who remained; to the extent that those who remained are the more talented, the report may overestimate the program's success. A general strength of this report is that these possible biases are clearly stated and discussed.

Table 10 gives the sample sizes for the treatment and comparison group for each of the two classes considered.

Table 10
Study Sample Sizes


        MTP                                                  Comparison
        Enrollment  Baseline  F-U1  F-U2  Transcripts        Enrollment  F-U1  F-U2  Transcripts
1992        32         32      20    15       16                 41       30    25       40
1993        39         38      35    28       31                 66       59    58       60

The treatment group was composed of 32 students in 1992 and 39 in 1993, and the comparison groups have sample sizes of 41 and 66, respectively. The table also shows sample sizes for four types of data from each group of students: baseline survey responses, responses to follow-up surveys conducted in fall 1994 (F-U1) and fall 1995 (F-U2), and transcript data.

As discussed above, comparison groups are best constructed when they are very similar to the treatment group. To see whether this is the case, Hollenbeck compares students' average high school GPAs and class standing, shown in Table 11 below.

Table 11
High School GPA and Class Standing


        MTP                     Comparison
        GPA     Rank %          GPA       Rank %
1992    2.97      21            2.66*       42**
1993    3.13      29            3.32*       23

Standing is a rank percentile (from top).
* Difference statistically significant at the .10 level
** Difference statistically significant at the .05 level

The fact that the GPA and class rank levels for the 1992 treatment group are significantly higher than those for the control group reflects that the latter was constructed from students who were either rejected or who had dropped out of the program. This suggests that simply comparing the outcomes of treatment and control groups for this year may yield biased estimates. For the class of 1993, the two groups' GPAs are marginally statistically different, but their class rank is not.

Some net impact estimates are presented in the following two tables. Table 12 provides information on the evolution of students' average number of absences. For the class of 1993, the table includes information on students' absences in the year before they joined MTP.[36]

Table 12
Average High School Absences (Transcript Data)


        MTP                          Comparison
1992    11th grade:  2.40            11th grade:  6.05**
        12th grade:  3.47            12th grade:  8.59***
1993    Pre-MTP:     5.95            Pre-MTP:     5.37
        11th grade:  2.56            11th grade:  6.52***
        12th grade:  5.53            12th grade:  8.08*
* Difference statistically significant at the .10 level
** Difference statistically significant at the .05 level
*** Difference statistically significant at the .01 level

Focusing first on the class of 1992, one can see that while the MTP participants consistently had lower absence rates than nonparticipants, the increase in the rates of both groups was about the same, so it is hard to argue that the MTP program had positive effects in this realm. For the class of 1993, additional information on students' pre-MTP absence rates is available. In this case, the MTP group actually experienced a decline in absence rates while those for the control group increased, suggesting the treatment had a positive impact on participants. As the author mentions, these positive attendance results may be due to the presence of high-quality paid-employment opportunities in this program. To the extent that it is difficult to find many such opportunities, the program's success may be hard to replicate on a large scale.

Moving on to post-schooling outcomes, Table 13 compares the percentages of students from each group attending college at different points in time. The class of 1992 treatment group displays significantly higher college attendance rates than the control group. The fact that this is not the case for the class of 1993 may reflect that the program's postsecondary component was less emphasized in later years.

Table 13
Percent in College Attendance


        MTP                          Comparison
1992    Fall '93: 100.0%             70.8%*
        Fall '94: 100.0              69.0**
        Fall '95:  46.7              48.0
1993    Fall '95:  89.3              91.4
* Difference statistically significant at the .05 level
** Difference statistically significant at the .01 level

Finally, Table 14 presents these groups' performances on some other post-schooling dimensions: employment rate, average wage, and average hours worked. The treatment groups display better performance in a number of dimensions, with statistically significant advantages in percentage employment in the fall of 1993, and in average wages and hours worked in the fall of 1995.

Table 14
Employment Experiences


                       MTP                                Comparison
                 % Employed  Avg.   Avg.            % Employed  Avg.     Avg.
                             Wage   Hours                       Wage     Hours
1992
Fall '93 (MTP)      100.0    6.25    12.0              61.0***   5.01     19.0
Fall '93 (Other)     18.8    4.66    12.0
Fall '94             61.9    5.68    29.0              65.5      5.35     32.2
Fall '95             80.0    9.79    39.9              72.0      5.55***  31.9
1993
Fall '93               -       -       -               34.3      4.44     13.0
Fall '94 (MTP)      100.0    6.25    12.0              58.1**    4.59     16.1
Fall '94 (Other)     31.4    5.11    14.6
Fall '95             64.3    5.81    31.2              60.3      5.20     25.2*
Averages do not include zeros.
* Difference statistically significant at the .10 level
** Difference statistically significant at the .05 level
*** Difference statistically significant at the .01 level

In sum, this evaluation is suggestive of positive effects arising from an STW program, particularly when the focus is on the class of 1993 and its more credibly constructed comparison group. At least part of this program's success, however, may derive from generous company funding and the availability of high-paid union jobs--factors that would be difficult to replicate at much larger scales than observed here.

The Evolution of a Youth Apprenticeship Model: A Second Year Evaluation of Boston's ProTech: Kopp, Goldberger, and Morales (1994)

This evaluation centers on implementation aspects, but outcomes information is included as well. ProTech is one of the best-known prototypes of the new youth apprenticeship model, in which high school juniors and seniors are grouped into two or three courses and participate in rotations and a part-time job at local hospitals. This placement becomes the core of the program in the senior year, and after graduation, students have the opportunity to enroll in a range of postsecondary programs in health, while continuing to receive hospital-based training.

From interview evidence, the authors conclude the program has a significant impact on students' self-esteem, awareness of job opportunities, and understanding of the relationship between good skills and well-paying jobs.[37] Nevertheless, this has not translated into improved school performance. As Table 15 shows, the second cohort of participants suffered a slight decline in their GPA and attendance rates after entering the program.

Table 15
Change in Grade Point Average of Second Cohort of ProTech
Students after One Year in the Program

School         Sample   GPA         GPA         Change   Sample   Attendance   Attendance   Change in
               Size     1991-1992   1992-1993   in GPA   Size     1991-1992    1992-1993    Attendance
Brighton         27       2.2         2.2         0        28        95%          93%          -2%
English          25       2.5         2.4        -0.1      25        93%          90%          -3%
Boston           17       2.5         2.2        -0.3      18        90%          88%          -2%
All Students     69       2.4         2.3        -0.1      71        93%          91%          -2%

The authors mention that in a regression analysis of GPA (results are not reported), ProTech participation was not significant after controlling for GPA in the year prior to enrollment in the program.

The lack of improvement in GPA is somewhat tempered by the fact that ProTech encouraged students to take more difficult courses than they otherwise would have, particularly in math and science: all ProTech students continued in grade-level-appropriate math, for instance, whereas only 81% of those not in the program did. Finally, the evaluation reports an improvement in the program's retention rate from 62% to 74% with respect to the first cohort.

The Effects of Magnet Education on High Schools and Their Graduates: Crain et al. (1997)[38]

As discussed earlier, evaluations are more definitive when they can be carried out under conditions of random assignment, which is often difficult. There are, however, instances where an experimental situation may arise not because researchers explicitly design it, but, rather, because some policy creates a situation in which people were randomly selected to receive some treatment. Such "natural experiments" provide valuable opportunities for research (Meyer, 1995).

Thaler and Crain (1996) analyze such a situation in New York City, which has established academic career magnet programs either as schools-within-schools in comprehensive high schools or as totally separate schools called total academic career magnets, of which eight exist. These programs generally stress careers like pre-law, business, and computer science.

The natural experiment arises from a selection mechanism that operates as follows. Each program is required to accept students from three different groups according to their seventh grade reading-level scores. One sixth of magnet students come from the group with reading scores in the top sixth of the distribution; one sixth come from the bottom sixth; and the rest (two-thirds) come from the remaining group of average reading ability. Additionally, since 1987 the magnets have been required to accept one-half of students within each reading group through a random lottery. Thus, each program generates three natural experiments, since students who randomly make it into the program (the treatment group) can be compared with their lottery-losing counterparts within their reading level (the control group). Additional details about the selection procedure are given in Crain, Heebner, Si, Jordan, and Kiefer (1992).
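
To make the lottery design concrete, the following sketch simulates the within-stratum comparison it permits: lottery winners and losers are compared inside each reading stratum, and the stratum-level differences are pooled. All data, column names, and effect sizes below are invented for illustration and do not come from the study.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n = 600
    df = pd.DataFrame({
        # reading stratum: top sixth, middle two-thirds, bottom sixth
        "stratum": rng.choice(["top", "middle", "bottom"], size=n,
                              p=[1/6, 2/3, 1/6]),
        "won_lottery": rng.integers(0, 2, size=n),  # random admission offer
    })
    # Simulated outcome (e.g., a test score) with a small built-in effect.
    df["outcome"] = (50
                     + 5.0 * (df["stratum"] == "top")
                     + 1.5 * df["won_lottery"]
                     + rng.normal(0, 10, size=n))

    # Winner-minus-loser difference within each reading stratum ...
    means = df.groupby(["stratum", "won_lottery"])["outcome"].mean().unstack()
    diff = means[1] - means[0]
    print(diff)

    # ... pooled across strata, weighted by stratum size.
    weights = df["stratum"].value_counts(normalize=True)
    print("pooled effect:", (diff * weights).sum())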

Thaler and Crain (1996) point out that applying to these programs is not costly, but, rather, "as easy as applying to remain in a neighborhood school," something all 8th graders must do, regardless of whether or not they wish to attend a career program. In 1988, 82% of 8th graders applied to magnet programs.

A lottery file database kept by the school board tracks which students were admitted by lottery, and additionally contains 7th-grade standardized reading and math scores, grades, date of birth, race, and gender. Thaler and Crain (1996) selected 49 programs for study, under the criteria that they accepted and rejected a minimum number of students from at least one of the reading levels and agreed to participate in an interview study. In total, the programs enrolled 7,987 students: 61% female and 39% male; 5% Asian, 8% white, 27% Hispanic, 47% African American, and 12% of unspecified ethnicity. Thaler and Crain's report deals only with these programs' academic impact.

As in the JTPA study described below, this experiment is not as "clean" as it may initially appear because of leakage from the treatment and control groups. For instance, some lottery winners did not attend the career magnet to which they randomly won admission: some went to another school, and small numbers either disappeared, officially dropped out, transferred to private school, or attended a highly selective public school. Conversely, some lottery losers were nevertheless selected into their first-choice program and attended anyway, while small numbers either disappeared, dropped out officially, transferred to private schools, or attended another selective public school.

After attempting to take into account this degradation of the experimental design, Thaler and Crain conclude that career magnet programs do not consistently outperform traditional comprehensive schools in academic outcomes. Students have similar reading and math scores, similar absenteeism, and take the "Regents" exam about as often. These results are less positive than the earlier results reported by Crain et al. (1992).

Crain et al. (1997) present several additional analyses of the New York City career magnet data. Some of these additional studies focused on a subsample of 110 high school graduates: 51 lottery winners who graduated from career magnet programs, and 59 lottery losers who attended and graduated from comprehensive high schools. The two samples were matched on their first choice of career magnet, on age, and on school performance in the 7th and 8th grades. All respondents in this subsample took part in semistructured interviews lasting several hours.

A chapter by Zellman and Quigley in Crain et al. (1997) examined differences in dangerous or self-destructive behavior:

[They] found that career magnet students were significantly less likely to engage in a variety of behaviors that are associated with reduced school performance. Career magnet graduates were significantly less likely than comprehensive high school graduates to have ever been in a fight during or since high school, to have ever smoked, to drink alcohol at least weekly, ever used drugs, or ever become pregnant or made someone else pregnant. In sum, 41 percent of career magnet graduates reported no risk behaviors, while only 19 percent of comprehensive high school graduates fell into the "no reported risk behaviors" category. Indeed, the reduced incidence of these high-risk behaviors constituted the biggest differences between career magnet and comprehensive graduates. The substantially lower incidence of a wide range of at-risk behaviors represents the impact of the institutional setting on career magnet students. An academic core curriculum for all students, shared beliefs in the importance of work, and the legitimacy of workplace socialization led to the enforcement of many behaviors such as punctuality, appropriate attire, and personal responsibility that are incompatible with high-risk behavior. The teaching of career skill may have led as well to a sense that work and a career could be attained, beliefs that are incompatible with taking high risks. Better behavior and skills acquisition appeared to pay off. Career magnet graduates indicated a starting hourly wage that was one dollar higher than that for comprehensive students [$7.27 compared to $6.28]. Current hourly wage also varied in the same way for the 61 interviewees who were currently working [: $8.00 compared to $7.01.] (p. 36)

In another chapter of Crain et al. (1997), Stone and Bremer also analyzed the same interview data on the subsample of 110 graduates. They compared the lottery winners and losers on five self-reported measures of healthy youth development: (1) a feeling of competence in school, (2) competence at work, (3) sense of control over choice of career, (4) confidence about ultimately achieving career goals, and (5) general sense of happiness. Only one difference emerged as statistically significant: graduates from career magnets felt more competent in school (p. 83).

Allen's chapter in Crain et al. (1997) reports additional significant differences between the 51 graduates who had been lottery winners and the 59 who had not. The career magnet graduates cut class less often while in high school, had friends who were more likely to come from school instead of from the neighborhood, and more often said they would choose the same school they graduated from if they could do it all over again. Although four-fifths of both groups started college classes, the graduates from career magnets were more likely to have declared a college major, earned more credits, were more likely to perceive their parents as willing to sacrifice in order to send them to college, and were employed the same number of months after high school graduation even though they earned significantly more college credits (p. 108). In conjunction with additional life-history interviews given to 26 members of this subsample, Allen interprets these results to indicate that career-focused education can help a young person develop a positive and coherent identity.

Taken as a group, the analyses of the subsample data reported in Crain et al. (1997) suggest that the career magnet experience had some positive effects on personal and career development. Some of these findings are serendipitous: the explicit purpose of career magnets was not to reduce high-risk behavior, for example. Furthermore, it is difficult to find consistent effects in the full sample of lottery winners and losers because of degradation in the experimental design.

Strategies for Keeping Students in School: Evaluation of Dropout Prevention and Reentry Projects in Vocational Education: Hayward and Tallmadge (1995)

This evaluation uses a mixed experimental and observational design to assess the effectiveness of a three-year demonstration program sponsored by the Office of Vocational and Adult Education. Its main purpose was to test the effectiveness of different programs and strategies in reducing dropout rates among at-risk youth. Ten grantees in 16 locations received grants to (1) replicate project models found to be effective in other settings, (2) expand an existing project that met the objectives of the demonstration, or (3) develop new designs to meet locally identified needs.

The study is of interest here because most of the strategies included vocational education as a key component. Table 16 describes student-focused objectives pursued by grantees at 12 sites.

Table 16
Projects' Objectives for Participants


                                                         Grantee
Student-Focused Objectives                     (1)       (2)      (3)       (4)      (5)      (6)
                                               3 sites   1 site   4 sites   1 site   1 site   2 sites
Improved Graduation/GED Rate                                       x         x        x        x
More Credits Toward Graduation/Higher GPA       x                  x
Improved Retention Rate for At-Risk Learners    x         x        x         x        x        x
Improved Academic Skills                        x         x        x                  x
Improved Attendance                             x                  x
Improved Self-Esteem                                               x         x
Improved Life Adjustment Skills                           x
Reduced Suspensions/Disciplinary Actions        x
Improved Employability                          x         x        x         x        x
Improved Vocational Skills                                x        x         x        x
Knowledge of Nontraditional Occupations                            x
Assured Post-School Employment                                                x
Improved Employer Satisfaction                                                         x

Table 17 presents the vocational components planned and actually implemented at the different sites. Although the characterization of the different programs' ingredients is not very detailed, it reflects the emphasis on components-based evaluation suggested by Moffitt (1996) and discussed above. As that discussion recommends, the authors carried out the impact analysis by site.

Table 17
Vocational Components Planned and Implemented by the Projects


Grantee                          Vocational Component                            Implementation Status

Woodside High School, Woodside   Business technology                             Yes
                                 Internships, work experience                    Yes
Carlmont High School, Carlmont   Business technology                             Yes
                                 Internships, work experience                    Partially
Central Area Vo-Tech, Cushing    Supplementary vocational instructional          Yes
                                   materials
                                 Computer lab with vocational software           Yes
Breithaupt Vo-Tech, Detroit      Instructional support in vocational classes     Yes
                                 Tutoring support for ESL students               Yes
McFatter Vo-Tech, Broward        Vocational tutoring                             Yes
                                 Academic/vocational curriculum                  No
Vo-Tech South, Anne Arundel      Vocational English                              Yes
                                 Instructional support in vocational classes     Yes
                                 Community placements                            Partially
OASIS Alternative, Oconee        Entrepreneurial business                        Yes
                                 Occupational programs                           No
Grant High School, Portland      Employability                                   Yes
                                 Career counseling                               Yes
                                 Vocational mentors in health careers            Yes
Turtle Mountain                  Occupational programs                           No
                                 Work experience                                 No
                                 Career development and employability            Yes
Fort Totten                      Occupational programs                           No
                                 Career development and employability            Yes
Fort Berthold                    Career development and employability            Partially
Fort Yates                       Career development and employability            Yes
                                 Work experience                                 Partially

The comparison methodology involved random assignment of approximately 27% of the participating students to treatment groups which received dropout prevention/reentry services from one of the sites. At the same time, 32% of the participants were assigned to control or statistical comparison groups. Control groups were generated by random assignment and comparison groups were nonrandomly constructed by matching students who were in the program with similar students who were not. Table 18 presents sites and sample sizes for the treatment and comparison/control groups by cohort.

Table 18
Dropout Prevention and Reentry Projects in Vocational Education:
Sample Sizes by Site and Cohort

                    Cohort 1                     Cohort 2                     Combined Cohorts
Project Site        Treat.  Control  Gap Red.    Treat.  Control  Gap Red.    Treat.  Control  Gap Red.
Woodside              40      45        42         35      40        39         75      85        83
Carlmont              41      48        43         44      47        50         85      95        96
Cushing               47      39        41         47      48        47         94      87        96
Detroit                -       -         -         87      96        94          -       -         -
Broward                -       -         -         24      29        29          -       -         -
Anne Arundel          19      23        42         18      18        45         37      41        93
Oconee                25      26        44         10       8         -         35      34        41
Portland              23      25        23         21      29        37         44      54        65
Turtle Mountain       15      19        30         10      13        17         25      32        57
Fort Totten           23      22        34         14      43        31         37      65        89
Fort Berthold         10      16        51         14      14        29         24      30        95
Fort Yates            16      16         -         16      62        51         32      78         -

Additionally, 41% of students were nonrandomly assigned to a supplementary control group called a "gap reduction" group. The composition of this third group was meant to reflect the characteristics of typical nonparticipating students, including youths not at risk. This enables estimation of whether the program closes some of the gap between participating students and this deliberately different reference group.

The authors compare treatment and control groups' dropout rates through a Mantel-Haenszel test--an extension of the chi-squared test.[39] Additionally, the effects on other outcome variables were estimated, with the effects of the program modeled linearly.
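
Footnote 39 describes the procedure: a 2x2 (treatment/control by dropout/non-dropout) table is built for each demographic category, and the tables are then combined across categories. A minimal sketch of this kind of stratified analysis, using invented counts rather than the study's data, might look as follows.

    import numpy as np
    from statsmodels.stats.contingency_tables import StratifiedTable

    # One 2x2 table per demographic stratum.
    # Rows: treatment, control; columns: dropped out, stayed in school.
    tables = [
        np.array([[4, 21], [9, 16]]),   # e.g., male students
        np.array([[3, 17], [7, 13]]),   # e.g., female students
        np.array([[2, 10], [5, 8]]),    # e.g., older-than-typical students
    ]

    strat = StratifiedTable(tables)
    print("pooled odds ratio:", strat.oddsratio_pooled)
    result = strat.test_null_odds(correction=True)  # Mantel-Haenszel chi-square
    print("chi2 =", result.statistic, "p =", result.pvalue)

Table 19 presents the outcomes evaluated, grouped into three categories: (1) school performance, (2) school affiliation, and (3) student perceptions.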

Table 19
Types of Participant Outcomes Included in the Evaluation

School Performance
  • Higher grade point average
  • More credits earned
  • Fewer courses failed
  • Fewer absences
  • Fewer dropouts

School Affiliation
  • School thought safer
  • Teaching/teachers better
  • Better job preparation
  • Counseling/counselors better
  • More academic encouragement

Student Perceptions
  • Classmates should not misbehave
  • Better future expectations
  • Classmates are college bound

School performance variables lend themselves to quantification, and the others were explored mainly through interviews. Out of the 12 sites considered, the following table summarizes the number of sites for which there was a statistically significant difference in the outcome between the treatment and the comparison or control group.

Table 20
Number of Sites with Statistically Significant Differences by Outcome


Outcome                                                              Number of Projects
                                                                     with Outcome

Reduction in dropout rate                                                    4
Increase in GPA                                                             10
Reduction in the number of courses failed                                    7
Increase in the number of credits earned                                     5
Reduction in the number of absences                                          5
Improvement in students' perceptions of teachers and instruction             4
Improvement in students' perceptions of counselors and counseling            2
Increase in students' perceptions that school is safe                        7
Increase in students' perceptions of academic encouragement                  4
Improvement of students' perceptions of job preparation                      3

Only a third of the projects achieved significant reductions of the dropout rate, but 10 of the 12 sites had an increase in participants' GPA. The results are generally better in the areas related to school affiliation. Nevertheless, it appears these programs have not been very effective at achieving their main aim of reducing dropout rates.

It is of course important to mention that many of these modest outcomes may reflect that the desired results take longer to achieve than the three years of the demonstration. Nonetheless, despite the overall disappointing results, some projects' characteristics did lead to successes. While it is not the purpose of this section to expand on these, the authors tentatively identify some of them.

Job Training Partnership Act: Long-Term Earnings and Employment Outcomes: U.S. General Accounting Office (1996)

Recently, the Job Training Partnership Act (JTPA), the largest federal employment training program, underwent a major, randomized evaluation. This study is relevant here because JTPA training is provided by various types of institutions, including vocational-technical high schools and community colleges. Also, JTPA is specifically directed towards economically disadvantaged adults and youths, and the latter are one of the target populations of most STW programs.

The evaluation was designed to measure JTPA's achievement of two central goals: (1) raising participants' long-term earnings, and (2) lowering their long-term unemployment rate.[40] Such long-term assessment in general has significant data requirements, and in this case the study had access to longitudinal data on individuals included in the National JTPA Study, supplemented by annual earnings records from the Social Security Administration.

The study randomly assigned applicants for JTPA services either to enroll in the program's training, or else to be part of the control group, whose members were denied access to JTPA programs for the subsequent 18 months. If the randomization was carried out successfully, then these two groups should be close to identical in average characteristics, and should be well-suited for comparison. Note, however, that they may not be useful for comparison with the rest of the population, since only people who sought JTPA training in the first place were included, and this population may be systematically different from those who did not apply.

An additional problem with the comparison procedure used is that not all the members of the treatment group actually completed the treatment, and some did not even begin, but they were nonetheless included in the treatment group. At the same time, members of the control group were able to secure training through other, non-JTPA programs. These leakages from the treatment and control groups imply that simple comparisons of results for the original two groups may underestimate the true effects of the JTPA program.
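
One standard correction for this kind of leakage in the evaluation literature--often called the no-show or Bloom adjustment, and not necessarily the procedure applied in the GAO study--rescales the simple intent-to-treat difference by the difference in the fractions of each group actually receiving training. A sketch with hypothetical numbers:

    def adjusted_effect(itt_difference, treat_receipt_rate, control_receipt_rate):
        """Rescale an intent-to-treat (ITT) difference into an estimated
        effect of training actually received, given leakage in both groups."""
        compliance_gap = treat_receipt_rate - control_receipt_rate
        if compliance_gap <= 0:
            raise ValueError("treatment group must receive training more often")
        return itt_difference / compliance_gap

    # Hypothetical: a $200 raw earnings difference, with 65% of the treatment
    # group and 25% of the control group actually receiving training.
    print(adjusted_effect(200.0, 0.65, 0.25))  # -> 500.0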

The study reports wage and employment effects on adults and youths; here we report results only for the latter group. Table 21 presents the annual earnings for treatment and control groups of young males and females for the three years prior to JTPA assignment and each of the five years thereafter. For each sex, a column also indicates whether these earnings display a statistically significant difference at the 5% level.

Table 21
Earnings for Male and Female Youths Before and After Assignment

                    Males                                       Females
Time Period         Treatment   Control   Statistically         Treatment   Control   Statistically
                    Group       Group     Significant           Group       Group     Significant
                                          Difference?                                 Difference?
3 Years Before         $860       $828        no                   $629       $663        no
2 Years Before       $1,456     $1,575        no                 $1,069     $1,090        no
1 Year Before        $2,179     $2,303        no                 $1,529     $1,707        no
Assignment           $2,894     $3,014        no                 $1,974     $2,098        no
1 Year After         $4,612     $4,792        no                 $3,339     $3,389        no
2 Years After        $5,620     $5,963        no                 $4,045     $4,125        no
3 Years After        $6,130     $6,497        no                 $4,393     $4,383        no
4 Years After        $6,687     $6,425        no                 $4,934     $4,610        no
5 Years After        $7,554     $6,778        no                 $5,433     $5,209        no

The table shows that earnings for the treatment and control groups of both sexes increased, presumably due to increasing age and experience. JTPA's impact, therefore, must be judged on whether participants' incomes increased any faster than those of individuals in the control group. Apparently, the JTPA program had no significant effects on earnings during the five years following assignment; in the years immediately following assignment, youths who received no training actually had higher earnings than those who did. The results on employment rates are presented in Table 22. Once again, employment rates for participants are not significantly different from those of control group members.[41]

Table 22
Employment Rates for Male and Female Youths
Before and After Assignment

                    Males                                       Females
Time Period         Treatment   Control   Statistically         Treatment   Control   Statistically
                    Group       Group     Significant           Group       Group     Significant
                                          Difference?                                 Difference?
3 Years Before        46.5        48.3        no                  41.2        43.6        no
2 Years Before        63.2        66.4        no                  57.6        60.5        no
1 Year Before         79.6        79.6        no                  70.7        72.8        no
Assignment            89.2        91.8        no                  82.0        81.8        no
1 Year After          90.5        92.1        no                  82.0        79.6        no
2 Years After         88.4        87.8        no                  79.0        78.2        no
3 Years After         82.2        82.6        no                  73.8        75.1        no
4 Years After         80.4        79.4        no                  71.7        70.7        no
5 Years After         81.1        77.5        no                  73.9        73.0        no

While these results suggest an almost uniformly negative assessment of this training program's effects, they have drawn a number of criticisms, including some by the U.S. Department of Labor (1996). Though evaluating these rival claims is beyond the scope of this paper, some of the observations are relevant in the present context. The Department of Labor (DOL), for instance, observes

New Evidence on Workplace Education: Krueger and Rouse (1994)

This study is a cost-benefit analysis of an employer-based education program's effects on several employment outcomes for individual employees. The training arose from a partnership of a New Jersey community college and two local businesses--one in services, and one in manufacturing--which were interested in training their entry-level workers. Courses were held at the worksite, and focused on either high-school level academic skills or more company-specific occupational knowledge such as the ability to read blueprints in the manufacturing sector.[42]

All direct costs were covered by a federal grant, but the companies incurred substantial indirect costs (in particular, because workers were paid while attending class). Indirect costs were calculated at approximately $300,000, and the total cost of the program was $750,000. This amounts to $940 per student, or $36 per student class hour, a cost "equivalent to the cost per trainee for programs sponsored by the Job Training Partnership Act." The $940 per student is equivalent to approximately 4% of the average trainee's annual compensation.

Table 23 presents estimates for a statistical regression equation to predict the increase in an individual's hourly wage between 1992 and 1994.

Table 23
Coefficients (and Standard Errors) from Regression Predicting
Individual Hourly Wages in 1994 Relative to Wages in 1992


                       Any Class                                    Occupational Classes
                       (1)       (2)       (3)       (4)            (5)       (6)       (7)       (8)
Participant             0.004     0.004     0.003     0.002          0.006     0.005     0.005     0.004
                       (0.002)   (0.002)   (0.002)   (0.002)        (0.002)   (0.002)   (0.002)   (0.002)
Age                              -0.002     0.013     0.016                   -0.002     0.014     0.016
                                 (0.001)   (0.007)   (0.007)                  (0.001)   (0.007)   (0.007)
Age Squared                                -0.019    -0.022                             -0.019    -0.022
                                           (0.009)   (0.009)                            (0.008)   (0.009)
Tenure (Yrs.)                    -0.001    -0.01     -0.007                   -0.001    -0.01     -0.007
                                 (0.001)   (0.005)   (0.005)                  (0.001)   (0.005)   (0.005)
Tenure Squared                              0.028     0.023                              0.028    -0.023
                                           (0.017)   (0.017)                            (0.017)   (0.017)
Female                                      0.005     0.003                              0.005     0.003
                                           (0.002)   (0.002)                            (0.002)   (0.002)
Nonwhite                                    0.002     0.000                              0.002    -0.0001
                                           (0.002)   (0.002)                            (0.002)   (0.002)
Ever Married                               -0.003    -0.002                             -0.002    -0.001
                                           (0.002)   (0.002)                            (0.002)   (0.002)
Education                                  -0.001    -0.0002                            -0.001    -0.0003
                                           (0.001)   (0.001)                            (0.001)   (0.001)
1st Shift                                   0.004     0.003                              0.004     0.003
                                           (0.002)   (0.002)                            (0.002)   (0.002)
No. of Job Bids                            -0.015    -0.016                             -0.015    -0.016
                                           (0.010)   (0.010)                            (0.010)   (0.010)
No. of Job Upgrades                         0.018     0.010                              0.014     0.007
                                           (0.036)   (0.036)                            (0.036)   (0.036)
Log Wage in 1991                                     -0.061                                       -0.060
                                                     (0.015)                                      (0.015)
Constant                0.017     0.027     0.009     0.151          0.018     0.027     0.008     0.150
                       (0.001)   (0.004)   (0.016)   (0.038)        (0.001)   (0.004)   (0.016)   (0.038)
R Squared               0.011     0.026     0.07      0.101          0.012     0.026     0.073     0.105

Note: Columns (1)-(4) consider participation in any class; columns (5)-(8) consider occupational classes (see text).

Here we concentrate only on the coefficient for the "participant" variable, which measures the difference in wage growth associated with training. The first four columns show the coefficient is not consistently significant when participation in any class is considered. However, the last four columns do show a robust and significant effect for occupational classes. Taking an occupational education class is estimated to increase earnings growth by between 0.4 and 0.6%. Importantly, the authors mention that "the findings for the occupational education classes are consistent with the importance that company officials attached to specific occupational skills, such as the ability to read a blueprint" (p. 16). This is an example of a situation in which a components-based analysis suggests conclusions significantly different from those that an aggregate, program-level analysis (of all program classes together) would produce.
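
The shape of this regression is straightforward to reproduce. The sketch below estimates a simplified version of the Table 23 specification on simulated data; the variable names and data are assumptions of this sketch, and only the form of the equation follows Krueger and Rouse (1994).

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    n = 400
    df = pd.DataFrame({
        "participant": rng.integers(0, 2, size=n),
        "age": rng.uniform(20, 55, size=n),
        "tenure": rng.uniform(0, 20, size=n),
        "female": rng.integers(0, 2, size=n),
    })
    # Simulated wage growth with a built-in participation effect of 0.005.
    df["wage_growth"] = (0.017
                         + 0.005 * df["participant"]
                         + rng.normal(0, 0.02, size=n))

    model = smf.ols(
        "wage_growth ~ participant + age + I(age**2)"
        " + tenure + I(tenure**2) + female",
        data=df,
    ).fit()
    print(model.params["participant"], model.bse["participant"])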

As mentioned above, the total cost of training (direct expenses and release time) was approximately 4% of the average trainee's annual compensation. Assuming a completed job tenure of 20 years and a 3% real discount rate, the training program would need to generate a 0.275% annual wage gain to cover its (present value) costs. For the manufacturing company, the estimates are between 0.4 and 0.6%, whereas for the service company they are not significantly different from zero.
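
The break-even figure quoted above follows from a standard present-value calculation, sketched below. The small difference between this computation's result and the reported 0.275% presumably reflects rounding and the exact discounting convention used by the authors.

    cost_share = 0.04   # training cost as a share of annual compensation
    years = 20          # assumed completed job tenure
    r = 0.03            # real discount rate

    # Present value of receiving a wage gain of 1 (unit) per year for `years`.
    annuity_factor = sum(1.0 / (1.0 + r) ** t for t in range(1, years + 1))

    break_even_gain = cost_share / annuity_factor
    print(f"required annual wage gain: {break_even_gain:.3%}")  # about 0.27%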

Assuming that the association between training and wage growth reflects an effect on actual productivity, expanding these programs seems desirable even when only private benefits are considered; additional benefits could, of course, accrue more broadly to taxpayers, employers, or society. Furthermore, the results suggest the importance of market-driven employer input into program design: in this case, the positive and significant results were concentrated in the area the company managers most stressed--within a program that arose partly from their own initiative.

Conclusion

Readers seeking clear tests of whether STW "works" will have to settle for partial answers. We explained in Part I that the STW movement springs from multiple sources, espousing different purposes and supporting various practices. Evaluating the movement as a whole is therefore impossible. Even evaluating the effects on students of the School-to-Work Opportunities Act alone is virtually impossible because STWOA itself promotes a wide variety of practices (hence, the plural "Opportunities"), and localities receiving STWOA funds generally use them to enhance efforts that were already underway (Hershey et al., 1997). Furthermore, since the basic purpose of the STW movement is to prepare young people to better participate in a learning-intensive economy and society, its true effectiveness can be determined only by observing the performance of individuals, firms, and the economy as a whole over the next few decades.

Even on a small scale, most recent studies of individual programs have not yielded clear results. An evaluation of new youth apprenticeships in Wisconsin found some positive results on both performance in school and employment after graduation, but the comparison groups were not randomly assigned. Non-random assignment leaves open the possibility that results are due to pre-existing differences among the students, not to the program itself. Conversely, the apparent lack of positive effects on grades and attendance for students in the ProTech program, where assignment also was not done at random, could be due to adverse selection of program participants.

A few recent studies did use experimental, random-assignment procedures, but these also give unclear results. The JTPA study found no significant effects on employment or earnings of young men or women. The Thaler and Crain (1996) study found no consistent gain in math scores for students in New York City career magnet schools. In both of these studies, however, a substantial number of individuals designated for the treatment group never actually received the treatment, and in the New York study a number of students who were not supposed to go to career magnets actually did. Leakage from the treatment and control groups vitiates the experimental design in practice, despite its advantages in theory. Furthermore, interviews with a subsample of the New York City students by Crain and associates (1997) found that graduates of career magnets showed more positive personal and career development than graduates of comprehensive high schools.

One study that produced relatively clear results was the evaluation of dropout prevention programs in various schools. Although some sites used random assignment and others did not, 10 of the 12 locations found that the dropout prevention efforts increased grades for the treatment group, and seven found a reduction in courses failed. The two sites reporting significantly positive program effects on the largest number of school performance measures were California career academies--both of which used random assignment in the evaluation. This result is consistent with previous, nonexperimental evaluations of California career academies (Stern et al., 1992). An experimental evaluation of 10 career academies around the country is currently being conducted by the Manpower Demonstration Research Corporation; this evaluation should yield more definitive results (Kemple, 1997; Kemple & Rock, 1996).

Although it is desirable in theory and sometimes feasible in practice, random-assignment evaluation is probably not the best approach to use for learning about the results of STW in general. STW activities vary greatly from one place to another. Furthermore, as the STW strategy increasingly encompasses all students within a school, the only way to conduct an experimental evaluation would be to assign students randomly to schools, which is usually not feasible. In addition, it could also be argued that the effect of STW will be to create a broader range of distinct options for students--career pathways or academies, for example--and if systematic self-selection of students into different options produces benefits for them, so much the better.

Still, educational innovation should be guided by some kind of systematic evaluation. One strategy that seems both useful and valid would use the school, college, or community as the unit of analysis. For instance, considering the school as the decisionmaking unit, students' performance while in school, and subsequent success in the labor market or further education, can be measured for all students at a given grade level, or for a representative sample of them, each year. Changes over time would then indicate whether the school was moving in the right direction. If some measures of student learning, as well as subsequent performance after graduation, are better for the graduating class of 2002 than for the class of 2000, that would be an indication that the school was doing something better. In these comparisons, any changes in the composition of the student body, or in economic conditions confronting graduates, would have to be taken into account. The accumulation of information from a set of schools obtained by comparing measures for all students over time could be useful for informing state and federal policy. Equally or more important, such data could guide decisions by each school community itself.
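
As a concrete illustration of this strategy, the sketch below compares a post-graduation outcome across two graduating classes while adjusting for one measured composition variable. All data and variable names are hypothetical.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(2)
    n = 500
    df = pd.DataFrame({
        "class_of_2002": rng.integers(0, 2, size=n),      # 0 = class of 2000
        "reading_score_8th": rng.normal(50, 10, size=n),  # composition control
    })
    df["earnings_after_grad"] = (12000
                                 + 150 * df["reading_score_8th"]
                                 + 500 * df["class_of_2002"]
                                 + rng.normal(0, 2000, size=n))

    # The cohort coefficient asks whether the later class did better,
    # holding the measured composition variable constant.
    model = smf.ols("earnings_after_grad ~ class_of_2002 + reading_score_8th",
                    data=df).fit()
    print(model.params["class_of_2002"])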


[35] In reality, the construction of comparison groups was significantly more complicated than can be discussed in this context. Only the author's summary comparison groups are included here, and interested readers should refer to the report.

[36] For the class of 1992, the pre-MTP averages were constructed by averaging the averages of the groups that are part of the comparison group. These two entries are not presented by Hollenbeck, but the conclusions they suggest are consistent with the ones he reaches.

[37] The authors do not specify whether these outcomes are with respect to some control or comparison group.

[38] This is a work in progress. The papers referred to in this subsection (Crain et al., 1997; Thaler & Crain, 1996) are drafts, and some results may be subject to change.

[39] The authors explain, "within each demographic category of student, a two by two table (treatment control by dropout-nondropout) was constructed, and these tables were combined across demographic categories at a site (i.e., gender, race/ethnicity, and relative age) to estimate the aggregate difference between the observed and expected number of treatment group students who dropped out. The relative rate of dropping out and the relative odds of dropping out were computed as summary statistics, the probability of obtaining as large or larger difference by chance was computed. Both the one-tailed probability, on either tail, and the probability of obtaining an outcome with a smaller probability were computed."

[40] For further information on the JTPA, see Bloom et al. (1993). The study reported on here is an extension of these previous studies. The authors mention that the results generally matched those obtained by Orr et al. for the 30 months immediately following assignment.

[41] Results for adults which are not reviewed here were more positive in the sense that participants had significantly better outcomes on at least some years. Despite this fact, the authors' conclusions are somewhat pessimistic on the overall results for that age group, too.

[42] Human capital theory suggests firms would be unwilling to finance the "non-firm-specific" training component. Using interview evidence, the authors suggest managers did not worry about this because the decline of manufacturing in the area meant these skills were less transferable and, therefore, firms faced reduced risks of losing employees to competitors or other sectors. This assessment was at least partially confirmed in that employee turnover did not increase after the program ended.

