Beginning in the 1980s, outcome evaluations began to use true experimental methods: individuals are recruited for programs, and a random sample (the "experimentals") is allowed to enroll in the program while the others (the "controls") are administered questionnaires to collect roughly the same information as the experimentals about the services they receive and their employment history. The ethical dilemmas involved in experimental methods have been avoided by using volunteers and by recruiting more individuals than can be accommodated in the program being evaluated, so that one could argue that some individuals would not be served even if the programs were not being evaluated with an experimental design. In addition, the effectiveness of these programs is genuinely unknown, so that -- unlike denying an individual access to a vaccine known to work against a particular disease -- no one is being kept out of a program that would surely increase their life chances.
The great advantage of experimental methods is that they can eliminate the possibility that various factors unconnected to program effectiveness are responsible for any findings. In job training programs, there are three such factors that are particularly dangerous: selection effects, maturation effects, and regression to the mean. Selection effects operate because job training programs by construction select those individuals who have certain barriers to employment -- low education levels, little work history, perhaps motivational problems, or histories of drug and alcohol abuse -- and therefore might be expected to benefit least from any training program; the variety of these characteristics is so great, and so much of it unmeasured, that it is difficult to create an equivalent control group without experimental methods. However, these negative selection effects are complicated by other selection effects created by the administration of programs. In order to look good, job training programs have an incentive to choose the most able and job-ready of the individuals who are eligible -- a process known as "creaming". This creates a positive selection effect in addition to the negative selection effect involved in eligibility for the program. Moreover, this kind of effect may operate differently over the business cycle: when unemployment falls, the most job-ready individuals are able to find jobs on their own, so that programs have to work harder to recruit people to enroll -- and may have to enroll the least job-ready individuals with multiple employment problems. Paradoxically, then, in boom times when low unemployment makes placements somewhat easier, the individuals enrolled are the least job-ready; when unemployment is high and placements more difficult, the most job-ready individuals are likely to be enrolled because of "creaming". It would be virtually impossible, then, to construct a control group that is comparable to those enrolled in a job training program except under experimental conditions, since there are too many administrative, economic, and personal factors that affect the composition of a job training program.
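To make the selection problem concrete, the following is a minimal illustrative simulation (in Python); the variable names, numbers, and the "job-readiness" trait are invented for this sketch and are not drawn from any of the evaluations discussed here. The simulated program has no true effect, yet a naive comparison of enrollees with eligible non-enrollees suggests a sizeable one, because staff "cream" the most job-ready eligibles.

    import numpy as np

    # Illustrative only: "creaming" can bias a non-experimental comparison.
    # job_readiness is an unmeasured trait; the program has zero true effect.
    rng = np.random.default_rng(2)
    n = 100_000
    job_readiness = rng.normal(0, 1, n)
    eligible = job_readiness < 0.0                        # program targets those with employment barriers
    enrolled = eligible & (job_readiness > -0.5)          # staff enroll the most job-ready eligibles
    employed = rng.random(n) < 0.3 + 0.2 * job_readiness  # employment depends only on readiness

    # Naive comparison of enrollees with eligible non-enrollees: looks like a program effect.
    print("enrollee employment rate:   ", round(float(employed[enrolled].mean()), 3))
    print("eligible non-enrollee rate: ", round(float(employed[eligible & ~enrolled].mean()), 3))

Random assignment within the same eligible pool would, in expectation, equalize job-readiness between experimentals and controls and eliminate this spurious difference.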
Maturation effects occur when individuals improve their conditions simply by aging or maturing. This is a particularly likely result for youth, who suffer much higher rates of unemployment and lower earnings early in their working lives and then gradually mature into the more stable employment and earnings patterns of adults -- most of them without the help of any particular program. Maturation effects are also likely for measures of academic achievement, knowledge about the labor market, risk-taking behavior, and certain measures of disruptive behavior including drug use and criminal activity. Without considering this phenomenon, youth programs may look effective over time simply because those who have enrolled in them mature, even though the program may have had no effect on this process.
Regression to the mean is another problem. By construction, job training programs enroll individuals who have had problems in employment. But some of these individuals may simply have had an unlucky spell -- an unexpected layoff, for example, for an individual with adequate job skills in an otherwise healthy local economy -- and can be expected to find employment on their own within a few months (that is, they regress back to their normal conditions of employment after a while). For such individuals a job training program might speed up the return to employment, but it may not make any difference to whether they find employment again -- in contrast to an individual who lacks fundamental job skills and is unlikely to find employment without training. Regression to the mean is a particularly serious problem in welfare programs: a large fraction of the welfare population is on welfare for a brief period -- following a layoff, the departure of a wage-earning family member like a husband, or a medical emergency -- and then finds employment and leaves the welfare rolls after a short time. If large numbers of these "temporary" welfare recipients are enrolled in job training programs, the programs will appear to be successful -- though all that may be happening is that normal turnover, rather than the effectiveness of the training, is causing some individuals to find employment and leave welfare. In the quasi-experimental evaluations of CETA programs in the late 1970s, the apparently greater increase in earnings for experimental groups compared to comparison groups turned out to be due to regression to the mean for males, though for females job training programs increased earnings by slightly more than would be expected from such a pattern (U.S. Congressional Budget Office, 1982, and Figure 1).
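The same point can be illustrated for regression to the mean. The sketch below is again a purely hypothetical simulation with made-up earnings figures: earnings consist of a stable component plus a transitory shock, the "program" has zero true effect, and enrollment is triggered by one bad year. A simple before-and-after comparison for enrollees shows a large "gain", while an experimental contrast within the same pool correctly shows an effect near zero.

    import numpy as np

    # Illustrative only: regression to the mean can mimic a program effect.
    rng = np.random.default_rng(0)
    n = 100_000
    permanent = rng.normal(20_000, 4_000, n)         # stable component of annual earnings
    before = permanent + rng.normal(0, 5_000, n)     # earnings in the year before enrollment
    after = permanent + rng.normal(0, 5_000, n)      # earnings afterward; no true program effect

    # Non-experimental view: enroll whoever had an unlucky year (low current earnings).
    enrolled = before < 15_000
    print("before/after 'gain' for enrollees:",
          round(float((after[enrolled] - before[enrolled]).mean())))

    # Experimental view: randomize within the same enrolled pool; both groups regress upward equally.
    treat = rng.random(n) < 0.5
    gain_t = (after[enrolled & treat] - before[enrolled & treat]).mean()
    gain_c = (after[enrolled & ~treat] - before[enrolled & ~treat]).mean()
    print("experimental estimate (difference in gains):", round(float(gain_t - gain_c)))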
However, with experimental methods every kind of selection effect is eliminated, as are maturation effects and regression to the mean[17] -- so that any differences in employment after a job training program can be attributed to the program rather than to other causes. Despite these advantages, however, there remain a number of disadvantages -- evaluation problems that the use of experimental methods has not always been able to resolve:
The period of time over which outcomes are measured is critical because of the question of whether any potential benefits increase or degrade over time. In the pattern typical of age-earnings profiles for different education levels, for example, a level of schooling may not generate any real increase in earnings for several years, during which an individual is searching for an appropriate job; the benefits then tend to increase over time, peaking somewhere between ages 45 and 55 before declining as retirements begin. Similarly, in job training programs one might expect a decrease in earnings during the program itself, as individuals are forced to leave any employment they might have; then, perhaps following a period of job search when earnings are still low, one would hope that earnings would be higher than those of the control group, and perhaps would continue increasing as the greater skills from the job training program enable individuals to advance in their jobs relative to the control group. However, a different possibility is that short-term job training programs push individuals into low-quality employment without improving their skills, so that there are short-term employment benefits that disappear after a short period -- leaving experimentals no better off than controls in the long run, and potentially even worse off because of the period of low earnings during the program itself. (This may be especially dangerous with job search assistance, which is designed to help individuals find jobs without improving their skills.) The difference between these two possible patterns can be detected only with information about earnings several years after a program ends -- and unfortunately many evaluations have not lasted long enough to collect such information. The available results on effects over time are reviewed in Section III.6 below.
The danger of displacement also reflects a difference between human capital models of earnings and employment -- in which education and training instill new competencies that increase the productivity and then the wages and earnings of individuals -- and screening and signaling models, in which education or job training signals the greater competencies of certain individuals over others without changing those competencies. If screening prevails, then individuals completing job training programs will have higher levels of employment and earnings -- but their employment will come at the expense of other individuals who fail to get these jobs, and employment and productivity in the aggregate will not increase. Job training programs generally assume a human capital model, and there is virtually no reference in the evaluation literature to the possibility that signaling might explain any positive outcomes.
The problems in evaluating effectiveness for different services and sub-groups extend to the evaluation of particular programs as well. That is, a national job training program like JTPA is in reality an agglomeration of over 500 programs, each administered locally -- and any average effect masks the distribution around this average, caused by highly effective programs existing alongside truly dreadful local efforts. The most effective programs may be the most valuable guides to improving practice, of course, so -- if they can be identified -- their characteristics may provide the best information about how to improve programs. While the early evaluations did not address the effectiveness of individual programs, some of the most recent evaluations have managed to detect individual programs that are more effective than the average (reviewed in Section III.7 below).
The predominance of experimental approaches has overshadowed other methods of understanding job training programs, particularly the use of qualitative and ethnographic evaluations that might provide better insights into why programs succeed or fail. The earlier evaluation literature on CETA and the welfare experiments of the 1970s includes some qualitative studies, in which researchers would observe programs carefully, interview participants at length, and otherwise try to determine what life in a program was like for its participants. The purpose was not only to get a better sense of what programs are like -- the "texture of daily life", or the "lived experience" of programs, as ethnographers might say -- but also to develop better information about how programs are implemented, what precisely goes on in them, and why they might be ineffective. In recent examples of qualitative ethnographies and case studies, for example, Hull (1994) has described the amount of teaching about on-the-job relationships (in addition to technical skills) that occurs in a banking program; Kalman and Losey (forthcoming) have analyzed how a workplace literacy program fails to live up to its self-conception as an innovative, worker-centered program; Gowen (1994) has described the turmoil in a workplace literacy program; Grubb and Kalman (1994) have described how the dominant teaching methods in work-related remedial programs undermine their effectiveness; and investigations based on interviews have suggested that certain behavioral problems make job-keeping (rather than job-finding) a problem among the chronically unemployed (Quint, Musick, and Ladner, 1994). This last study is particularly interesting because it examined the lives of 50 women enrolled in New Chance, which was also evaluated with random assignment methods (see Section III.3 and Table 13). The authors found that those enrolled in the program were enthusiastic about it, but their progress into employment was slow and uneven, partly because of the problems caused by living in highly disorganized families and communities.
One rationale for qualitative studies, then, is that they can provide explanations for the outcomes determined by quantitative analyses. Many of the reasons I offer in Section IV for the small benefits of job training programs are based not on formal results from random-assignment experiments but on less formal case studies and observations of job training programs. Formal quantitative evaluations are necessary, then, because only these methods can demonstrate the effects of job training programs on employment and earnings; but qualitative studies are necessary too, to understand why some programs work and others do not and to clarify how existing programs might be improved. Unfortunately, these two traditions of research are not well integrated: the qualitative examinations typically collect no information about effects on earnings and employment, and the quantitative evaluations rarely include any qualitative component.
The final drawback of random-assignment evaluation, of course, is that it is expensive. It can be applied to large-scale evaluations of national programs of considerable policy importance -- but it cannot be applied routinely, and cannot be applied to small programs, to many experimental efforts, or to local programs deciding what mix of services or which specific providers they should use. This means that job training programs have typically been subjected to two quite different kinds of "evaluation": random-assignment evaluations of great sophistication and cost, performed largely for federal policy-makers deciding how to establish federal guidelines and legislation; and locally collected information about effectiveness, like the performance measures required by JTPA and the information about caseloads collected in local welfare programs. This kind of local information, which is much cruder and more susceptible to local manipulation, is used to monitor local programs, to impose sanctions on local programs that are out of compliance with performance requirements, and in some cases to make local decisions about effectiveness. In the only effort to calibrate these local evaluations against random-assignment evaluations -- to see, for example, whether local programs with strong results on performance measures also have strong results from random-assignment evaluations -- there proved to be no correlation between the two (Doolittle et al., 1993, p. 10). This suggests that performance measures are virtually useless for making rational decisions about effectiveness -- even though they provide political protection because they make JTPA seem like a performance-driven program.
There is little question that the quality of evaluations has improved substantially over the past twenty years. Job training programs -- and particularly those associated with the welfare system -- have been the subjects of what is probably the most sophisticated policy-oriented analysis in the United States. However, given the complexity of social programs and the variety of job training programs in a country as large and diverse as the United States, it should not be surprising that these evaluations have failed to answer all the important questions about job training programs. Indeed, given the variety of programs and the variation among localities in how they are administered, it is remarkable that the existing evaluations come to so much agreement about the effects of different programs -- the subject of the next section.
[16] CETA generated a data set -- the Continuous Longitudinal Manpower Survey (CLMS) -- that followed several waves of CETA clients and also contained information about comparison groups, and the CLMS was used for many of the evaluations; for a survey, see Barnow (1986).
[17] There can, of course, be differences between experimental and control groups due to sampling error. Therefore most evaluations use regression methods to control for the effects of various personal characteristics that may vary between experimental and control groups -- with the knowledge that, because of random assignment, the variables describing program participation are uncorrelated with background variables or unmeasured characteristics like motivation.
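As a concrete sketch of this kind of regression adjustment, the simulation below is purely hypothetical (the variable names and coefficients are invented here): because assignment is random, the treatment dummy is independent of both the measured covariate and the unmeasured trait, and ordinary least squares with covariates recovers the true effect while tightening its precision.

    import numpy as np

    # Hypothetical illustration of regression adjustment under random assignment.
    rng = np.random.default_rng(1)
    n = 5_000
    educ = rng.normal(12, 2, n)            # measured covariate: years of schooling
    motivation = rng.normal(0, 1, n)       # unmeasured characteristic
    treat = rng.integers(0, 2, n)          # random assignment, independent of everything above
    earnings = (5_000 + 800 * educ + 2_000 * motivation
                + 1_200 * treat + rng.normal(0, 4_000, n))   # true program effect: 1,200

    # OLS of earnings on a constant, the treatment dummy, and the measured covariate.
    X = np.column_stack([np.ones(n), treat, educ])
    coef, *_ = np.linalg.lstsq(X, earnings, rcond=None)
    print("estimated program effect:", round(float(coef[1])))   # close to 1,200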
[18] A contrary argument could be made: that assignees who don't enroll find a job in the interim, and are therefore more job-ready and motivated. However, if this were true, then the benefit per assignee would be higher than the benefit per enrollee, contrary to the evidence.
[19] This is likely to happen if the demand for labor is relatively price-inelastic, in which case any shift in the supply function for labor will increase employment only slightly and reduce wages considerably.
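This claim can be made explicit with a standard log-linear supply-and-demand calculation; the notation below (demand and supply elasticities and the size of the supply shift) is introduced here for illustration and is not taken from the text.

    % Log-linear demand and supply for labor:
    %   \ln L_D = a + \eta_D \ln w   (\eta_D < 0),
    %   \ln L_S = b + \eta_S \ln w + s,
    % where s > 0 is the proportional outward supply shift produced by training.
    % Equating demand and supply and solving for the equilibrium changes:
    \[
      \Delta \ln w = \frac{-s}{\eta_S + |\eta_D|},
      \qquad
      \Delta \ln L = \frac{|\eta_D|\, s}{\eta_S + |\eta_D|} .
    \]
    % As demand becomes more price-inelastic (|\eta_D| \to 0), the employment gain
    % shrinks toward zero while the wage decline approaches -s/\eta_S.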
