DATA AND METHODS:
THE SURVEY OF INCOME AND PROGRAM PARTICIPATION

In the search for data to examine the effects of sub-baccalaureate education, there are at least three different desiderata. First and most obviously, a data set must have more detailed information than simply years of schooling completed; it is desirable to have information about sub-baccalaureate credentials as well as about uncompleted postsecondary education from different types of institutions. Second, it is necessary to have information about earnings some years after students have completed school so that individuals have settled into relatively permanent employment patterns. Because earnings differences among groups with different levels of schooling typically do not emerge until the early thirties, surveys taken of individuals shortly after they have left school--for example, the High School and Beyond survey of the high school class of 1980, which contained information obtained six years after high school[5]--may not be able to identify such earnings differences. Furthermore, as the results below will clarify, surveys which include many individuals whose schooling is incomplete are likely to be misleading because such individuals are likely to have part-time employment to get them through school--what some counselors call "stay in school" jobs--which distorts earnings patterns. Third, a data set should have sufficient information about an individual's characteristics and abilities aside from their schooling records in order to control for the various other potential influences on earnings.

Not surprisingly, no data set is perfect. One strong advantage of the SIPP is that it includes individuals of all ages, rather than including just one cohort (like the NLS72, or High School and Beyond) or a few relatively young cohorts (like the National Longitudinal Survey of Youth). The SIPP has better information about schooling than most other surveys; however, school attainment is self-reported, with inevitable biases, and details are unavailable about the types of institutions these individuals attended. The range of independent variables is adequate, though some desirable information is missing, particularly on measures of ability and academic achievement. These limitations must be remembered in comparing the results from the SIPP in this monograph with other results.

The SIPP was designed principally to examine participation in public programs, particularly those such as Aid to Families with Dependent Children (AFDC) which are related to the welfare system. Because of this goal, and because the patterns of program participation over time are of central interest, the SIPP is structured in ways quite different from most data sets used to examine earnings. Beginning in 1984, a new panel of the SIPP has been developed every year. Each panel includes households who are interviewed every four months; for the 1984 panel, for example, eight interviews covered thirty-two months of these households' experiences. Households in each panel are also divided into four "rotation groups," with only one rotation group interviewed each month; as a result, information from each rotation group differs slightly in its timing. In these results, I have used the information these individuals gave about their earnings for each of the twelve months of calendar years 1984, 1987, and 1990, drawing on three or four different interviews. I then cumulated these monthly figures into a variable for annual earnings in order to make my results consistent with other analyses using annual earnings. The earnings measure used includes all wage and salary earnings, as well as self-reported earnings from self-employment, but none of the various sources of transfer income included in the SIPP.[6]
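As a rough illustration, the construction of annual earnings from monthly reports might be sketched as follows. The record layout, the function name, and the rule of dropping anyone without all twelve months are assumptions of this sketch; they are not taken from the SIPP files or from the procedure actually used.

```python
from collections import defaultdict


def annual_earnings(records, year):
    """Sum monthly earnings into an annual figure for each person.

    `records` is a hypothetical list of person-month tuples:
    (person_id, year, month, wage_salary, self_employment, transfers).
    Transfer income is deliberately ignored, mirroring the earnings
    measure described in the text.
    """
    totals = defaultdict(float)
    months_seen = defaultdict(set)
    for person_id, rec_year, month, wages, self_emp, _transfers in records:
        if rec_year != year:
            continue  # keep only months in the requested calendar year
        totals[person_id] += wages + self_emp
        months_seen[person_id].add(month)
    # Assumption of this sketch: keep only people observed in all twelve months.
    return {pid: total for pid, total in totals.items()
            if len(months_seen[pid]) == 12}
```

Because the rotation groups are interviewed in different months, the twelve monthly records for a calendar year would in practice be gathered from three or four separate interviews before a step like this one.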

In addition to questions asked monthly of rotation groups, the SIPP periodically asks individuals to complete more specific questionnaires, called "topical modules." One of these covers education and training and is the most important for my purposes. Individuals are asked if they have completed various levels of schooling, including a Ph.D., professional degree, baccalaureate degree, Associate degree, vocational certificate, high school diploma, or less than four years of high school. For those enrolling in postsecondary education but not completing credentials, individuals are asked how many years of postsecondary education they completed, ranging from less than one year up to four years. Unfortunately, the type of institution in which individuals received this postsecondary education has not been recorded. It is plausible to infer that those receiving three or four years of postsecondary education attended four-year colleges (perhaps with some attendance at two-year institutions as well), while those with lesser amounts (particularly one year or less than one year) attended community colleges and technical institutes.[7] However, strictly speaking, the information is not available in the SIPP to make this inference. In addition, when individuals report completing "two years" of postsecondary education, for example, it is unclear whether they have attended full time or part time or whether attendance resulted in course credits or represented desultory attendance without completing coursework.

It is crucial to keep in mind--particularly when comparing these results to those available from NLS72 data, which includes transcripts describing postsecondary education--that the measures of educational attainment (and training as well) are self-reported. In the NLS72 data, which includes both self-reported and transcript-reported measures of whether an individual enrolled in postsecondary education, self-reported postsecondary education is invariably higher than that reported on transcripts. Furthermore, the difference is greater for groups such as those with lower grades in high school or those with lower socioeconomic status, who would normally be the least likely to enroll in postsecondary education (Grubb, 1992a). Therefore, self-reported education is likely to be exaggerated even more for those with low levels of education. This pattern, in turn, implies that the estimated return to schooling will be higher for the SIPP data than for results based on transcripts like those from the NLS72 data.[8]

Another topical module, available for 1987 but not 1984 or 1990, describes the family of origin for individuals surveyed in the SIPP. This provides a source of information for variables measuring family background, which has generally been found to influence educational attainment and, at least in some results, subsequent earnings as well. In these results, family background was initially measured by both the education level and the occupation of the head of household when the individual was fifteen years old, plus a dummy variable describing whether the head of household was female. However, because parental occupation was never significant, it has been omitted from the results reported here.[9]

Other variables conventionally included in earnings equations are available from the base month questionnaire given to all individuals. These variables include race and ethnicity; whether an individual's job is covered by a union contract; a series of regional variables as well as one describing location in a metropolitan area, to reflect regional differences in salaries and costs of living; and variables describing marital status and disability. In some cases, missing data forced some values to be imputed, and dummy variables were included in such cases.[10]

It is important to note, however, that there is no information in the SIPP on ability or achievement at any level of education, an omission which has the effect of positively biasing the estimated return to schooling since ability and achievement are always positively correlated with educational attainment. Unfortunately, the magnitude of this bias is unclear, since, in other estimates, its importance has ranged from uninfluential to substantial, depending, in part, on the measures of ability available (Leslie & Brinkman, 1988). Therefore, the estimates presented in the next section must be interpreted with care because they may be partly explained by the influence of ability or achievement, rather than by schooling alone.

The most difficult variables to construct from the SIPP data are measures of labor market experience. Individuals interviewed were asked about the starting date of their current job, from which tenure on the current job can be calculated. Interviewees were also asked how long they had been doing work similar to the current job, from which related experience on other jobs can be calculated. Finally, information is available to calculate total experience; previous experience unrelated to the current job can then be calculated as total experience, minus current job tenure, minus related experience on prior jobs. If labor market experience is considered to measure relatively job-specific on-the-job training, an obvious hypothesis is that the effect on earnings of tenure on the current job will be the largest, followed by related prior experience, with unrelated prior experience the least influential. In all cases, quadratic terms are also included, to allow for nonlinear effects of experience. However, because not all questions are asked of all individuals, it is possible that some periods of experience are missed in the SIPP data; therefore, age (and age squared) have been included to cover any potential gaps. Finally, for some groups it is necessary to impute values for various kinds of experience, and dummy variables have been included whenever such imputations are made.[11]
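The decomposition of experience described above can be written out as a small sketch. The function and key names are hypothetical, and all quantities are assumed to be measured in years.

```python
def experience_terms(total_experience, current_tenure, related_prior):
    """Build the three experience measures and their squares.

    Unrelated prior experience is the residual: total experience minus
    current job tenure minus related experience on prior jobs.
    """
    unrelated_prior = total_experience - current_tenure - related_prior
    return {
        "tenure": current_tenure,
        "tenure_sq": current_tenure ** 2,
        "related_prior": related_prior,
        "related_prior_sq": related_prior ** 2,
        "unrelated_prior": unrelated_prior,
        "unrelated_prior_sq": unrelated_prior ** 2,
    }
```

For example, someone with 20 years of total experience, 5 years on the current job, and 8 years of related prior work would be assigned 7 years of unrelated prior experience. The sketch does not handle the imputed cases, which would each need an accompanying dummy variable.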

While the SIPP intends to be a nationally representative sample, the sampling method is complex, and the data includes a weight designed to compensate for nonrandom sampling. This weight will influence any statistical results only if the earnings patterns of those oversampled are different from those of other individuals. In early trials, weighted and unweighted results were almost precisely the same; I report unweighted results since they are simpler and do not presume heterogeneity of regression equations.

The results presented below, therefore, correspond to standard earnings equations, using the conventional semi-log form with the log of earnings as a linear function of variables which include binary variables for formal schooling, job training, various measures of experience and experience squared, and other independent variables described above. With this functional form, the coefficients can be interpreted as reflecting the percent increase in earnings associated with any particular level of education, at least if the coefficients are relatively small. These coefficients can be readily compared to those using other data sets. Education is measured relative to those with a high school diploma only, so that the coefficients describe the advantage of postsecondary education as compared to a high school graduate or, conversely, the disadvantage of completing less than a high school diploma. All results are estimated separately for men and women, since both theoretical considerations and prior results indicate that schooling has different labor market effects for men and women. As is conventional, only individuals with positive earnings for a year are included. The coefficients presented in the following sections are estimated for individuals aged 25-64, including all individuals still in school. Because the sample of individuals affects certain results--particularly the effects of completing some college coursework without a credential--Appendix A presents coefficients using different samples. Finally, Appendix A also presents the coefficients of all independent variables included in these earnings equations, apart from those of the education and training variables presented in the text.
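The qualification about "relatively small" coefficients can be made concrete. In a semi-log equation, a coefficient b on a binary variable implies a proportional earnings difference of exp(b) - 1, which is close to b only when b is small. The sketch below shows the conversion; the function name is hypothetical, and 0.394 is used because it is the baccalaureate coefficient for men quoted in note [9].

```python
import math


def percent_effect(coefficient):
    """Convert a semi-log coefficient on a binary variable into the
    implied percent difference in earnings: (exp(b) - 1) * 100."""
    return (math.exp(coefficient) - 1.0) * 100.0


# A small coefficient is nearly its own percent effect ...
small = percent_effect(0.05)
# ... while a larger one understates the implied difference.
large = percent_effect(0.394)
```

Here a coefficient of 0.05 corresponds to roughly 5.1 percent, while 0.394 corresponds to roughly 48 percent rather than 39 percent.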

[1] For some earlier work on returns to community colleges, using institution-specific data, see Pincus (1980) and Heinemann and Sussna (1977). Wilms (1974) examined a data set that he collected; Blair, Finn, and Stevenson (1981) used a National Science Foundation data set confined to scientific and technical personnel; and Breneman and Nelson (1981) used the fourth follow-up of the NLS72 data, at a point seven years after high school graduation, when it is too early to detect the effects of sub-baccalaureate education. For more recent work using the NLS72 data, see Grubb (1992a, 1993a, 1995a, 1995b), Kane and Rouse (1993) and Hollenbeck (1993); for results with the NLS-Youth data, see Kane and Rouse (1993); for results with the High School and Beyond data, see Lewis, Hearn, and Zilbert (1993).

[5] A new compilation of data collected in 1992 will provide information twelve years after leaving high school, providing better estimates of the effects of schooling on "adult" employment.

[6] The earnings reported by the SIPP are monthly earnings, which are cumulated to form annual earnings. The reported monthly earnings are truncated at $8,333, corresponding to maximum annual earnings of $100,000; for the relatively few individuals with truncated earnings, their earnings are estimated by fitting a Pareto distribution to the individuals with the same education and gender but whose earnings have not been truncated. This procedure, in turn, results in a very few individuals whose estimated earnings seem much too high, but trials including and excluding these individuals indicate that they are too few to affect the results in any significant way. In the reported results, all individuals with earnings estimated by the Pareto method have been retained.
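The note does not specify the estimator used for the Pareto fit, so the following is only one plausible sketch of the general idea, not the procedure actually applied. It assumes a lower threshold `x_min` above which earnings follow a Pareto tail, estimates the shape parameter by maximum likelihood with right-censoring at the cap, and assigns truncated cases the conditional mean above the cap. All names and the choice of threshold are hypothetical.

```python
import math


def pareto_shape_censored(uncensored, n_censored, x_min, cap):
    """Maximum-likelihood Pareto shape with right-censoring at `cap`.

    `uncensored` holds earnings between x_min and cap; `n_censored` is the
    number of cases reported at the cap. Assumes a Pareto tail above x_min.
    """
    log_sum = sum(math.log(x / x_min) for x in uncensored)
    log_sum += n_censored * math.log(cap / x_min)
    return len(uncensored) / log_sum


def imputed_topcoded_value(cap, alpha):
    """Mean of a Pareto tail conditional on exceeding `cap`: cap * a / (a - 1)."""
    if alpha <= 1.0:
        raise ValueError("conditional mean is infinite when alpha <= 1")
    return cap * alpha / (alpha - 1.0)
```

In a design like this one, the fit would be run separately within each education-by-gender cell. A shape parameter only slightly above 1 produces very large imputed values, which is at least consistent with the remark that a few estimated earnings seem much too high.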

[7] Those who attended a proprietary school and received a certificate or (rarely) an Associate degree have presumably reported these credentials. Those who attended a proprietary school without receiving credentials may have reported this either as "some college" or more probably as a form of job training in "vocational schooling." However, there is unavoidable ambiguity as to what "vocational schooling" refers to, particularly since the questionnaire did not define this term for respondents.

[8] See Grubb (1992a). If the reported level of education is Ed* = Ed + u, where Ed is the actual level of education and u is measurement error with a positive mean and negative correlation with Ed, then the estimated equation is Y = a + b(Ed + u) + ... + e, and the estimated coefficient b will be upwardly biased compared to the true b.
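A small simulation can illustrate the direction of this bias. It is a sketch under invented assumptions: schooling is treated as continuous, the reporting error u falls linearly with true schooling, and every parameter value is made up for illustration. None of it is drawn from the SIPP or NLS72 data.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    return cov / var


def simulate(true_b=0.08, gamma=0.2, n=20000, seed=0):
    """Return (slope using true schooling, slope using reported schooling)."""
    rng = random.Random(seed)
    ed = [rng.uniform(8.0, 16.0) for _ in range(n)]
    # Reporting error: positive on average and larger for the less educated,
    # so u is negatively correlated with true schooling.
    u = [2.0 - gamma * (e - 8.0) + rng.gauss(0.0, 0.3) for e in ed]
    reported = [e + ui for e, ui in zip(ed, u)]
    log_earn = [9.0 + true_b * e + rng.gauss(0.0, 0.4) for e in ed]
    return ols_slope(ed, log_earn), ols_slope(reported, log_earn)


slope_true, slope_reported = simulate()
```

In this setup the reported measure compresses the true range of schooling, so the same earnings differences are spread over a narrower scale and the slope on reported schooling comes out above the true return. Purely random reporting error would work in the opposite direction, so the net effect depends on the systematic component being the larger of the two, as it is in these invented parameters.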

[9] The unavailability of data on family background for 1984 and 1990 makes these specifications different from those used in 1987. However, because the coefficients on the family background variables in 1987 were not consistently large and significant (see Table A-3), their exclusion does not change the education coefficients very much; for example, the effect of a baccalaureate degree for men increases from .394 to .425 when these background variables are omitted. By extension, then, the exclusion of family background variables in 1984 and 1990 makes little difference to the results or to their comparability across the three years.

[10] There are imputation dummies for union coverage and for family background at age fifteen. In addition, a few individuals are missing data on the highest degree attained, on the amount of job training received, and on the nature of the training sponsor. These also have dummy variables. See Appendix Table A-3 for the coefficients of the imputation dummies.

[11] In these results, imputation-related dummy variables are included for the following cases: (1) when a second job becomes a primary job and job experience for the secondary job must be imputed; (2) when total experience was imputed as age minus education minus 6 for those whose previous job ended before 1976, who were not asked about experience prior to their current job; (3) when the SIPP imputed a job starting date, and thus current job tenure; (4) when the SIPP imputed total related experience; and (5) when the SIPP imputed total experience for those failing to answer about previous experience. Note that three of these five imputation dummies were created by the SIPP itself. For the coefficients of these dummies for 1987, see Table A-3. The amount of imputed data necessary varied from year to year, as did the coefficients on the imputation dummies. Fortunately, these dummies do not greatly affect the parameters of interest, as can be seen from comparing specifications with and without the imputation dummies.

