With growing concern about the number of children living in poverty in the United States, the Head Start program was initiated in 1965 as Project Head Start, one component in the “war on poverty.” Originally conceived as an eight-week summer pilot program implemented in nearly 2,500 communities across the country, the first trial served 500,000 four- and five-year-old children. From this beginning, Head Start programs quickly expanded to nine-month and even full-year programs, either half-day or full-day. As of 2006, the Head Start program served about 900,000 three- and four-year-old children nationally in nearly 19,000 centers for a total annual cost of $6.8 billion. Over time, the reach of the program has been substantial: Between 1965 and 2006, Head Start enrolled more than twenty-four million children.

Challenges of Program Evaluation

By providing a comprehensive set of education, health, and social services, Head Start aimed to counteract the detrimental influences of poverty and prepare children academically and socially for school entry. From its inception, there has been an interest in understanding whether Head Start indeed achieves this objective. However, a comparison of outcomes for children who participated in Head Start with those who did not would not measure the true effect of Head Start if program participants are systematically different from nonparticipants. For example, Head Start participants are on average from poorer families and they may come from families whose parents are more motivated to see their children succeed. These selectivity factors could confound efforts to measure the true effects of Head Start participation.

Ideally, we want to measure the effects of Head Start on children’s outcomes compared to what those outcomes would have been for the same children in the absence of the program. Since it is not possible to observe the same children both participating and not participating in the program, researchers turn to experimental and quasi-experimental methods to assess program effects. Well-designed and well-implemented randomized control trials, in which study participants are randomly assigned either to the program or to a control group, remain the gold standard among social science evaluation methods. Potential selectivity bias is eliminated because program participation is determined at random. A second-best alternative is a quasi-experimental design, in which statistical methods are applied to data on participants and nonparticipants to minimize any selectivity bias.
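
The logic of this comparison can be illustrated with a small simulation. The sketch below is not drawn from any Head Start evaluation: the function name `simulate_evaluations`, the test-score scale, the assumed true effect of 5 points, and the single unobserved factor labeled "motivation" are all hypothetical choices made for illustration. It contrasts a naive comparison of self-selected participants and nonparticipants with a comparison based on random assignment.

```python
import random
import statistics


def simulate_evaluations(n_children=20000, true_effect=5.0, seed=0):
    """Compare a naive participant/nonparticipant contrast with a randomized one.

    All numbers are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    self_selected = {True: [], False: []}  # outcomes when families choose
    randomized = {True: [], False: []}     # outcomes under random assignment

    for _ in range(n_children):
        # Unobserved family trait (assumed here to be parental motivation)
        # that raises outcomes whether or not the child attends the program.
        motivation = rng.gauss(0, 1)
        baseline = 100 + 10 * motivation + rng.gauss(0, 5)

        # Scenario 1: more motivated families are more likely to enroll.
        enrolls = motivation + rng.gauss(0, 1) > 0
        self_selected[enrolls].append(baseline + (true_effect if enrolls else 0.0))

        # Scenario 2: a coin flip decides, independent of motivation.
        assigned = rng.random() < 0.5
        randomized[assigned].append(baseline + (true_effect if assigned else 0.0))

    naive = statistics.mean(self_selected[True]) - statistics.mean(self_selected[False])
    experimental = statistics.mean(randomized[True]) - statistics.mean(randomized[False])
    return naive, experimental


if __name__ == "__main__":
    naive, experimental = simulate_evaluations()
    print("assumed true effect:   5.00")
    print(f"naive comparison:      {naive:.2f}")
    print(f"randomized comparison: {experimental:.2f}")
```

Under these assumptions, the naive comparison should overstate the program effect, because enrolled children would have scored higher even without the program, while the randomized comparison should land close to the assumed true effect. If the unobserved factor instead lowered outcomes among enrollees, as family poverty might, the naive comparison would understate the effect.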

The history of Head Start evaluations reveals many of the challenges associated with conducting systematic scientific evaluations of large-scale social programs. First, while a randomized control trial is the gold standard, implementation of such experiments may not be compatible with the objectives of delivering social services. For example, such experiments require resources that could alternatively be used for program delivery, and it is often considered unethical to withhold program services for a control group. Second, while much of the initial focus of Head Start supporters was on the potential short-term gains in IQ, other evidence indicated that the benefits of Head Start may be both broader and longer-lasting. In the context of experimental evaluations, this requires careful measurement of multiple outcomes and the ability to conduct a longitudinal evaluation that follows treatment and control-group children years or even decades into the future. Finally, Head Start is not one uniform national program. Rather, there is tremendous variation across program sites in the nature and delivery of program services. This variation means that evaluations of selected Head Start programs may not represent the effects on average for the country as a whole.

Early Evaluation Efforts

Until the 1998 congressional reauthorization of Head Start, there was no nationally representative evaluation of the program using a randomized experimental design. Early evaluation efforts sometimes used experimental designs to study local area programs, or quasi-experimental designs to study larger, national samples of children who participated in Head Start and comparison children who did not. Results from this early body of research were potentially compromised by small sample sizes, high rates of attrition, nonrepresentative samples, and the selectivity of program participation. Such potential biases were also present in the seventy-six studies that were the focus of a meta-analysis of the literature known as the Head Start Evaluation, Synthesis, and Utilization Project (McKey et al. 1985).

Despite these limitations, the conclusion that emerged from this meta-analysis in the mid-1980s and the literature that preceded it was that Head Start could generate immediate cognitive benefits (e.g., higher IQ scores or achievement scores) for participating children, but those benefits did not persist after the first few years of elementary school as the gap between Head Start and non-Head Start children narrowed. There was some evidence that Head Start participants showed other improved outcomes over nonparticipants, such as lower rates of grade repetition and special education use, as well as physical health benefits. Similar findings had also been demonstrated for smaller-scale demonstration programs that offer one or two years of high-quality preschool education, although the magnitude of the effects for these smaller-scale and more resource-intensive programs was often larger than that measured for Head Start participants. More recent syntheses of this literature suggest that while the fade-out (relative to nonparticipants) of IQ gains from participation in Head Start or other high-quality early intervention programs may be real, the fade-out of achievement test scores may result from flaws in evaluation designs and follow-up procedures. This explanation can help reconcile the apparent achievement fade-out with the longer-lasting effects measured for such educational outcomes as grade repetition and special education use.

Later Evaluation Efforts

In the absence of large-scale experimental studies, researchers in the 1990s turned to nationally representative survey samples to estimate the short- and long-term benefits of participating in Head Start. Using rigorous quasi-experimental methods, these studies demonstrated that Head Start had favorable and more sustained effects on test scores and other school outcomes for white children, but the initial cognitive benefits for black children faded with time. For both white and black children, Head Start participation led to higher immunization rates but had no effect on nutritional status. Analysis of longer-term data also showed favorable effects of Head Start on high school completion, college attendance, and criminal activity, but again the benefits differed by race and ethnicity.

In the 1998 reauthorization of Head Start, Congress mandated a national study to measure the effects of Head Start on school readiness, broadly defined to include cognitive development, general knowledge, approaches to learning, social and emotional development, health status, and access to health care. The study also aims to evaluate the effects on parental practices that influence school readiness, as well as the conditions under which the program is most effective and the children for whom it works best. The evaluation underway randomly assigned about five thousand newly entering three- and four-year-old children applying for Head Start in nearly four hundred randomly selected Head Start centers to either participate in Head Start or be in a non-Head Start group (with access to non-Head Start programs in the community as selected by the parents). Findings from the first-year follow-up released in 2005 showed small to moderate gains for Head Start children in most of the domains listed above for entering three-year-olds, while fewer effects were found for the four-year-olds. The children in the study will continue to be followed through first grade, and possibly beyond, with an effort to identify overall program impacts as well as how the effects of Head Start vary with program characteristics or for different types of children.

Demonstration projects implemented in the 1990s assessed other aspects of the Head Start program. One such evaluation, the National Head Start/Public School Early Childhood Transition Demonstration Study, implemented from 1991 to 1998, was designed to test the effects of providing comprehensive transition supports (e.g., social services to strengthen families and links to school, services to increase parent involvement, and promotion of developmentally appropriate activities in the classroom curriculum) for Head Start children and their families, schools, and communities in kindergarten through third grade. The study design randomly assigned more than 450 schools in thirty-one sites to the treatment or control conditions, with more than eight thousand former Head Start children studied. While inferences that could be drawn about the effects of the transition supports were limited by the site-to-site variation in the treatment conditions and by similar activities implemented in control schools, the study documented consistently large gains among former Head Start children (both treatment and control children) in reading and math achievement in these early grades, as well as substantial parent involvement in school and home activities.


