
The Harvard Family Research Project separated from the Harvard Graduate School of Education to become the Global Family Research Project as of January 1, 2017. It is no longer affiliated with Harvard University.


Howard Bloom

Howard Bloom, chief social scientist for the Manpower Demonstration Research Corporation (MDRC), leads the development of experimental and quasi-experimental methods for estimating program impacts at MDRC. He has conducted or helped to design evaluation studies that have randomized public housing projects, firms, schools, classrooms, and day care centers.

Stephen Raudenbush, professor at the University of Michigan School of Education and Survey Research Center, has spent the last 10 years developing statistical models that account for the fact that children are clustered in social settings rather than living as solitary individuals.

Stephen Raudenbush

In this installment of Questions & Answers, Howard Bloom and Steve Raudenbush discuss the possible use of group randomized trials to assess the quality of youth programming.¹

Q: Why do you feel there is a need to develop new experimental approaches to assessing program quality, and how might group randomized trials be appropriate for evaluating youth programs?

A: Steve Raudenbush: Using group randomized trials to assess program quality is actually not a new idea. A classic early-1950s text by Lindquist² on experimental design in psychology and education identified this design as useful, and the design has a long history in public health, where it has been a component of many studies funded by the Centers for Disease Control. What is new is the emphasis on this approach in other areas, such as youth development.

Howard Bloom: I agree. When the theory of the intervention treats the collectivity, the force of the group, as part of the mechanism by which the intervention is supposed to produce its effect, then you clearly want to randomize groups. You don’t want to be randomizing individuals in and out of groups. Acknowledging the group nature of an intervention is very important for assessing youth and education programs. This approach may not suit every youth intervention, but it is increasingly appropriate for many group-oriented ones.

The growing emphasis on randomizing groups to assess the effectiveness of youth interventions reflects the increasing acceptance of randomized experiments in general. Historically, research in education and related fields has relied on non-experimental alternatives that produce less reliable results than experimental designs would yield. And even though the past 20 years have seen some very important methodological advances in non-experimental design using sophisticated modeling and matching procedures, these advances fall short of providing the solid causal evidence available from randomized experiments. When researchers have tested empirically whether non-experimental comparison group methods can reproduce the reliable, internally valid results of randomized experiments, the answer has been no.

Steve Raudenbush: That point helps explain the recent emphasis on randomization, but the other part of the question is, why group randomization? As Howard pointed out, the group level is really, for many of the interventions we’re interested in, the natural unit of intervention and randomization. In a comprehensive school reform program aimed at restructuring instruction throughout the school, for example, the entire school would naturally be the unit of treatment, and therefore of randomization.

One appealing aspect of group randomization is that it eliminates some of the difficulties of individual-level randomization. For example, when teachers within the same school are randomly assigned to implement different treatments, they might share information across conditions, contaminating the data. Or there might be tension between the experimental and control groups within the school. When the whole school is randomized, you avoid these problems.
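To make the unit of randomization concrete, here is a minimal Python sketch of cluster random assignment. It assumes a simple list of school identifiers and a mapping from each student to a school; the function names and example data are hypothetical. The coin is flipped once per school, and every student inherits the school's condition, so no school is ever split across conditions.

```python
import random


def assign_clusters(clusters, seed=None):
    """Randomly assign whole clusters (e.g., schools) to treatment or control.

    Half of the clusters (rounded down) are treated; the rest are controls.
    """
    rng = random.Random(seed)  # local generator: the same seed gives the same assignment
    order = sorted(clusters)   # fixed starting order before shuffling
    rng.shuffle(order)
    n_treat = len(order) // 2
    return {c: ("treatment" if i < n_treat else "control") for i, c in enumerate(order)}


def expand_to_members(assignment, membership):
    """Give each individual the condition of the cluster they belong to."""
    return {person: assignment[cluster] for person, cluster in membership.items()}


# Hypothetical example: six schools and a few of their students.
schools = ["North", "South", "East", "West", "Central", "Lake"]
students = {"ana": "North", "ben": "North", "cai": "Lake", "dee": "West"}
school_arm = assign_clusters(schools, seed=2024)
student_arm = expand_to_members(school_arm, students)
print(school_arm)
print(student_arm)
```

Because assignment happens only at the school level, two students in the same school can never land in different conditions, which is what rules out the within-school contamination described above.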

Related Resources

Bloom, H. S. (2003). Sample design for an evaluation of the Reading First program. New York: Manpower Demonstration Research Corporation.

Bloom, H. S., Bos, J. M., & Lee, S. (1999). Using cluster random assignment to measure program impacts. Evaluation Review, 23(4), 445–469.

Raudenbush, S. W. (1997). Statistical analysis and optimal design for cluster randomized trials. Psychological Methods, 2(2), 173–185.

Q: What are the statistical implications and advantages of this approach?

A: Steve Raudenbush: In terms of sheer statistical power, by which I mean the probability of detecting a treatment effect that really exists, we might actually prefer a design in which we randomly assign individuals. If we could pull it off, we’d have more degrees of freedom with individual assignment than with group assignment.³ Howard and I have both shown that the number of groups that must be randomized to detect a given effect of the intervention is sometimes daunting. In many cases, 25 or 30 schools would need to be randomly assigned to each treatment condition, and this prospect scares some people. So although group randomization offers no real power advantage, it does have another kind of statistical benefit.
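The arithmetic behind sample sizes like these can be sketched with a commonly used approximation for the minimum detectable effect size (MDES) of a two-arm cluster randomized trial, in standard-deviation units. The function names, the intraclass correlation (ICC) of 0.15, and the 60 students per school below are illustrative assumptions, not figures from this interview. The normal-approximation multiplier also slightly understates the MDES when only a handful of groups are randomized.

```python
import math
from statistics import NormalDist


def mdes(clusters_total, per_cluster, icc, p_treated=0.5, alpha=0.05, power=0.80):
    """Approximate minimum detectable effect size (in SD units) for a
    two-arm cluster randomized trial.

    MDES = M * sqrt(icc / (P(1-P)J) + (1 - icc) / (P(1-P)Jn)), where
    M = z(1 - alpha/2) + z(power) comes from the normal distribution.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)  # about 2.8 by default
    pq = p_treated * (1 - p_treated)
    between_groups = icc / (pq * clusters_total)
    within_groups = (1 - icc) / (pq * clusters_total * per_cluster)
    return multiplier * math.sqrt(between_groups + within_groups)


def clusters_per_arm_needed(target_mdes, per_cluster, icc, max_per_arm=1000):
    """Smallest number of clusters per arm (equal allocation) whose MDES is
    at or below target_mdes; returns None if max_per_arm is not enough."""
    for k in range(2, max_per_arm + 1):
        if mdes(2 * k, per_cluster, icc) <= target_mdes:
            return k
    return None


# Hypothetical inputs: 60 students per school, ICC of 0.15.
print(clusters_per_arm_needed(0.30, per_cluster=60, icc=0.15))
```

Under these assumed inputs, detecting an effect of 0.30 standard deviations calls for a number of schools per condition in the high 20s, the same range the interviewees describe. A larger ICC or a smaller target effect pushes the number up quickly.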

Generally, in a randomized experiment, we assume that the effect of the treatment does not depend on certain extraneous factors. For example, if I participate in a health study and am randomly assigned to take Drug A, presumably it doesn’t matter which doctor gives me the drug; what matters is that I take it as prescribed. But now consider an analogous example from education. The assumption that the effect of an instructional method (the “drug”) does not depend on who is teaching (the “doctor”) is really quite implausible, because teaching style is regarded as critical to kids’ outcomes.

The same idea carries over to youth development programs. The adults who lead youth development programs are generally going to be very important to participants’ outcomes, so randomizing only at the participant level, while ignoring the group each participant belongs to, could lead to misleading conclusions about program impacts.
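One standard way to formalize this point is a two-level model. It is offered here as an illustrative assumption, not as a model the interviewees specify. Participant i belongs to program site j, and the whole site is assigned to treatment (T_j = 1) or control (T_j = 0). The site-level random effect u_{0j} absorbs influences shared by everyone at a site, such as its adult leaders. With J sites, n participants per site, and half the sites treated, the variance of the estimated impact depends on that between-site variance τ:

$$
\begin{aligned}
Y_{ij} &= \gamma_{00} + \gamma_{01} T_j + u_{0j} + r_{ij}, \qquad u_{0j} \sim N(0, \tau), \quad r_{ij} \sim N(0, \sigma^2), \\
\rho &= \frac{\tau}{\tau + \sigma^2}, \qquad
\operatorname{Var}(\hat{\gamma}_{01}) = \frac{4\,(\tau + \sigma^2 / n)}{J}.
\end{aligned}
$$

Because τ is divided by the number of sites J rather than by the total number of participants Jn, differences between leaders do not average away as more participants are added. An analysis that ignores the site level treats them as if they did.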

Howard Bloom: Building on that, there are two points I would add. First, it is very important to do the impact analysis in accordance with how randomization was conducted. Some folks haven’t quite figured that out. There have been studies that randomized groups but analyzed their data as if they had randomized individuals, so when they calculated the statistical significance of their findings, they grossly overstated it. People conducting this kind of research should make sure that the analytic model they use to estimate impacts is consistent with the randomization approach they implement.⁴
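The size of this overstatement can be sketched with the design effect, 1 + (n - 1)ρ. This is the factor by which clustering inflates the variance of a difference in means when every group has n members and the intraclass correlation is ρ. The function names and example inputs are hypothetical, and the z-statistics use a normal approximation. A group-level analysis with few groups would more properly compare against a t distribution with J - 2 degrees of freedom.

```python
import math


def design_effect(per_cluster, icc):
    """Variance inflation from equal-sized clusters: 1 + (n - 1) * icc."""
    return 1 + (per_cluster - 1) * icc


def naive_and_adjusted_z(effect, sd, clusters_total, per_cluster, icc, p_treated=0.5):
    """Return (naive_z, adjusted_z) for an estimated difference in means.

    sd is the total (individual-level) standard deviation of the outcome.
    naive_z treats every individual as independently randomized;
    adjusted_z inflates the variance by the design effect, which matches a
    group-level analysis when all groups are the same size.
    """
    n_total = clusters_total * per_cluster
    naive_se = sd * math.sqrt(1 / (p_treated * (1 - p_treated) * n_total))
    adjusted_se = naive_se * math.sqrt(design_effect(per_cluster, icc))
    return effect / naive_se, effect / adjusted_se


# Hypothetical study: 0.2 SD effect, 20 schools of 50 students, ICC of 0.1.
naive_z, adjusted_z = naive_and_adjusted_z(0.2, 1.0, 20, 50, 0.1)
print(f"naive z = {naive_z:.2f}, adjusted z = {adjusted_z:.2f}")
```

With these assumed inputs, the naive statistic exceeds 1.96 while the clustering-adjusted one falls well below it. A study analyzed the wrong way would therefore report a "significant" impact that the correct analysis does not support.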

A second and related point is that in most cases, the number of groups that are randomized is far more important than the number of individuals per group in determining how much precision, or power, you have. That’s why Steve was saying that you often need 25 or 30 groups per treatment condition, regardless of how many individuals there are per group.
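This point follows from a standard approximation for the minimum detectable effect size (MDES) used in cluster-trial planning. Here J is the total number of groups, n the number of individuals per group, P the share of groups treated, ρ the intraclass correlation, and M a multiplier of roughly 2.8 for a two-tailed test at the 5 percent level with 80 percent power. Letting n grow without bound removes only the second term:

$$
\text{MDES} \approx M \sqrt{\frac{\rho}{P(1-P)J} + \frac{1-\rho}{P(1-P)Jn}}
\;\xrightarrow{\;n \to \infty\;}\;
M \sqrt{\frac{\rho}{P(1-P)J}}
$$

For instance, take an assumed, illustrative ρ = 0.15, P = 0.5, and 30 groups per condition (J = 60). The floor is then about 2.8 × √(0.15 / 15) = 0.28 standard deviations, no matter how many individuals each group contains. Only adding groups pushes it lower.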

Q: What are the implications of this approach for future evaluation and research, specifically in after school and youth development?

A: Steve Raudenbush: After school programs are clearly a group-oriented treatment, so they come very much into this domain. But, one of the things that must change for this movement to be successful is that people who run schools and after school programs have to get used to the idea of participating in a randomized experiment. That notion may seem implausible, but we have to get more clever and thoughtful about how to do this type of experiment.

In a recent randomized trial of school-based interventions, the original design had some schools getting an exciting new instructional approach and other schools getting basically nothing. That was not a successful approach: people didn’t want to participate in the study. So in a refinement of that design, everyone got a treatment, but some schools were randomly assigned to implement it in one grade and others to implement it starting in another grade. Under this modification, principals knew that whatever the result of the coin flip, their schools would benefit from a new, potentially interesting attempt to solve a problem. That strategy was much more successful in recruiting participants.
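The refined design can be sketched as follows. The sketch assumes exactly two grades and an even split of schools, and its function names and school labels are hypothetical. Each school's coin flip decides only which grade gets the program. For outcomes measured in a given grade, the schools that placed the program in the other grade serve as the comparison group.

```python
import random


def assign_grade_swap(schools, grade_a, grade_b, seed=None):
    """Randomly decide, school by school, which grade implements the program.

    Every school gets the program; half (rounded down) implement it in
    grade_a and the rest in grade_b.
    """
    rng = random.Random(seed)
    order = sorted(schools)
    rng.shuffle(order)
    half = len(order) // 2
    return {s: (grade_a if i < half else grade_b) for i, s in enumerate(order)}


def comparison_groups(assignment, outcome_grade):
    """For outcomes measured in outcome_grade, return (treated, comparison)
    schools: those that implemented the program in that grade vs. the rest."""
    treated = sorted(s for s, g in assignment.items() if g == outcome_grade)
    comparison = sorted(s for s, g in assignment.items() if g != outcome_grade)
    return treated, comparison


# Hypothetical example: eight schools, program placed in grade 3 or grade 4.
schools = [f"school_{k}" for k in range(1, 9)]
plan = assign_grade_swap(schools, grade_a=3, grade_b=4, seed=11)
print(comparison_groups(plan, 3))
print(comparison_groups(plan, 4))
```

The treated and comparison groups simply swap roles between the two grades. This relies on the assumption that implementing the program in one grade does not affect outcomes in the other.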

Howard Bloom: Building a culture that accepts randomized experiments is critical. You need to build a constituency for “effectiveness information,” understanding that there are multiple ways of measuring effectiveness of interventions, but randomization is ultimately the best. However, it will take a lot of time to build this constituency. For example, in the fields of employment and social welfare research, it took several decades for MDRC and other organizations, and folks in the federal government, to demonstrate that these kinds of studies could be done and should be done, and that their findings can play an important role in policymaking. In education, there is a strong push to bring more of this methodology in, but it will take a while to build a culture and a constituency for it.

Finally, it is important to understand that in any field of social science research you have to “pick your shots” when trying to use randomized experiments. You have to be careful and strategic—you simply cannot use a randomized experiment every time you want to answer an impact question. The question has to be important enough, with a large enough constituency behind it, to make the effort required to successfully conduct a randomized experiment worthwhile.

1 Group randomized trials are experiments in which some groups are randomly assigned to receive an intervention while others are assigned to a non-intervention control condition.
2 Lindquist, E. F. (1953). Design and analysis of experiments in psychology and education. Boston: Houghton Mifflin.
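3 The number of degrees of freedom in an experiment refers to the number of independent pieces of information available for estimating variability. In a group randomized trial, the degrees of freedom for testing the treatment effect depend mainly on the number of groups rather than the number of individuals.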
4 For a clear discussion of this issue see Cornfield, J. (1978). Randomization by group: A formal analysis. American Journal of Epidemiology, 108(2), 100–102.

Priscilla M. D. Little, Project Manager, HFRP


© 2016 Presidents and Fellows of Harvard College
Published by Harvard Family Research Project