


Kathleen McCartney and Eric Dearing from the Harvard Graduate School of Education provide an overview of effect size and what it reveals about the effectiveness of family support programs.

Long ago, in a political climate far, far away, statistically significant findings were enough to establish a program's effectiveness. This is no longer the case, and for good reason. Statistical significance tells you nothing about the size of the effect - that is, whether the program leads to meaningful differences in participants' lives. From significance test statistics, such as t, F, or r, researchers can compute effect size estimates in standard units, which say more about the practical importance of interventions. The problem is that there is no agreement on how to interpret effect size estimates. As a result, many will read the results of the National Evaluation of Family Support Programs (Layzer, Goodson, Bernstein, & Price, 2001) and decide that the glass is half empty, while others will conclude that it is half full.

What Is Effect Size?
There are two kinds of effect size estimates: r, the Pearson product-moment correlation, and d, the standardized difference between two means. Cohen (1977), a statistician, offered conventions for effect sizes to help researchers conduct power analyses: for d, he suggested that .20 was small, .50 moderate, and .80 large. However, social science research seldom yields effects as large as .80. If we applied Cohen's power-analysis guidelines blindly, we would end up dismissing most effects as small - even trivial. There is good reason to believe that this is not the case.
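To make the definitions concrete, here is a minimal sketch of how d is computed from two group means, and how a two-group t statistic can be converted to d or r using the standard equal-groups conversion formulas. All numbers in the example are hypothetical, not from any study cited here.

```python
import math

def cohens_d(mean_treatment, mean_control, sd_pooled):
    """Standardized mean difference: group difference in pooled-SD units."""
    return (mean_treatment - mean_control) / sd_pooled

def d_from_t(t, df):
    """Convert a two-group t statistic to d (assumes equal group sizes)."""
    return 2 * t / math.sqrt(df)

def r_from_t(t, df):
    """Convert a t statistic to the correlation effect size r."""
    return math.sqrt(t ** 2 / (t ** 2 + df))

# Hypothetical example on an IQ-like scale (pooled SD = 15):
print(cohens_d(104, 100, 15))  # 0.267 -- "small" by Cohen's conventions
print(d_from_t(2.0, 98))       # 0.404 -- from t = 2.0 with df = 98
print(r_from_t(2.0, 98))       # 0.198
```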

The average effect size of psychotherapy, measured across many studies, is .32. Some dismissed this effect as unimportant; after all, it represents a mere one-third of a standard deviation change in mental health. Rosenthal (1994) argued that this result should be examined in the context of other established health findings. To make his point, he asked readers to consider that the effect of aspirin in reducing heart attacks is .03! Many physicians prescribe this preventive treatment to their middle-aged patients, in part because the cost of the intervention is so small, while the benefit is potentially great.
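One device Rosenthal advocated for making such comparisons vivid is the binomial effect size display (BESD), which re-expresses a correlation r as two hypothetical "success" rates centered on 50%. The sketch below treats the figures quoted above as correlations for illustration - an assumption, since the BESD applies to r rather than d.

```python
def besd(r):
    """Binomial effect size display: re-express a correlation r as
    hypothetical 'success' rates for treatment vs. control groups,
    each centered on 50%."""
    return 0.50 + r / 2, 0.50 - r / 2

print(besd(0.32))  # (0.66, 0.34): roughly 66% vs. 34% "improved"
print(besd(0.03))  # (0.515, 0.485): a small shift that, across millions
                   # of patients, still means many fewer heart attacks
```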

Similarly, some scholars have dismissed the effect of child-care quality on children’s development as small; however, the National Institute of Child Health and Human Development Early Child Care Research Network (1999) demonstrated that child-care effects were about half the size of family environment effects when similar quality measures were used. In this context, it would be difficult to argue that child-care effects are trivial.

Related Resource


Layzer, J., & Goodson, B. (2001). National Evaluation of Family Support Programs. Cambridge, MA: Abt Associates, Inc. Prepared for the federal Administration on Children, Youth and Families, this evaluation examines the effectiveness of many types of family support programs for different types of families. It is a meta-analysis of 665 studies, with experimental and quasi-experimental studies analyzed separately. www.abtassoc.com

The Effect Size of Family Support Programs
There is likely to be much argument concerning the meaning of the effect sizes from the National Evaluation of Family Support Programs (Layzer et al., 2001), a study remarkable in scope. Its authors identified 665 studies representing 260 programs, and for each study they computed effect size estimates for nine possible outcomes. The short-term average effect sizes across studies were as follows:

- child cognitive development: .29
- child social and emotional development: .22
- child physical health and growth: .12
- child safety: .21
- parenting attitudes and knowledge: .23
- parenting behavior: .26
- family functioning: .19
- parent mental health: .14
- family economic self-sufficiency: .10

The long-term average effect sizes decreased for some outcomes but increased for others. Interestingly, the long-term effect size for family economic self-sufficiency was .39, an important finding given the significance of increases in economic resources for children in poverty (Dearing, McCartney, & Taylor, 2001).
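The report's exact pooling procedure is not reproduced here; as a rough illustration, a common meta-analytic approach averages study-level d values weighted by sample size. A minimal sketch with hypothetical study data:

```python
def weighted_mean_d(studies):
    """Sample-size-weighted average of study-level effect sizes;
    a simple stand-in for inverse-variance weighting."""
    total_n = sum(n for _, n in studies)
    return sum(d * n for d, n in studies) / total_n

# Hypothetical (d, n) pairs for several studies of one outcome.
cognitive_studies = [(0.45, 60), (0.20, 240), (0.35, 120)]
print(round(weighted_mean_d(cognitive_studies), 2))  # 0.28
```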

Clearly, family support produces a range of effects - with those for child cognitive and social development, as well as parenting attitudes and behavior, among the largest. Importantly, "almost every program or intervention asserted the twin goals of improved parenting (98%) and enhanced child development (91%)" (Layzer et al., 2001, p. 2). As such, one would expect greater effects in these two domains than in others, such as child health. And this is exactly what was found.

Average effect sizes address the question of whether family support programs work only in a very general sense. A more useful question is under what circumstances programs work best. Because the evaluation research team coded program characteristics and related them to effect sizes, we can answer this question. The influence of program characteristics on effect sizes within randomized studies, which provide the best data, was considerable. With respect to children's cognitive development, programs were more effective when they included an early childhood education component (.48 vs. .25), when they were targeted to special-needs children (.54 vs. .26), when there were peer support opportunities for parents (.40 vs. .25), and when there were parent groups rather than home visits (.49 vs. .26). For children's social and emotional development, programs were more effective when parent self-development was a program goal (.56 vs. .25) and when professional staff were used rather than paraprofessionals (.43 vs. .27). Parenting effects were also moderated by program characteristics, such as peer support.
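To see how large these moderator effects are, the sketch below simply tabulates the cognitive-development contrasts just reported and computes how many times larger the average effect is when each program feature is present:

```python
# Cognitive-development contrasts reported by Layzer et al. (2001):
# (effect size with the feature, effect size without it).
contrasts = {
    "early childhood education component": (0.48, 0.25),
    "targeted to special-needs children": (0.54, 0.26),
    "peer support for parents": (0.40, 0.25),
    "parent groups rather than home visits": (0.49, 0.26),
}

for feature, (with_d, without_d) in contrasts.items():
    ratio = with_d / without_d
    print(f"{feature}: {with_d:.2f} vs. {without_d:.2f} ({ratio:.1f}x)")
# Most features come close to doubling the average effect -- the basis
# for the "effects doubled with best practices" point below.
```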

Measuring Up the Effect Size of Family Support
It is likely that data from this evaluation will be used to support a variety of policy agendas. In fact, McCartney and Rosenthal (2000) have argued that data, like Rorschach inkblots, sometimes appear to serve as a projective test. Are these effects small? Evaluating effect sizes is not straightforward. As McCartney and Rosenthal (2000) write, "There are no easy conventions for determining practical importance. Just as children are best understood in context, so are effect sizes" (p. 175).

We suggest that you reflect on three points that help to provide context here. First, remember that the effect of psychotherapy is .32. Second, note that the families received support services for 15 months on average - a relatively short amount of time. Third, and perhaps most importantly, consider that effect sizes varied as a function of program characteristics, and that effects doubled with best practices. For us, both as researchers and as taxpayers, the glass is more than half full. Data from this evaluation will be of great use in guiding future family support program development.

Kathleen McCartney
Eric Dearing
Harvard Graduate School of Education
13 Appian Way
Cambridge, MA 02138
617-496-1182
kathleen_mccartney@gse.harvard.edu
eric_dearing@gse.harvard.edu


References
Cohen, J. (1977). Statistical power analysis for the behavioral sciences (Rev. ed.). New York: Academic Press.

Dearing, E., McCartney, K., & Taylor, B. A. (2001). Change in family income-to-needs matters more for children with less. Child Development, 72, 1779-1793.

Layzer, J. I., Goodson, B. D., Bernstein, L., & Price, C. (2001). National Evaluation of Family Support Programs, Final Report Volume A: The Meta-Analysis. Cambridge, MA: Abt Associates.

McCartney, K., & Rosenthal, R. (2000). Effect size, practical importance, and social policy for children. Child Development, 71, 173-180.

NICHD Early Child Care Research Network (1999, April). Effect sizes from the NICHD Study of Early Child Care. Paper presented at the Biennial Meetings of the Society for Research in Child Development, Albuquerque, NM.

Rosenthal, R. (1994). Parametric measures of effect size. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis. New York: Russell Sage Foundation.


© 2016 Presidents and Fellows of Harvard College
Published by Harvard Family Research Project