The Harvard Family Research Project separated from the Harvard Graduate School of Education to become the Global Family Research Project as of January 1, 2017. It is no longer affiliated with Harvard University.

Volume XI, Number 3, Fall 2005

Issue Topic: Democratic Evaluation

Promising Practices

Evaluating Evaluation Data

Kathleen McCartney and Heather Weiss of the Harvard Graduate School of Education describe the conditions for evaluations to maintain scientific integrity and serve the public good despite a politicized environment.¹

Evaluation is a powerful tool in decision making about policies and programs for children and families. When conducted within the established rules of the field, evaluation strengthens democratic principles about the public's right to know and enables us to base our deliberations about policies and programs on accurate information. However, social programs are proposed and funded through a political process and implemented and evaluated in a political climate. How then can evaluation be designed to maintain its integrity and serve the public good?

Evaluation data will always be evaluated through a political lens. But based on the past 40 years of evaluation of social programs, ranging from Head Start to DARE, we can identify at least five rules of evidence to which social scientists and policymakers can adhere to reduce the politicization of data. We also argue that social scientists and policymakers need to operate within a larger frame that emphasizes innovation and continuous improvement.

The Five Rules to Reduce Politicization of Data

1. Use mixed-method designs. In this climate of scientifically based research, it is important that experimental work include both main effects research questions, which pertain to universal program outcomes, and moderating effects research questions, which examine how effectiveness varies as a function of the characteristics of children and families. It is also essential to use qualitative methods, which yield valuable information about processes at work, in addition to quantitative methods.

2. Interpret effect size in a research context. Statistical effect size is a measure used to determine whether a program leads to meaningful differences in participants' lives. Effect sizes are influenced by a number of factors, including measurement and design. Moreover, there are no easy conventions for determining the practical importance of effect sizes. In medicine, researchers have tended to embrace even very small effects, while social scientists generally prefer to discuss findings with moderate effects. Researchers need to evaluate effect sizes in the context of the measures and designs used in the research, for example by comparing a program effect with the effect of maternal education, a generally accepted predictor of child outcomes.

3. Synthesize all research findings. Sponsors and stakeholders often desire to bring the results of recent studies, especially large-scale ones, front and center on the policy stage. Yet a single study provides only an approximate estimate of intervention effects and may have little to say about the relation between the effect and features of the program. When a large literature on a given topic exists, then findings across studies can be brought to bear on a given question through balanced reviews of the literature; in short, more data means more knowledge.

4. Adopt fair and reasonable scientific expectations. It is important to have fair and reasonable scientific expectations of the extent to which the data can inform a debate or direct a political decision. For example, Lipsey found that program differences (e.g., treatment type, dosage, client type, and outcome) accounted for 25% of the variation in observed effects, while method, sampling error, and residual variance accounted for the remainder.² From this analysis, it is clear that we need to distinguish program effects from methodological effects. Although evaluation data can advance an argument, one must not overstate what the data can contribute to any debate.

5. Encourage peer and public critique of the data. Philosophers of science evaluate a work in light of the work's impact on the scientific community—the greater the impact, the greater the work's value. In order for a work to have a measurable impact, findings must be widely disseminated and subjected to professional scrutiny by peers and the public. Scrutiny is especially important amidst political debates about the meaning of data. Similarly, it is important to encourage data sharing to promote multiple independent analyses of evaluation data, especially where major policy issues are at stake and especially when the data were generated by public funds.
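
Rules 2 and 3 can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not the authors' procedure: it uses Cohen's d as the effect size measure, a fixed-effect (inverse-variance) average as one simple way to combine studies, and scores and study results that are invented for the example. The function names are hypothetical as well.

```python
import math
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = ((n_t - 1) * stdev(treatment) ** 2 +
                  (n_c - 1) * stdev(control) ** 2) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / math.sqrt(pooled_var)


def d_variance(d, n_t, n_c):
    """Common large-sample approximation to the sampling variance of d."""
    return (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))


def pooled_effect(effects, variances):
    """Fixed-effect average: each study is weighted by 1 / variance."""
    weights = [1.0 / v for v in variances]
    estimate = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    std_error = math.sqrt(1.0 / sum(weights))
    return estimate, std_error


# Rule 2: read a program effect against a familiar benchmark.
# All scores below are invented for illustration only.
program = [52, 55, 49, 58, 54, 51]
comparison = [50, 51, 47, 53, 52, 48]
more_education = [56, 59, 53, 61, 57, 55]   # hypothetical benchmark groups
less_education = [50, 52, 47, 54, 51, 49]

d_program = cohens_d(program, comparison)
d_benchmark = cohens_d(more_education, less_education)
print(f"program d = {d_program:.2f}, benchmark d = {d_benchmark:.2f}")
print(f"program effect as a share of benchmark: {d_program / d_benchmark:.2f}")

# Rule 3: combine several studies instead of relying on one.
# Each tuple is (effect size, treatment n, control n), again invented.
studies = [(0.30, 40, 40), (0.10, 200, 200), (0.45, 25, 25)]
effects = [d for d, _, _ in studies]
variances = [d_variance(d, n_t, n_c) for d, n_t, n_c in studies]
estimate, std_error = pooled_effect(effects, variances)
low, high = estimate - 1.96 * std_error, estimate + 1.96 * std_error
print(f"pooled d = {estimate:.2f} (95% interval {low:.2f} to {high:.2f})")
```

The fixed-effect average assumes every study estimates the same underlying effect. When programs differ in treatment type, dosage, or clients, as the Lipsey example under rule 4 suggests they often do, a random-effects model or a moderator analysis would usually be the more defensible choice.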

In addition to rules for interpreting data, we argue that a second condition must be met for evaluation to have a meaningful role in a democracy. Specifically, data need to be directed toward innovation and continuous improvement. The American approach to evaluation has been piecemeal, with relatively little emphasis on reflective learning. Today's emphasis on accountability has the potential to ameliorate this tendency, if we begin to think of evaluation as a key component in a larger, ongoing system of change. That is, evaluation helps a program clarify its theory of change and use data to gain new knowledge and also helps a program to apply this knowledge to make midcourse corrections in program design and implementation. Such a system has the potential to engage different, and especially underrepresented, stakeholders in the evaluation process and to shape policies and programs that are more representative of the citizenry.

¹ This article summarizes McCartney, K., & Weiss, H. B. (in press). Data for democracy: The evolving role of evaluation in policy and program development. In J. L. Aber, S. J. Bishop-Josef, S. Jones, K. T. McLearn, & D. D. Phillips (Eds.), Child development and social policy: Knowledge for action. Washington, DC: American Psychological Association. Expected publication date is July 2006.
² Lipsey, M. W. (1997). What can you build with thousands of bricks? Musings on the cumulation of knowledge in program evaluation. New Directions for Evaluation, 76, 7–24.

Kathleen McCartney, Acting Dean, Harvard Graduate School of Education

Heather B. Weiss, Founder & Director, HFRP
