Measurement & Evaluation

The Mismeasure of Impact

Data scientists and information economists are beginning to pair with social innovators to understand the dynamics of interventions and to separate what works from what doesn’t.

This post is from PopTech Editions III—"Made to Measure: The New Science of Impact," which explores the evolving techniques to accurately gauge the real impact of initiatives and programs designed to do social good. Visit PopTech for more interviews, essays, and videos with leading thinkers on this subject.

How often has some version of this story happened:

A group of young, eager innovators come together to develop a new, promising approach to one of today’s “wicked problems”—in an area like climate change, poverty alleviation, food security, or off-grid energy.

With a mix of design and engineering prowess, good intentions and no small amount of luck, they develop a laudable prototype. This wins them breathless media attention, speaking invitations to conferences and perhaps a prize or two, followed by sufficient seed capital for a pilot.

The pilot shows promise; after the intervention, the relevant critical indicator (which might be a measure of market access, public health, etc.) shows marked improvement. On the strength of this happy outcome, more capital is raised. The intervention moves out of the pilot stage and is rolled out to the community. The press is breathless. Hopes are high.

And then, much to everyone’s chagrin: almost nothing changes. The new social innovation barely makes a dent in the problem, which appears more pernicious than ever.

What happened?

If you recognize elements of this story (or if you wince in self-recognition) you are not alone. This is the common fate of most social innovations, and it's the field’s dirty little secret: many of the most promising new approaches to tough problems fail, in ways that surprise and frustrate their creators, funders and constituents alike.

The reasons behind such failures are complex. The most common culprit is a kind of cultural blindness on the part of would-be change agents, who fail to design “with, not for” the communities they serve, and end up trying to impose a solution from without rather than encourage its adoption from within. More generally, it's important to remember that wicked problems have earned that moniker for a reason: they are generally immune to the “elegant hacks” and quick fixes that can be a hallmark of other endeavors, such as software development.

But there are other, deeper reasons why social innovations unexpectedly fail. They involve the many ways we unintentionally mismeasure the impact we’re having, and fool ourselves that a social intervention is working when it really isn’t.

"Marking on fieldwork locations" (Image: Richard Allaway)

The most common pitfall we encounter in measuring the impact of a social innovation is failing to establish a control group. Without assessing a matched cohort that is not receiving the intervention, it is impossible to know what precise effect a social innovation is having.

For example, let’s say you develop an innovative literacy-improving program for children. You test a community of low-literacy subjects, then provide the intervention, and test them again. Their measured rates of literacy jump dramatically. Time to pop the champagne corks, right?

Wait a moment. Why exactly did rates of literacy improve? Was it your program? Or was it a natural byproduct of the maturation of the subjects? (Between the first and second tests, the children you tested got older—their independent cognitive development may account for the increase.) Or was it a practice effect of the test? After all, we tend to do better on tasks we’ve tried before. It might be the case that subjects simply got better because they’d seen this kind of test before.
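To make the role of the comparison group concrete, here is a minimal sketch in Python. Every number in it is invented for illustration, and the names (`program_before`, `control_after`, `average_gain`) are hypothetical. It assumes a matched group of children who took both tests but never received the program, so that their gain stands in for maturation and practice effects.

```python
from statistics import mean

# Hypothetical literacy scores (0-100); all values are invented for illustration.
program_before = [42, 38, 51, 47, 40]
program_after = [55, 50, 63, 58, 54]
control_before = [41, 39, 50, 48, 42]   # matched children, no program
control_after = [51, 48, 59, 57, 50]

def average_gain(before, after):
    """Mean per-child change between the two test rounds."""
    return mean([a - b for b, a in zip(before, after)])

# What a before/after comparison alone would report.
naive_gain = average_gain(program_before, program_after)

# What children gained anyway (getting older, having seen the test before).
background_gain = average_gain(control_before, control_after)

# The part of the gain that the program could plausibly claim.
estimated_effect = naive_gain - background_gain

print(f"naive gain: {naive_gain:.1f} points")
print(f"gain without the program: {background_gain:.1f} points")
print(f"estimated program effect: {estimated_effect:.1f} points")
```

In this made-up case most of the apparent improvement shows up in the comparison group too, which is exactly what a before/after test on its own cannot reveal.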

Then again, perhaps we have run into a regression effect. This requires a bit of additional explanation.

Many phenomena, like the temperature in a given month, or your bowling score, will cluster around an average. On some days, it may be moderately higher, on others moderately lower. But on average, these indicators will cluster around a central number—a “mean.”

Now, let’s imagine we take a group of subjects and give them a test, such as the baseline literacy test mentioned above. As with the examples above, most will score close to the mean, while a few will be outliers, scoring dramatically higher or lower. Given the same test again, with no additional intervention, it's likely that the subjects who were outliers in the first test will “migrate” closer to the mean, while some that were at the mean in the first test will “migrate” to the extreme high or low of the range in the second. This is a purely natural statistical artifact.

Now let’s temporarily assume, for the sake of argument, that the hypothetical literacy program we devised had an astonishing zero-percent effectiveness. We measure the baseline of the population; then we deliver this (useless) intervention; and then measure again, paying careful attention to those who did the worst on the first test. Amazingly, many will show marked improvement, “migrating” to the middle of the pack, though for reasons that have nothing to do with our literacy program.
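The thought experiment above can be run as a small simulation. This is a sketch under stated assumptions, not a model of any real program: each child is assumed to have a stable underlying skill plus independent test-day luck, and the specific numbers (2,000 children, a mean of 50, a spread of 10) are arbitrary choices made for the example.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

N_CHILDREN = 2000

# Assumed model: stable underlying skill plus independent test-day luck.
skills = [random.gauss(50, 10) for _ in range(N_CHILDREN)]
first_test = [s + random.gauss(0, 10) for s in skills]

# The "program" has zero effect: the second score is skill plus fresh luck.
second_test = [s + random.gauss(0, 10) for s in skills]

# Pick the weakest fifth on the first test, as an evaluator might.
ranked = sorted(range(N_CHILDREN), key=lambda i: first_test[i])
weakest = ranked[: N_CHILDREN // 5]

weakest_first = mean([first_test[i] for i in weakest])
weakest_second = mean([second_test[i] for i in weakest])
apparent_gain = weakest_second - weakest_first

# The group as a whole should barely move, since nothing real changed.
overall_change = mean(second_test) - mean(first_test)

print(f"weakest fifth: {weakest_first:.1f} -> {weakest_second:.1f}")
print(f"whole group change: {overall_change:+.1f}")
```

Under these assumptions the weakest fifth improves substantially on the second test while the group average stays roughly where it was. The "gain" comes entirely from having selected children partly on the basis of a bad day.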

"Measurement" (Image: Freddy Fam)

Even controlling for regression effects, there may be other phantoms lurking in our measurement. Placebo effects happen in social interventions just as they do in medicine. Some people who believe they’ve received an effective intervention may do better whether the intervention is actually effective or not.

Much more common, particularly in measuring social innovation initiatives, is the problem of selective dropout. This occurs when the “users” of a particular intervention find it either too easy or too difficult, and stop participating. When that happens, the results of any subsequent analysis can be markedly skewed. Perhaps it's true that the average literacy rates of a particular classroom of students improved by 20 percent after the administration of our program, but it’s meaningless if 20 percent of the students found it too difficult and left the class altogether, since the second average now covers only the students who stayed.
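A few lines of arithmetic show how dropout alone can produce that kind of number. The scores below are invented, and the sketch assumes the harshest case: the program changes nobody's score, and the two weakest students in a class of ten leave before the follow-up test.

```python
from statistics import mean

# Invented baseline scores for a class of ten; the last two students struggle.
baseline = [60, 62, 64, 66, 66, 68, 70, 72, 10, 12]

# Assume the program does nothing and the two weakest students leave,
# so the follow-up test covers only the eight who stayed.
followup = baseline[:8]

# Naive comparison: class average after vs. class average before.
naive_change = mean(followup) / mean(baseline) - 1

# Like-for-like comparison: the same eight students, before and after.
matched_change = mean(followup) / mean(baseline[:8]) - 1

print(f"naive change: {naive_change:+.0%}")
print(f"matched change: {matched_change:+.0%}")
```

With these numbers the class average rises by 20 percent even though no individual student improved. Comparing only the students present at both tests removes the illusion, though it says nothing about how the program served the ones who left.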

The inverse problem, a form of priming, is particularly common in social innovation and makes measurement difficult. This occurs when the measurement of an intervention suggests, often subconsciously, what the “right” answers should be.

Finally, there are compensation effects that can occur when we change a social system. When we make cars safer, people may drive more dangerously, precisely because we made driving less dangerous. When we make cookstoves more efficient (and therefore more healthy and less polluting to use) people may use them more, offsetting the benefits of the efficiency.
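The cookstove case comes down to simple multiplication, sketched below. The function name `net_fuel_change` and the percentages are hypothetical, chosen only to show how extra use can shrink or even reverse an efficiency gain.

```python
def net_fuel_change(fuel_saving_per_hour, extra_usage):
    """Fractional change in total fuel burned after a stove upgrade.

    fuel_saving_per_hour: 0.40 means 40% less fuel per hour of cooking.
    extra_usage: 0.50 means the stove is used for 50% more hours.
    """
    return (1 - fuel_saving_per_hour) * (1 + extra_usage) - 1

# A stove that is 40% more frugal but used 50% more: only about a 10% saving.
partial_offset = net_fuel_change(0.40, 0.50)

# The same stove used 80% more: total fuel use actually goes up by about 8%.
full_backfire = net_fuel_change(0.40, 0.80)

print(f"used 50% more: {partial_offset:+.0%} total fuel")
print(f"used 80% more: {full_backfire:+.0%} total fuel")
```

An evaluation that measures only fuel per hour would score both scenarios as an identical success.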

All of these biases—sample maturation, practice effects, regression artifacts, placebo and compensation effects and countless others—can dramatically distort the perceived success of a particular intervention, often making it look much more effective than it actually is.

Does this mean we should just throw in the towel? Hardly. Social science and fields like medical research are replete with tools for designing effective impact measurement. Data scientists and information economists in particular are beginning to pair with social innovators to understand the dynamics of interventions, and separate what works from what doesn’t. Technologists are uncovering new ways to aggregate core impact data and make it open. Yet this work has little bearing on the kind of impact statements demanded by many funders today.

What we need now is a revolution in both the practice and culture of social innovation, one that recognizes that meaningful measurement is every bit as essential—and artful—as the interventions themselves, and bakes it in as a core component of the work. Otherwise, we may very well be wasting everyone’s time.



  • Thanks for a great and timely piece Andrew - enjoyed reading it! 
    I can’t disagree with any of the points you’ve raised, but would like to add that maybe there is a step even before the process of understanding the attribution of any one initiative to the change on the ground. How well are we listening? How well do we really understand the problem we’re working on? (You’ve actually started with this in your piece, where you say that sometimes the failure comes because we design for, rather than with, the beneficiaries.) And this is where I’d like to chime in.
    I feel that if we’re able to have a better understanding of stories from the street, so to speak, if we are able to better feel the pulse of the communities we work for, this incredible energy and amount of resources that is being poured into social innovation may be channeled in a far more effective way: it would be channeled toward the problems as defined by the people who are experiencing them. I think that most problem definition as it stands at the moment is an interpretation by experts, consultations, and in-house and outside colleagues that builds individual and professional biases into the problem. And with each bias, we are a step removed from the real problem.
    How do we better understand the stories from the street? We try to find a way to collect those stories at scale and understand them. We at UNDP are experimenting with applying this narratives approach to better understand some of those wicked problems, where we have loads of research but still are only scratching the surface (I wrote a post on this a few weeks back).
    We follow in the great footsteps of GlobalGiving, which already uses this approach to monitor the impact of the projects it supports, and more recently the Nominet Trust in the UK. I believe that with this additional step, the process of evaluation may be far easier to facilitate!

  • BY Andrew Zolli

    ON January 28, 2013 08:28 AM


    These are terrifically astute comments. There is no question that you cannot make effective change without understanding, and that there are deep structural cognitive biases that come into play when outside experts interpret the context, and outcomes, of various efforts to make change. To be effective, you have to have a street-level understanding of local culture, narratives, complexities, tradeoffs, and perspectives. In our work, that only happens when you have strong, effective local champions who are full partners from the beginning.


  • BY Anna Visser

    ON January 31, 2013 03:12 AM

    Thank you for a very interesting overview of the challenges.
    Have you looked at trying to measure the impact of social interventions that aim to change government policy or services, rather than directly providing the service themselves (e.g. nonprofit lobbying)? This is something we are looking at. Many of the factors you raise are relevant, but I would be interested to know if you have given it any thought.
    Anna Visser

  • BY James P. Scanlan

    ON December 15, 2013 11:25 AM

    These are sensible observations about caution in the interpretation of data. But the discussion is premised on an assumption that standard measurement tools in the social and medical sciences are sound. Yet how many researchers are aware that reducing the frequency of an outcome tends to increase relative differences in experiencing it while reducing relative differences in avoiding it? That is, for example, how many recognize (1) that lowering a test cutoff will tend to increase relative differences in failure rates while reducing relative differences in pass rates, (2) that reducing poverty will tend to increase relative differences in poverty rates while reducing relative differences in rates of avoiding poverty, (3) that reducing mortality will tend to increase relative differences in mortality while reducing relative differences in survival, (4) that improving healthcare will tend to increase relative differences in non-receipt of appropriate care while reducing relative differences in receipt of appropriate care, or even (5) that reducing public school suspension rates will tend to increase relative differences in suspension rates while reducing relative differences in rates of avoiding suspension?[1-5] Probably few researchers recognize that it is even possible for the two relative differences to change in opposite directions as the prevalence of an outcome changes, much less that they tend to do so systematically.

    Efforts to appraise racial differences in cancer outcomes commonly refer to relative differences in survival and relative differences in mortality interchangeably, often measuring one while purporting to measure the other, and without recognizing that the two tend to change in opposite directions as survival generally improves, or that more survivable cancers tend to show larger relative differences in mortality, but smaller relative differences in survival, than less survivable cancers.[6-7] Further, the rate ratio (with its corresponding relative difference), or, more accurately, the rate ratio regarding the particular side of a dichotomy that the observer happens to be examining, remains the principal measure of association in the law and the social and medical sciences. Yet, given that whenever rate ratios are equal for two different baseline rates for an outcome the rate ratios are necessarily unequal for the corresponding opposite outcome, the rate ratio is not only an unsound measure of association, but an illogical one as well.[8]

    That is not to say that one should throw in the towel.  But much rethinking of basic tools is warranted. 

    1. “Misunderstanding of Statistics Leads to Misguided Law Enforcement Policies” (Amstat News, Dec. 2012).

    2. “Can We Actually Measure Health Disparities?” (Chance, Spring 2006).

    3. “Race and Mortality” (Society, Jan./Feb. 2000) (reprinted in Current, Feb. 2000).

    4. “Divining Difference” (Chance, Spring 1994).

    5. “The Mismeasure of Group Differences in the Law and the Social and Medical Sciences,” Applied Statistics Workshop at the Institute for Quantitative Social Science at Harvard University, Oct. 17, 2012.

    6.  Mortality and Survival Page of

    7.  Mortality/Survival Illustrations of Scanlan’s Rule Page of

    8. “Goodbye to the rate ratio,” BMJ, Feb. 25, 2013 (responding to Hingorani AD, van der Windt DA, Riley RD, et al. Prognosis research strategy (PROGRESS) 4: Stratified medicine research. BMJ 2013;346:e5793).
