Measurement & Evaluation

Cracking the Code on Social Impact

An emerging measurement approach emphasizes the value in standardizing outcomes for the whole social sector.

Over the past few decades, practitioners, evaluators, and academics have struggled to organize, measure, and understand social change. We have made a number of important advances, including more rigorous controlled studies, digitization of 990 data, outcomes-tracking software, and improved reporting. One important challenge evaluators have faced in the social sector is standardization: How can we learn from past efforts if we cannot systematically compare one socially focused program to another? Researchers have tried to solve the “apples to oranges” problem in a number of ways. In the 1980s, the Urban Institute’s National Center on Charitable Statistics (NCCS) created a common code for classifying nonprofit organizations by entity type, and later created another system to classify program services and beneficiaries. Others have tried to standardize performance metrics using “shared measurement systems” such as IRIS and the Cultural Data Project. Still, these efforts fall short of codifying the true results of an organization’s programmatic efforts: outcomes.

While much of the focus on outcomes has centered on trying to measure them idiosyncratically, one organization at a time, I believe—and our work at Mission Measurement is demonstrating—that the real value is in standardizing the use of outcomes for the whole sector. Here’s why this approach works:

  • First, standardizing outcomes enables us to organize and identify social programs in a more meaningful way. Instead of researching programs based on subject area (for example, education, youth development, or the arts), we can base research on the benefits programs aim to produce (for example, improving college readiness, increasing access to public services and supports, or encouraging artistic expression).
  • Second, common outcomes create a universal common denominator for benchmarking and comparison—something the sector has long sought. So we can now apply measures such as cost per outcome and social return on investment, and we can do this on an ever-increasing scale, as we standardize and universally adopt efficacy rates.
  • Finally, program design and learning become much more efficient. Researchers can more easily identify program elements that increase efficacy, as well as cross-sector synergies that combine to produce outcomes.
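
To make the second point concrete, here is a minimal sketch of the kind of benchmarking a common denominator enables. All program names and figures below are hypothetical, invented for illustration; they are not Mission Measurement data:

```python
# Hypothetical programs that all report against the same standardized
# outcome (say, "improved college readiness"), which makes a
# cost-per-outcome comparison meaningful.
programs = [
    {"name": "Program A", "spend": 250_000, "outcomes_achieved": 500},
    {"name": "Program B", "spend": 400_000, "outcomes_achieved": 1_000},
    {"name": "Program C", "spend": 120_000, "outcomes_achieved": 200},
]

def cost_per_outcome(program):
    """Dollars spent per standardized outcome achieved."""
    return program["spend"] / program["outcomes_achieved"]

# Rank programs from most to least cost-effective on the shared outcome.
ranked = sorted(programs, key=cost_per_outcome)
for p in ranked:
    print(f'{p["name"]}: ${cost_per_outcome(p):.0f} per outcome')
```

Without a shared outcome definition, the division in `cost_per_outcome` is meaningless, because each program would be counting a different thing.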

My thinking about this approach to measurement—what I call “universal outcomes taxonomy”—dates back to my 2004 book Benchmarking for Nonprofits and a subsequent effort I led at the Urban Institute and the Center for What Works to create a prototype for classifying social outcomes. Since then, we have carefully documented more than 78,000 outcome data points (78,369 to be exact) from more than 5,800 social programs. Two years ago, we focused a team of researchers on the daunting task of systematically cataloguing those outcomes—removing duplicates, standardizing language, creating hierarchies, and developing a universal taxonomy. Not surprisingly, we found that many of the outcomes were the same, though articulated differently. For example, one organization might state its objective as “student achievement,” another as “academic achievement,” and another as “improving test scores.” In all, we identified 132 common outcomes across the entire social sector. We then indexed these outcomes by program type and sub-type, and classified them into a functional taxonomy (see below). We also added geo-coding and beneficiary codes to better contextualize outcomes.
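
The de-duplication and standardization step described above can be sketched as a mapping from variant phrasings to canonical outcomes. The labels and canonical names here are invented for illustration; they are not entries from the actual 132-outcome taxonomy:

```python
# Illustrative synonym-to-canonical mapping. These entries are made up
# for the example and are not the Mission Measurement taxonomy itself.
CANONICAL_OUTCOMES = {
    "student achievement": "Improved Academic Achievement",
    "academic achievement": "Improved Academic Achievement",
    "improving test scores": "Improved Academic Achievement",
    "college readiness": "Improved College Readiness",
}

def normalize_outcome(raw_label):
    """Map a program's self-reported outcome label to its canonical form."""
    key = raw_label.strip().lower()
    return CANONICAL_OUTCOMES.get(key, "UNCLASSIFIED")

# Three differently worded objectives collapse to one canonical outcome.
reported = ["Student Achievement", "Improving Test Scores", "Academic Achievement"]
canonical = {normalize_outcome(label) for label in reported}
```

In practice the cataloguing work described in the article involved human judgment over tens of thousands of data points, but the end product functions like this lookup: many articulations, one outcome.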



Today, a growing number of funders across the country are using this framework to organize, or “tag,” their grants and programs, making it possible to analyze their work on a portfolio level. They can measure the resources they have allocated to each outcome and the relative contribution of each program to a particular outcome, and aggregate the overall performance of the portfolio.
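
The portfolio-level roll-up described above might look like the following sketch. The grants, amounts, and tags are hypothetical:

```python
from collections import defaultdict

# Hypothetical grants, each "tagged" with a canonical outcome from the taxonomy.
grants = [
    {"grantee": "Tutoring Corps", "amount": 150_000, "outcome": "Improved Academic Achievement"},
    {"grantee": "College Bridge", "amount": 200_000, "outcome": "Improved College Readiness"},
    {"grantee": "Homework Help", "amount": 50_000, "outcome": "Improved Academic Achievement"},
]

# Aggregate the resources allocated to each outcome across the portfolio.
allocation = defaultdict(int)
for grant in grants:
    allocation[grant["outcome"]] += grant["amount"]

def contribution(grant):
    """A grant's share of all portfolio dollars aimed at its outcome."""
    return grant["amount"] / allocation[grant["outcome"]]
```

Once every grant carries an outcome tag, both questions the article raises, how much is allocated to each outcome and how much each program contributes to it, become simple aggregations.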

We believe that this new language holds great promise for the sector, and we have much work to do. First, we must enable widespread adoption of the taxonomy—by both funders and social service providers. Second, we need to build the capacity of organizations to select the right outcomes (what we call “sizing the outcomes”). And finally, we must continually curate and improve the taxonomy, with feedback from practitioners and researchers.



  • BY Emma Tomkinson

    ON February 6, 2014 05:37 PM

    Hi Jason,

    Looks great - can we see your taxonomy in its entirety somewhere?

    Big Society Capital in the UK has a similar project - it would be great to see how similar the two are.

    Also, the Global Values Exchange is a database of values with relationships between outcomes and indicators - as it’s open source, it has not been systematically constructed, but it is growing and being refined as it goes.




    BY Kieron Kirkland

    ON February 7, 2014 09:10 AM

    Hi Jason

    It’s really interesting to read about this work, and it’s heartening to see it developed from the ground up. As Emma has noted, it certainly echoes a current trend for shared measurement in the sector. In some ways shared measurement can be really useful, but I feel a ‘one size fits all’ approach can be very problematic for a number of reasons, particularly for early-stage and developing interventions.

    A good example is the one you give, where ‘one organization might state its objective as “student achievement,” another as “academic achievement,” and another as “improving test scores.”’ I am not sure educators would equate student achievement with test scores; indeed, many would argue that test scores are just one proxy among many. If you’re focused on getting young people into further education, test scores will certainly be a priority; if you’re focused on their emotional well-being, they may be a less useful representation of success than other measures. In this way I’d argue that metrics represent and reinforce specific types of social value and theories of change. That’s fine if we all agree on one theory of change, but we don’t.

    A second challenge of standardizing measurement is aptly described by Goodhart’s Law: once an indicator is put in place, people perform to that indicator and it loses its value; people will often game the system. Relatedly, standardized measurement can skew organizations’ missions, making them perform to external measures rather than use measures that represent the actual social value they are trying to generate. This is particularly important for early-stage approaches, which are testing new models and may be trying alternative ways of addressing social challenges. A bullish application of shared measures can hamper finding and testing new models and approaches, because they are deemed a failure before they have a chance to develop. It can also skew what we see as success. Perhaps the greatest example of that in our times is the use of GDP as a measure of development, which many have argued, far more eloquently than I could, is fundamentally flawed.

    Thirdly, standardized measures risk putting the focus of measurement on external accountability rather than on internal learning and the development of practice. If we focus only on external accountability, we are unable to create more nuanced metrics to ensure the continual development of alternative approaches. As mentioned above, this is particularly important when developing new innovations. It also reinforces the idea that measurement is done for others and for accountability, not for insight and development.

    Fourth, standardized approaches cannot account for factors we haven’t yet understood. This is especially difficult when scaling or growing, as we encounter novel situations in new and dynamically changing environments. Measuring only for what we know limits our ability to adapt, or sometimes even to see these unaccounted-for factors. This can lead to poor or inappropriate implementation.

    If we have a deep understanding of the intervention, the context it’s operating in, and the social issue, we can use standardized measurement. The problem is that with many of the challenges we face, we’re just not there yet. I often use shared measurement frameworks in my own work, but as a sector we must avoid leaping on them as a panacea to the ongoing problem of understanding ‘what works’.

  • BY Matthew Pike

    ON February 8, 2014 06:31 AM

    Hi Jason,

    This looks excellent, a big step forward in the development of a shared language for rich (not reductive - as per Kieron’s point above) conversations about change and value.  Certainly a big advance on projects like IRIS, for all their undoubted merits.

    We’ve been working on something reassuringly similar here in the UK. We use the same headline categories listed here (different language - we are Brits after all! - but the same constructs). We’d add a range of other aspects of taxonomy, covering, for example, activities, critical factors, evidence, and fidelity. It would be excellent if we could compare notes. We’ve built an open source, free-to-use data platform using these standards, which is out soon.

    What I like most about your approach, apart from its being developed ground-up, is that it encompasses a range of perspectives about value and a range of uses to which data can be put: comparing performance, driving optimum asset allocation, learning how to improve, sharing evidence-based practice, and so on. In this respect it moves beyond approaches such as SROI or the Big Society Capital outcome matrix cited by Emma above, which tend to privilege specific uses of data, e.g., valuation or portfolio benchmarking.

    It would be fantastic if we could find some way of building shared language beyond national boundaries, above all to accelerate shared learning about what works better.


    BY Jason Saul, MM

    ON February 9, 2014 11:16 AM

    Appreciate the responses.  It’s exciting to see all the synchronous thinking in the field. A few clarifying points about our research and its implications:

    • One vs. Many. There are a number of quality organizations keeping lists of commonly used outcomes, metrics, and other data points. For example, some of the sources other commentators have referenced offer great resource lists. In a 1.0 world this is quite valuable, as we need to seed the field with new ideas and compile reference lists for practitioners to access. The value of creating a universal taxonomy, however, is to bring the field to some level of standardization. To achieve standardization, we as a field need to adopt a single, universal, definitive, unbiased catalogue of outcomes by which to organize our work. And that catalogue must be continuously curated, normalized, and made useful to funders and practitioners. To that end, we will be making the Outcomes Taxonomy available to funders through a variety of channels and intermediaries beginning in Q2 of this year.

    • Outcomes v. Metrics v. Activities. We often tend to conflate these distinct concepts, and I think this makes standardization even more elusive. The purpose of our research was to curate out those differences and develop a taxonomy that was purely focused on outcomes. Outcomes are directional changes in social condition. Metrics or indicators are easily conflated with outcomes, but they are not the same. Things like “level of physical mobility,” “time spent in hospital,” or “number of people trained” are metrics.

    • Outcomes v. General Conditions. To be universal and standardizable, the outcomes included in a taxonomy must be both discrete and measurable. Oftentimes practitioners will try to phrase general conditions like “improvements in policy and legislation” or “economic performance of a local area” as outcomes. But these conditions tend to be vague (i.e., not discrete) and open-ended (i.e., not measurable), and therefore are not well placed within a taxonomy.

    • Agnostic to Theory of Change. A taxonomy is simply a list of outcomes. The judgment comes in the user’s selection or valuation of one outcome versus another. Whether you value “emotional well-being” or “test scores” as a factor in driving up graduation rates, the point is that we are all aiming at the same shared outcome of improving graduation rates. The taxonomy doesn’t relate to the way in which you intend to produce that outcome.

    • Outcomes v. Innovation. People often conflate “outcomes thinking” with measurement.  These are very different things.  Simply clarifying the desired benefits of an intervention does not quash or inhibit an organization’s ability to innovate strategies that produce that benefit.  In fact, quite the contrary – it opens the door to a wide range of strategies because we are simply controlling for the outcome, not the theory of change, the metrics or the type of intervention.


  • BY Matthew Forti

    ON February 9, 2014 08:32 PM

    Terrific post, Jason - very exciting to see how far the taxonomy has progressed and I believe this has great potential for the sector!

    As U.S. Director for One Acre Fund, an East Africa-based NGO doubling the incomes of hard-working smallholder farm families through a comprehensive bundle of farm services, I see two exciting uses for such a taxonomy in our context:

    1) Coalescing similar programs on a meaningful end outcome: agriculture programs (whether NGO, government, or otherwise) measure and report on lots of important outcomes, such as improved farming techniques and improved access to market prices. But at the end of the day, these are just intermediate outcomes; what ultimately matters is how much incremental farm profit (per donor dollar invested) the program creates - ideally measured against a rigorously selected control group. A taxonomy like this, which could push funders to direct all grantees working with a similar target population and intervention to focus on a common end outcome (and, of course, end outcome per cost), would be a boon for the sector!

    2) Defining a long-term life improvement outcome bundle:  at One Acre Fund we’re interested not only in showing how our comprehensive bundle is the best way to increase farm profit, but also in proving that putting more income into the hands of Africa’s poorest is the best way to solve a variety of challenges poor people face, in areas like child nutrition, housing, school attendance and achievement, etc.  But we struggle with how best to measure quality of life improvement - and since I imagine many NGOs are in the same boat - we will probably end up with a hodgepodge of approaches.  It would be fascinating to see how this taxonomy might be used to create a life improvement outcome bundle for NGOs operating a variety of programs in global development to coalesce around. 

  • This type of resource is a critical step to help us move away from evaluating nonprofits on their overhead to evaluating nonprofits on their impact.  Nonprofit staff members, Board, and funders will all be relieved when we can start to better understand the relative impact of our efforts and make big bets on programs that really work.

    Congrats Jason and Mission Measurement on all you’ve done so far!

  • Standardising outcomes for the whole social sector is the wrong answer to a difficult question. It confuses statutory-side responsibilities and mindsets with the ways in which people in the third sector engage with inequality, address challenges with innovation, and undertake social action - and maintain the energy and motivation to do so in testing circumstances.
    By all means feel free to apply standardisation within a funded programme or across a range of providers who are contracted to deliver a public service. But beyond that you’re really just going to create a lot of grief.
    All the points that Kieron has raised so politely apply here. There are so many uncontrollable variables that gaming will pop up all over the place - just as it has in SROI, making it a largely discredited methodology.
    Example: you’d think that something as classifiable and objective as outcomes from surgery would be easy enough to measure and compare, but apparently the Royal College of Surgeons are having some difficulty.
