Measurement & Evaluation

Most Charities Shouldn’t Evaluate Their Work: Part One

Why not?

Most “evaluations” of charities’ work are done by the charities themselves, and most are a waste of time. Perhaps this is a surprising view from an advocate of evidence-based charity, but it follows from that very position: charitable activity should be based on good-quality, robust evidence, and that is not something many charities can reasonably be expected to produce themselves.

What is evaluation?

Before we get into why this is true, let’s get clear about what evaluation is, and what it isn’t.

The effect of a charity’s work depends on the quality of the idea (the intervention) it uses and on how well it implements that idea. Both the idea and the implementation need to be good for impact to be high; if either is poor, impact will be low. Think of it as:

impact = idea x implementation

To illustrate the difference, consider a breakfast club in a school for disadvantaged children. The idea is that a decent breakfast aids learning by avoiding the distractions of hunger. The implementation involves having foods that the children will eat, buying them at a good price, getting children to show up, and so on.
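The multiplicative relationship can be sketched in a few lines of code. This is purely illustrative: the 0-to-1 quality scores and the `impact` helper are hypothetical, not something the article defines.

```python
# Hypothetical 0-to-1 quality scores. Because the relationship is
# multiplicative, a weak factor drags the product down no matter how
# strong the other factor is.

def impact(idea: float, implementation: float) -> float:
    """Impact as the product of idea quality and implementation quality."""
    return idea * implementation

# A strong idea implemented badly delivers less than a middling idea
# implemented adequately:
print(impact(0.9, 0.2))  # strong idea, weak implementation
print(impact(0.6, 0.6))  # middling idea, adequate implementation
```

The point of the sketch is simply that neither factor can compensate for the other: 0.9 × 0.2 is smaller than 0.6 × 0.6.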

Assessing the quality of implementation is relatively easy: Do children come? What is their feedback on the breakfast club? How much of the food gets wasted? This is monitoring. It is vital, and by rapidly feeding information back to staff, it enables the organization to improve its processes, often dramatically (as Bill Gates discussed in his annual letter). This monitoring (or “process evaluation”) should happen almost always, and the charity can normally do it itself.

Notice, however, that monitoring looks only at the performance of that one organization, so it will never (on its own) tell you whether a charity is good relative to other places you might put your money—that is, whether funding that charity is a good idea.

Assessing the quality of the idea is rather harder. That involves investigating whether a decent breakfast actually does aid learning. And that requires isolating the effect of the intervention from other extraneous factors. This is impact evaluation, and “evaluation is distinguished from monitoring by a serious attempt to establish attribution,” says Michael Kell, chief economist at the (UK) National Audit Office.

It’s hard. In our example, we can’t simply look at whether the children who attend the breakfast club are now learning better than before: Perhaps the club’s launch coincided with a new teacher arriving, or better books, or the children suddenly watching more Sesame Street. (Organizations can look at these factors in a pre/post study, and many charities do, but such studies prove little, if anything.)

Nor can we simply compare children who attend the breakfast club with those who don’t, because there are likely to be major differences between the two groups: Perhaps only the weakest learners attend. To get around that, we’d have to run a randomised controlled trial. That raises complicated questions, such as whether to randomise children, schools, or towns, and what sample size to use. But even then we’re not out of the woods. We might have to deal with “spill-over effects” (benefits to children who don’t attend the club: for example, those who do attend may be less disruptive in class) and “cross-over effects” (such as children who attend the club giving food to children who don’t). These and many others are all normal questions in such research.

Hence, establishing attribution—which is integral to evaluating an idea—is a whole field of social science research.

Monitoring is of the implementation.

Evaluation is of the idea.

Immediately we see that monitoring and evaluation are totally different exercises, even though the terms are often used as if they were interchangeable.

Most charities aren’t composed of social scientists

It’s reasonable to expect charities to monitor their work: How many trains do you run, are they on time, and what do passengers think of them? As with companies reporting the number of units they’ve sold, we might audit those figures, but there’s not normally anything technically difficult in monitoring.

By contrast, evaluation is social science research, and it is hard: What effect do trains have on economic growth? Most charities aren’t able to run evaluations because they don’t have the skills in-house. We can see this in a recent review by the Paul Hamlyn Foundation of reports from grantees (possibly the only analysis of its type): 70 percent of evidence presented by charities ranked below what the foundation called “good.”

The good news is that most charities don’t need these research skills. Once we know whether and when breakfast clubs work—once they’ve been evaluated rigorously—then we know and don’t need to evaluate them again (unless the context is very different). To take a medical analogy, we don’t expect every hospital to be a test-site. The clinical trials are done somewhere—properly, with luck—and then published so that everybody else can use the results.

Often, the ideas used by charities don’t need to be evaluated again, because they’ve been amply evaluated already. Charities—and funders and others—can use those existing evaluations to choose effective interventions. All the charity then needs to do is run the programs well. Charities need to be skilled at implementation—at running breakfast clubs, or community transport, or drug rehab centres. By the long-established law of comparative advantage, we should let them do what they’re best at and not ask them also to get good at the totally unrelated skills of social science research.



  • BY Paul Penley

    ON May 29, 2013 12:42 PM

    Beyond monitoring and short of evaluation, charities can identify baselines or benchmarks that provide meaningful context to what they monitor. Kids at breakfast clubs may get average grades. That could be an improvement or a decline in academic performance. Charities should capture where beneficiaries are before reporting where they end up. It doesn’t make a conclusive case that the intervention caused improvement, but it does show donors whether improvement is taking place. Some groups screen out low performers in applications or program requirements and end up with great stories from beneficiaries because they are picking winners, not making winners.

  •

    BY Tom Skeele

    ON May 30, 2013 06:10 AM

    Caroline… Thanks for your distinction between monitoring and evaluation, and for making a good case for why one can be conducted internally and one should not be. It all makes sense to me, and speaks to the long-standing struggle I’ve had during my nearly 20 years as an ED of a nonprofit (I’m still getting used to the term “charity”) in believing I was really able to evaluate our impact. My own M&E training was in the outputs/outcomes paradigm, which is what I’ve used for my orgs. As you describe it, monitoring seems to be the process of identifying outputs, and evaluation of identifying outcomes. I’d be curious to hear your thoughts on how your description of monitoring and evaluation maps onto measuring outputs and outcomes, and where/how they differ (or complement each other, or are variations on a theme)?

  • BY Nell Edgington

    ON May 30, 2013 07:08 AM


    This was a very useful post that clearly, and simply, describes the difference (often misunderstood) between monitoring and evaluation. However, I think your post begs the question “how do we encourage and fund more evaluations of the ideas?” You make it sound so simple to leave the social scientists to come up with and evaluate the ideas and then disseminate those findings to the “charities” to execute and monitor, but I don’t see large, well-funded efforts to evaluate big ideas. Do you anticipate growing efforts there? In other words, once you’ve so clearly defined the problem how do we solve it?

  • BY United Way of Stanislaus County

    ON May 30, 2013 08:34 AM

    Thank you for sharing this post. This is definitely a thought provoking article.

    You do a great job of showing that “implementation” and “evaluation” are different activities requiring different skill sets. This is very true. Therefore, do you think it is okay that a charity spend money hiring someone to “evaluate” their work? If so, how much should be spent on this?


  • BY Catherine Elkins

    ON May 30, 2013 09:58 AM

    M&E doesn’t separate out that way in the real world. For instance, implementation quality is not necessarily easy to assess, and the illustrations do not assess it: I can get lots of kids to breakfast and get great ‘feedback’ by serving doughnuts and ice cream. That’s not quality implementation of an intervention to help kids concentrate and learn, but my numbers would be great! What’s described above is activity management—did we do the things we said we would do—which is neither M nor E.

    As another example, training events can look successful with per diem or free lunch incentives, and people will say the training was great if it was entertaining. But do they go back to their jobs or lives and do anything differently? The way we expected? To be useful, M&E must focus on what is meant to happen /beyond/ what we did.

    Thinking through the design and the changes we should expect to see along the way should already be embedded in planning the program itself. *Assessing* the value of the implementation therefore is inextricably intertwined with strong design and methods that include *assessing* the appropriateness of the model and the extent to which it is achieving its intended results.

    What smaller organizations may lack in technical skills they will make up in deep knowledge, typically of specific contexts that don’t map well to an academic study either constrained to different circumstances or generalized beyond relevance. (And if these organizations lack methodological sophistication, how could they discriminate among the many, many, many examples of poor evaluation design or execution, in order to find the ones that have actually produced valid and reliable evidence? And when would they do this time-consuming research?)

    Instead of following a path that essentially abdicates responsibility to assess results, implementing organizations will benefit much more from building capacity to strategically articulate, select, validate, strengthen, and use (internally and externally) program experience and facts as evidence. Sound M&E, in other words. Not a trivial set of skills, but no PhD required.

    Sound M&E is organizational self-defense. Social science (and donor) trends swing back and forth. Leaving it up to them to tell you if you’re making a difference would be negligence.

  • BY Caroline Fiennes

    ON May 30, 2013 11:25 AM

    Caroline Fiennes, the article’s author here.

    Some of the comments are answered in Part Two, here:

    On how to get more funding, well, I don’t have a magic wand but do observe that a huge amount of money is spent on:
    - ‘M&E’ which doesn’t usefully evaluate the idea nor monitor the implementation but instead is just gathering data for the sake of gathering data.
    - donors requiring ‘reports’ which duplicate what other donors want, or are minor deviations of what other donors want. Those data are expensive to gather and generate no useful insight.
    - wasteful, unharmonised application processes.
    If we could stem any of these systemic forms of waste, we’d free up a ton of money which could be used for better evaluations - and/or for more implementations.

    On who does this research, that’s discussed in Part Two.

  • BY Daniel F. Bassill

    ON May 30, 2013 01:55 PM

    Thanks for the article and the comments.

    I’m trying to find in-depth discussions and innovations where people are trying to address Caroline’s last comment: “If we could stem any of these systemic forms of waste, we’d free up a ton of money which could be used for better evaluations - and/or for more implementations.”

    While this discussion may be taking place in small groups in different places, can anyone point to open, online forums where donors, policy makers, NPO leaders, researchers, etc. are in the same discussion, looking for ways to generate needed operating resources for all of the organizations doing the same type of work in different places?

    There are a number of reports showing the value of connecting a youth in poverty with adult mentors; thus, where are the donor coalitions who are working to make sure there is a flow of talent, technology, and operating resources to enable programs to implement these programs in more places? Can we build a list of such discussion forums?

  • BY Jeremy Nicholls

    ON June 4, 2013 03:12 AM

    Given the recent increase in interest in social impact and measurement, there is a need to take stock of where this is all going. Nonetheless, I think the extent to which organisations monitor and evaluate (account for and analyse) the change they create should be driven by the need to be accountable, both to the people on whose behalf they are acting and for spending resources in the most effective way, and the level of rigour required will vary.

    RCTs are one way of testing causality and can give a level of confidence that may be more than is necessary, but this doesn’t mean organisations should not consider and manage causality.

    I responded to some similar issues although the language was different in a post to

    There is a real challenge in getting data that is used to make decisions, and waste in generating information that isn’t being used. The information should help answer the question “are we making as much of a difference as we can, given the resources we have?”, and this means getting boards of directors and trustees to be as demanding of social value created as they are of financial management.

  • BY Zak Kaufman

    ON June 13, 2013 11:38 AM

    Overall, a good explanation of how M and E are inherently different because they answer different questions. It is another inconvenient truth (to build on Karti Subramanian’s recent blog) that many organizations conflate M and E into an amorphous over-aggregated blob of Excel sheets and wind up not answering any useful questions at all.

    I have four points in response to your piece:
    1. Your Impact equation is really just an equation for Effectiveness. In the public health world, we can’t talk about the impact of an intervention without talking about its scale or the prevalence of the problem/disease being addressed. I would propose revising your equation to:
    Impact = Efficacy x Quality of Implementation x Scale x Need
    which you could simplify to:
    Impact = Effectiveness x Scale x Need

    If you have a highly efficacious HIV prevention intervention, for example, you might implement it very well, but only deliver it at a small scale or in a setting with low HIV prevalence (i.e. reducing incidence from 0.02% to 0.01%). Therefore, despite your effectiveness being high (you reduced HIV incidence by 50%!), your impact (# of HIV infections or DALYs averted) will be very small.

    2. Semantics: Evaluation is just as much science as social science, particularly if the intervention is health-related. Few would classify Epidemiology as a discipline in the Social Sciences; yet hundreds of epidemiologists (including myself) are focused on evaluating interventions’ effectiveness (in particular, using RCTs).

    3. More semantics: I don’t agree that Process Evaluation and Monitoring are the same thing, and I think you’re potentially creating confusion by conflating the two while simultaneously saying that monitoring and evaluation are “totally different”. Generally speaking Monitoring is ongoing data collection, analysis, feedback, adjustment; Evaluation—whether focused on outcomes or processes—is periodic, planned, systematic, and scientific. You use the term Evaluation when you mean Outcome Evaluation (or what many call ‘Impact Evaluation’...a problematic term because many “impact evaluations” only measure effectiveness and not impact). Saying that Process Evaluation isn’t Evaluation is confusing and inconsistent with lots of previous work in this area: simply put, Outcome Evaluation, Process Evaluation, and Monitoring are three different endeavors (see, e.g., Nutbeam and Bauman 2006). They require different skills, different tools, different systems and usually different people.

    4. Lastly, it’s key to stress the importance of implementers’ role in innovation. One might read your article and think that organizations should not bother coming up with ideas, that they should leave it to researchers to design the next best intervention and evaluate it. But many organizations understand the context of an intervention better than researchers and thus might have better (or at least equally good) ideas for new interventions or how to greatly improve existing ones—ideas that could be tested by solid outcome evaluations. Moreover, organizations have contextual understanding that is essential to designing strong evaluations. We should facilitate effective collaboration between implementers and researchers to come up with and test new ideas, rather than saying that “charities don’t need research skills”.
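The effectiveness-versus-impact arithmetic in point 1 of the comment above can be sketched as follows. The `infections_averted` helper and the reach figure of 10,000 people are hypothetical illustrations; the 0.02% incidence and 50% reduction are the commenter’s own example numbers.

```python
# Effectiveness can be high while impact stays small: with a small reach
# and a low baseline incidence, even a 50% relative reduction averts
# very few infections.

def infections_averted(reach: int, baseline_incidence: float,
                       relative_reduction: float) -> float:
    """Expected infections averted = people reached x baseline risk x reduction."""
    return reach * baseline_incidence * relative_reduction

# 10,000 people reached, 0.02% annual incidence, 50% effectiveness:
averted = infections_averted(10_000, 0.0002, 0.5)
print(averted)  # roughly one infection averted, despite 50% effectiveness
```

This is the commenter’s point in miniature: halving a very small risk for a modest number of people produces a very small absolute impact, even though the relative effectiveness looks impressive.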
