This post is from PopTech Editions III—“Made to Measure: The New Science of Impact,” which explores the evolving techniques to accurately gauge the real impact of initiatives and programs designed to do social good. Visit PopTech for more interviews, essays, and videos with leading thinkers on this subject.

Randomized evaluations have made waves in the social sector for good reason. They contribute much-needed evidence to inform public policy and to push NGOs toward more objective approaches. But one must be attuned to the details of an evaluation to understand what it is really telling us. We often forget this. Sometimes we discard evidence that tells us something valuable. More often we overgeneralize results, drawing broad conclusions where it is inappropriate or impossible to do so.

Most of the discussion around interpreting results centers on what statisticians call “external validity,” a term that refers to how well findings hold beyond the setting in which they were measured. Typically, critiques about the external validity of randomized evaluations focus on geographical context: “How do we know that an intervention in Kenya will yield the same results in India?”

But external validity considerations should extend beyond geography. To interpret results, it is crucial to understand for whom the treatment effect is measured and exactly what the intervention entails.

Community-based savings bank in Cambodia (Image: Wikipedia)

Consider microcredit as our main example. An evaluation of a microcredit program might investigate, “Does access to microcredit alleviate poverty?” But designing a rigorous study to answer this question requires researchers to probe into deeper sub-questions: “For whom does microcredit alleviate poverty?” and, “What kind of microcredit product?”

Let’s start with the first question. Who is the target population for a microcredit intervention? Your answer will depend largely on where you sit: Academics and microfinance institutions will be interested in different groups of people. Development economists want to understand the impact of microcredit to push forward their theoretical understanding of poverty. Thus, the appropriate population for an academic economist comprises all individuals below the poverty line.

But let’s say you’re a microcredit lender who wants to understand the impact of your product, which has extensive reach in a single country. If your organization has a competent management team, you didn’t grow your operations randomly. You filtered the universe of options based on some set of criteria, such as the population of target clients or the competitive landscape. If this microfinance institution were to evaluate its impact, it would draw its study sites from the locations where it would want to operate.

An academic might object to the institution’s evaluation on grounds of selection bias. But the lender isn’t trying to answer the broader questions that interest academics. It is perfectly valid for a lender’s trial to demonstrate that its products help only people with the specific characteristics that make for a desirable client.

The relatively narrow evaluations by lenders can still provide valuable information. It should come as no surprise that some interventions work better on some poor people than others. NGOs should want to know this and grow their programs accordingly. It is important then to understand which target population provided the sample for a study. You can then determine how much that population looks like the one that interests you.
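To make the point concrete, here is a minimal sketch in Python. Every number in it is invented for illustration and comes from no study; the subgroup names, the effect sizes, and the `average_effect` helper are all assumptions. It shows how the same subgroup-level effects can average out very differently depending on who is in the population.

```python
# Hypothetical illustration: all numbers are invented, not taken from any study.

def average_effect(subgroup_effects, population_shares):
    """Weight each subgroup's treatment effect by its share of a population."""
    if abs(sum(population_shares.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(subgroup_effects[group] * share
               for group, share in population_shares.items())

# Assumed effect of a loan on monthly income (arbitrary units) for two kinds of borrower.
effects = {"existing_business_owner": 10.0, "no_business": 0.0}

# Assumed mix of borrowers in the study sample versus the population that interests you.
study_sample = {"existing_business_owner": 0.6, "no_business": 0.4}
target_population = {"existing_business_owner": 0.2, "no_business": 0.8}

effect_in_study = average_effect(effects, study_sample)
effect_in_target = average_effect(effects, target_population)
print(effect_in_study, effect_in_target)
```

Under these made-up numbers, the average effect in the study sample is about three times the average effect in the target population, even though nothing about the product has changed. In practice the subgroup effects themselves are rarely known this cleanly, so treat this as a way of thinking about the problem rather than a recipe.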

Equally important is the question of exactly what is measured. Think for a moment about the variability of microcredit product features. Does a microcredit product involve group lending? Is use of the loan restricted? What is the payback period? What is the interest rate? All of these questions affect the impact of the program. But a randomized evaluation of microcredit by definition investigates the effectiveness of one specific set of product features.

Take for example the Abdul Latif Jameel Poverty Action Lab’s (JPAL’s) landmark study of microcredit in Hyderabad. JPAL worked with Spandana, one of the largest MFIs in India. Spandana’s microcredit product was a traditional one: group loans for 6 to 10 women aged 18 to 59 who self-selected their groups. Loans could be used for anything, but had to be paid back over a 50-week period and carried a 12 percent interest rate.

JPAL’s study showed that the program was mostly disappointing, at least relative to the dominant microcredit narrative. But those data are far from the indictment of microcredit that many observers took them to be. What would the results look like if the loans included individual liability? What if men could access the microcredit program? What if there were a longer payback period? We want to understand the impact of microcredit, but a trial can only measure the impact of a specific product.

One needs to understand the who and what of rigorous evaluations for a host of interventions beyond microcredit. Randomized controlled trials are a critical tool for understanding impact and making strategic decisions. But to accurately interpret results, it is essential that we understand both the nuances of a study and the parameters of the underlying intervention programs.