Measurement & Evaluation

RCTs: Not All That Glitters Is Gold

A look at the limitations of randomized control trials.

Globally, there is a push for evidence-based practice and policy in poverty reduction and development strategies. The randomized control trial (RCT) methodology, drawn from clinical trials, has in recent years been applied to issues as diverse as health practices, changing gender norms, and access to finance through field experiments. The rigor of this methodology allows scientists to move beyond correlations (x and y seem related) and make causal statements (x increases y), thereby leading many to regard it as the “gold standard” of evidence.

I recently attended the Urban Services Initiative matchmaking conference, organized by the Abdul Latif Jameel Poverty Action Lab (JPAL), where I met researchers and practitioners working on important aspects of poverty alleviation. Certainly our discussions were intellectually stimulating, and I learned a lot. But I found myself thinking more about the limitations of the RCT method than its value. I walked away with a few new reservations:

RCTs are context-specific.

Just because creating report cards for local politicians changed voter behavior in Delhi and urban Uttar Pradesh doesn’t mean it would do the same in Dhaka. Results are rarely universal, and program leaders are left with evidence that an intervention “works some places but not all,” and so fall back on making a judgment call.

RCTs tend to answer meta-level questions.

Example: A medical trial may aim to determine whether certain drug dosage combinations improve a given condition. The study will report whether that combination worked or didn’t work, but it won’t tell you how well other combinations (or single therapies) might work. Similarly, RCTs (particularly impact evaluations) offer answers to only the specific program design tested. At BRAC, we’ve tested the comprehensive bundle of goods, services, and social engagement that we provide to the ultra-poor. Our evaluation found that it is effective in reducing extreme poverty, but the research doesn’t tell us whether the program would be just as effective with fewer household visits from our staff, or whether adding other components would make a greater impact on education. Additional research, often qualitative, is needed to answer specific questions, including why the program worked better for some than others.

Or, RCTs answer small questions that are only part of the puzzle.

Some RCTs look at much smaller questions, such as: Should a toilet in a slum be community-operated or privately operated? The answer is quite specific and actionable. Few academics, however, have the patience and interest to answer all the questions that matter to a practitioner who is considering adopting a model.

RCTs are non-additive.

Simply, RCTs almost always yield pieces of unique puzzles, rather than answers that add up to a larger certainty. Even accounting for the caveats of context and program, RCT knowledge cannot be neatly combined with other study results. Even “gold standard” results, taken together, do not produce a full playbook.

Sometimes trends are clear across a number of RCTs, and occasionally meta-analyses bringing together several study topics in the same area are written to try to aggregate the information (and not without controversy). But usually, academics are drawn to the big unknown, not the practical questions that emerge in the wake of other studies. Little attention is paid to who implements the interventions and how their organizational characteristics contribute to the observed impact (or lack thereof).

RCTs don’t tell you why.

RCTs tell you only whether or not something works, and how well. Why it does or doesn’t work is, from the researcher’s perspective, up for interpretation, but for practitioners, it is critical to adopting a new practice. While results are often described as surprising, in reality, RCT findings rarely surprise the research team (though they may surprise others, as a recent SSIR poll indicated). Prior to launching a full study, most investigators have done a number of focus groups, analyses on existing data, and pilots to make sure that their predictions are well supported. A key input is the experience, intuition, and wisdom of practitioners. These types of due diligence are critical to finding funds and committing to a full-scale RCT (the “real” research). For practitioners, the insights from the pre-research phase, which are rarely shared formally, are the most useful and immediately applicable knowledge gained.

RCTs are an incredible tool for answering some types of questions and producing some evidence. However, they are a gold standard only to academics, not practitioners. The focus of research should be to shed actionable insights on how to achieve impact—in our case, poverty reduction—not methods. The push for the creation and utilization of evidence is positive, but not if it means marginalizing existing operational wisdom. Evidence has limitations and must be wedded with creativity, experience, and operational know-how to create and scale effective programs.



  • BY Jacob model

    ON August 28, 2012 02:01 PM

    So I mostly agree with your comments, Maria, and RCTs are really only appropriate in circumscribed situations. However, there are a few points I'd probably disagree on.

    First, I think that RCTs are probably one of the only ways to definitively assess the mechanisms behind interventions (i.e., how they work). Good RCTs don't just collect quantitative data; there should be outlets for collecting more qualitative data to understand what's happening. Good practitioners of RCTs (like JPAL) do this routinely. The best RCTs have conditions that try to isolate different aspects of the mechanism, precisely to identify what's at play and what isn't.

    Second, as you discuss with BRAC, many RCTs of nonprofits just try to see if the kitchen-sink intervention beats out another intervention. Your comment that these types of evaluations are non-additive is spot on. However, I would contend that it's less a problem with RCTs as a tool in general and more a criticism of how they're used. In my experience (often for philosophical or even moral reasons), nonprofits are reluctant to offer unbundled services as part of an experiment. RCTs can and should be designed to isolate the effects of different bundles. This might make them less ambitious, but it would go a long way toward making them more additive. The political issue at play is that funders tend not to care about mechanisms; they want to know if a program works and how much it costs.

    Finally, from the academic perspective, it's much more interesting to understand why an intervention works rather than just calculating the magnitude of its effectiveness. Once you understand the mechanisms at play, you can then address some of your issues, such as finding similar contexts where interventions are more likely to work. Moreover, mechanisms reveal something about human or group behavior… and that's really what we're in the business of doing.

    (Full disclosure - I am an academic who is currently running an RCT with a nonprofit).

  • BY Maria May

    ON August 29, 2012 04:09 AM

    Dear Jacob,
    Thanks for the thoughts! Your point that one limitation of RCTs lies in the way they are applied is well taken, and it would be great to see more of the types of “bundle” comparisons that you mention. Are there ways to increase the interest of donors and practitioners in these fundamental questions? And to get academics to give us more practical advice while exploring them?

  • BY Stephen Alderman, Peter C. Alderman Foundation

    ON September 4, 2012 03:43 PM

    While we at the Peter C. Alderman Foundation (PCAF) agree with many of Ms. May’s assertions in her recent SSIR post, RCTs: Not All That Glitters Is Gold, we think that RCTs can be a very useful tool. PCAF deals with global mental health, treating war-affected populations suffering from traumatic depression and PTSD in post-conflict countries in Africa and Asia. In our experience, RCTs have relevance not only in academic science done in the laboratory, but also in applied science at the patient’s bedside. First, RCTs yield universal truths, but these must be adapted to the local context and on-the-ground realities. Next, they convert outcomes to impacts. Then, RCTs provide a firm evidence base that avoids Type I errors and points the way for further research. Take the following example. In the US, RCTs have established that Cognitive Behavioral Therapy decreases symptoms and improves social function in depressed urban adults in the controlled setting of a clinical trial. Can it produce the same effect in depressed rural adults residing in war-torn countries? The results of five RCTs in post-conflict nations say yes, it can, but it must be adapted to the patients’ belief systems about depression and their definition of normal function. Thus evidence-based practices must be tempered by practice-based evidence. Depressed patients improved with therapy: a favorable outcome. But would they have improved without treatment? The five trials provided robust evidence that recovery does not occur in these patients when untreated. Hence, Cognitive Behavioral Therapy favorably impacted depressive symptoms and increased the function of adults in a variety of settings.
    RCTs are like lasers: very sharp but very narrow. The trick is to use them to answer narrowly defined questions. On the other hand, ethnographic, qualitative assessments cast a wide net to identify what a culture considers normal, desirable, and moral. These assessments frame the burning questions, directing us where to point our lasers and drill down to find the answers. This process of employing a combination of ethnography and RCTs, known as mixed methodology, is a powerful tool in global mental health.
    The last trick entails moving evidence-based treatments from the controlled world of RCTs and ethnography to the real world where people are living their lives. This is the domain of implementation science, a discussion that must be saved for another day.

  • BY Dr GM Grimshaw

    ON November 10, 2014 08:27 AM

    This discussion raises valuable points but doesn’t answer the fundamental question all research should ask: what is the nature of the bias, and how well can we overcome the obstacles posed by bias? The RCT methodology is narrow, but it serves to remind us of the need to deal with bias. David Sackett defined 56 different forms of bias in scientific research; we need to be mindful of the issue.
