illustration of two intersecting rulers on a gold background (Illustration by Matt Chase) 

Among many in the social sector, it’s an article of faith that randomized controlled trials (RCTs) are the gold standard for evaluating the effectiveness of social programs. There are good reasons for this belief. Only very specific kinds of evidence can demonstrate that participating in a particular social program causes whatever outcome it is designed to produce, whether it be higher income, increased recovery from substance-use disorder, improved high school graduation rates, or something else. A well-implemented RCT can provide that evidence, and its proponents argue that it is the best way to do so.

However, is measuring specific changes—in particular, individual outcomes—the best way to tell if a nonprofit’s work is valuable? What are the consequences when this way of conceptualizing a nonprofit’s contributions is given such prominence? As we learned in researching and writing Mismeasuring Impact: How Randomized Controlled Trials Threaten the Nonprofit Sector, elevating the RCT as the singular arbiter of a nonprofit’s impact can actually have a variety of negative effects on nonprofit operations and performance. It can direct the attention of management and staff away from community-based connections, privilege larger and more traditional organizations over smaller or more innovative ones, and reduce nonprofits’ ability to respond nimbly to changing social problems. It can even mean sacrificing the most valuable evaluation-oriented mindset that a nonprofit can embrace: prioritizing continuous learning, adapting to shifting environments, and drawing on the innovative power of community collaboration.

There are, in other words, real risks for nonprofit organizations in pursuing RCT evaluation, especially since, as we’ve learned, most RCTs carried out by nonprofits don’t actually meet the method’s stringent requirements (leading to inconclusive or even misleading results). Nonprofit leaders should more carefully consider what they might be trading off for the hope of attaining “gold standard” evidence. That said, being more critical of what RCTs can do for social-impact organizations doesn’t mean giving up on evaluation. It should mean the reverse: focusing on evaluation strategies that are aimed at ongoing learning and improvement, approaches better aligned with the complex and multifaceted work the sector is doing.

Are RCTs Really the “Gold Standard” for Evaluating Nonprofits?

No one disputes how crucial it is to understand “what works.” Nonprofit organizations are the backbone of the United States’ social-welfare system, which emphasizes service-based support, such as workforce development, substance-abuse interventions, and family-strengthening services, over direct cash assistance to people living in poverty. Since these critical services receive substantial government funding, policy makers and community stakeholders naturally want to ensure that taxpayer dollars are being invested responsibly.

The RCT’s appeal as a tool for doing so lies in its simplicity: Splitting a population randomly into two groups, providing the intervention to one group while withholding it from the other, makes it possible to attribute any differences in outcomes to the intervention itself, rather than to other factors. If the first group shows improvement while the second does not, you have very strong evidence that your program caused that result. In this way, the RCT allows social-program evaluation to assess the “counterfactual,” or the outcomes that people would have achieved in the absence of program participation. This approach has spread far beyond its early use in medical research, into education, criminal justice, child welfare, and other social programs, largely through a campaign by RCT advocates that we refer to as the “Gold Standard movement.” The high value placed upon RCT evidence can be seen in high-profile public arguments that tax dollars should flow only or primarily to programs with RCT evidence of their effectiveness.1 This shift has major implications for the many nonprofit organizations dependent on government resources.

Gold Standard movement claims about the uniquely valuable nature of RCT evidence now enjoy broad acceptance. Yet, skeptics have raised critiques.2 For example, RCTs are highly context specific and may lack external validity, they focus on the average treatment effect and ignore high levels of individual variation, and real-world implementation can subvert the method’s bedrock principle of random assignment. Critics also argue that RCTs rarely illuminate clearly why a program works, making it difficult to adapt promising approaches to new settings. Finally, RCTs pose ethical questions when randomization withholds potentially beneficial treatments from people in need, underscoring persistent tensions between methodological precision and social responsibility.

Evaluators frequently suggest that technical refinements can help RCTs overcome these well-known limitations. For example, the problem of random assignment violations can be mitigated by strengthening adherence to research protocol, while a heightened focus on implementation challenges—such as feasibility, fidelity, and sustainability—helps address real-world factors that can derail RCT studies. There are also methodological innovations, within RCTs, that aim to pinpoint exactly why a program works. Finally, in terms of the ethical challenges posed by randomization, the Gold Standard movement argues that random assignment remains the fairest system for allocating scarce resources, suggesting that RCTs are an ethically responsible approach, rather than a barrier to assistance for those in need.

These refinements are not without merit. But as scholars who study the social production of science have noted, scientific results, tools, innovations, and processes are always social accomplishments: agreements among scientists about the norms of scientific work, what values science should aim toward, and which institutions can best move science forward. RCTs are not purely technical interventions, nor have they risen to dominance in the world of social-program evaluation simply on their scientific merits. Instead, RCTs came to be valued as an evaluation tool at least partially because they present a clear solution to difficult questions about how to allocate scarce resources for social programs.3 Uniquely suited to giving “thumbs-up” or “thumbs-down” answers, RCTs offer a straightforward rule: Funds should flow only to programs that have RCT evidence of success. This emergent connection between the method and the money transcends technical debates about the merits of RCTs as a scientific tool, forming part of a larger “evidence culture,” as sociologist Holger Strassheim calls it, that dominates social and political discussions of social problems.4 More than a technical methodology, RCTs are a social instrument that is shaping the broader direction of social policy.

How do nonprofits figure in all of this? In the United States, most social-program RCTs are carried out inside nonprofit organizations, where most service delivery happens. However, because RCTs are focused exclusively on program success, nonprofits can sometimes be treated simply as platforms for the delivery of those programs. Yet successful programs don’t happen on their own. To support effective programs—as well as the people and communities those programs serve—we also need to commit to cultivating strong, multifaceted nonprofit organizations. Improving program participants’ outcomes is an important component of nonprofit work, but nonprofits also do other things, like fostering community engagement, strengthening civil society, and contributing to shared governance. An RCT framework is not well suited to capturing these kinds of broader contributions, which are core to the missions of many nonprofits.

To understand how the technical aspirations of RCTs intersect with the complex social demands of running nonprofit organizations, we designed a study asking professionals in different positions within the nonprofit ecosystem about the growing use of RCTs for nonprofit evaluation: nonprofit managers with direct experience in RCT implementation and those who have declined RCT participation, program officers from philanthropic foundations responsible for supporting different kinds of nonprofit evaluation (including RCTs), and evaluators who design and oversee RCTs in nonprofit settings.

A comprehensive understanding of how the professionals most engaged with RCTs in the nonprofit sector perceive this evaluation method allowed us to identify five distinct problems with using RCTs to evaluate nonprofits.

Five Problems with Using RCTs in Nonprofits

From problems with the science itself to managerial issues and community-based equity concerns, these problems are often more serious than assumed. The following are five reasons why nonprofit leaders should exercise care when considering whether an RCT serves their organization’s and participants’ needs.

1. The “False Certainty” Problem. Despite their reputation, RCTs often are not a foolproof method of evaluation. In fact, the assumption of their infallibility can produce its own problems. One of the first people we interviewed for our book is a leading advocate for and funder of RCTs. He told us something we were not expecting:

The vast majority—I would say north of 90 percent—of the [RCTs] that we see have some challenge to validity that basically makes the study not useful from a policy perspective. Underpowered, random assignment taking place at the wrong level—or random assignment taking place at the correct level, but analysis conducted at the wrong level. High sample attrition, differential sample attrition, flaws in the random assignment process that produced highly dissimilar groups at baseline, and probably fifty more common problems.

A surprising number of the evaluators we talked to shared this assessment. Can RCTs really be the “gold standard” if the method is so frequently compromised in practice? Studies showing no discernible impact one way or the other (that is, “null” effects) can create confusion for organizations, skepticism among funders, and disappointment for staff members committed to their program. And yet, as the above quote emphasizes, null findings are not always the result of program failures. Over and over again, we heard about two major issues that can prevent an RCT from detecting a program’s true impact (even when that impact exists): insufficient statistical power and the contamination of control groups.

“Insufficient statistical power” means an RCT has not enrolled enough participants for statistical analysis to detect the kinds of changes an intervention aims to make, even when those changes are actually occurring. The reality is that many—even most—social-service interventions have modest effects. Landing someone a job when they have been homeless and unemployed for decades is very hard, for example, and people struggling with substance-use disorder sometimes fall back into active addiction even with the best treatment. When a program is exceptionally effective, detecting that effectiveness is easy for an RCT. But more modest effects—the kind most social programs tend to produce—require much larger sample sizes to detect with confidence. Without enough participants, even effective interventions can appear to have no impact and return null findings. Yet most nonprofits don’t have the size and reach to enroll the large samples they need. What happens then? Their program may have a causal impact, but the small sample size means the RCT will end in a null finding.

When the outcome of interest for an RCT is a rare event, such as gun violence, statistical power becomes even more of a problem. Gun violence is rare, even in neighborhoods with relatively high rates of violence. If you estimate that 1 out of every 200 teenagers in a particular neighborhood will be involved in a shooting at some point—a serious but statistically rare event—you’ll need to enroll more than 1,000 youth in your study before your RCT of a gun-violence-prevention program could reliably detect differences in outcomes between treatment and control groups. Not only do most programs not serve that many young people at any one time, but there may not even be that many teenagers residing in the neighborhood!
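
To make the arithmetic behind this concrete, here is a minimal power-calculation sketch in Python using the statsmodels library. The 1-in-200 baseline rate comes from the example above; the hoped-for effect size, significance level, and power target are illustrative assumptions rather than figures from any actual study.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Baseline from the example above: 1 in 200 teenagers involved in a shooting.
p_control = 1 / 200
# Illustrative assumption: the program is hoped to cut that rate roughly in half.
p_treatment = 1 / 400

# Standardized effect size (Cohen's h) for comparing two proportions.
effect_size = proportion_effectsize(p_control, p_treatment)

# Conventional settings: 5% significance level, 80% power, two equal-size groups.
n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.80, alternative="two-sided"
)

print(f"Participants needed per group: {n_per_group:,.0f}")
print(f"Total enrollment needed:       {2 * n_per_group:,.0f}")
```

Under these particular assumptions, the required enrollment runs to several thousand young people per group, which is why the 1,000-youth figure above is a floor rather than a target, and why rare-event outcomes so often leave RCTs underpowered.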

Control-group contamination is another reason an RCT might erroneously find a program to have no effect. Ideally, from a research perspective, the control group in an RCT will receive no version of the tested intervention. In practice, human beings cannot be controlled in that way. Sometimes people who are denied access to a desired program because they are randomized into the control group are able to find alternative ways to get their needs met elsewhere. For example, in a city with multiple job-training initiatives operating at once, people in one nonprofit’s control group might seek out a similar program from another organization. When control participants at the first nonprofit are exposed to help from a different organization, the measured difference between the first nonprofit’s treatment and control groups will likely shrink, masking the impact of the intervention being tested and making it even harder for the RCT to detect any meaningful effect.

Evaluators have become more assertive in trying to limit the chances of these well-known risks occurring, but more nonprofit leaders need to understand how RCTs can often promise false certainty. Given the faith placed in RCT evaluations, a result showing “no effect” could be devastating to an organization, even if that result turned out to be wrong.

2. The “Programs Need Organizations” Problem. RCTs assess programs, but programs are embedded in organizations. Virtually every nonprofit manager we interviewed found the RCT process a major challenge for their organization, even when they were happy with the results. Difficulties ranged from technical problems like handling new data-collection protocols to personnel challenges like declines in staff morale due to a sense that participation in the RCT was limiting the organization’s ability to respond effectively to clients.

One nonprofit manager told us about how his organization came to participate in an RCT because it wanted a rigorous evaluation of its long-running and anecdotally well-regarded intervention for children. “We applied for an opportunity and—I’m putting that in quotes, an ‘opportunity’—to be rigorously evaluated,” he told us. “I have post-traumatic stress from it! ... It was like, ‘Yay, we won this opportunity!’ Then it was like, ‘Oh my God, what did we get into?’”

Achieving sufficient statistical power to conduct the RCT required this organization to significantly scale up its program, and, in that process, the evaluators pushed the staff to rethink the program itself. For example, while the program had been developed and conceptualized around social-emotional learning goals, the evaluators suggested substantive changes, like lengthening the program and adding a tutoring component to increase focus on the program’s potential academic impacts. “That was not our idea,” this manager told us, and grimaced. When the organization agreed to make the changes, he continued, the process “really complicated things. And it was one of those things where you think, ‘Okay … that sounds great.’ In practice, the operationalizing and all that was a little overwhelming and difficult. … I felt a lot of pressure.”

Describing the exhaustion that set in when the nonprofit had to both rapidly expand the number of participants in its program and change existing organizational systems to keep track of them, this manager reported:

It was a stretch for us. We had a small research team who had to dedicate a ton of time and energy to the internal tracking and support needed to make data available … all the things that have to do with [the fact that] we’re doing it at such a large scale. … That’s just a huge capacity drain that we had to build very quickly.

Implementation challenges like these can significantly disrupt a nonprofit’s equilibrium. RCTs evaluate the effectiveness of programs that are embedded within nonprofits with broader missions, complex funding structures, and staff who are likely already working at capacity (or beyond). Yet participating in an RCT absorbs tremendous organizational energy, as leadership and staff coordinate with external evaluators, gather and protect data, and uphold the detailed protocols that random assignment requires. All of this can distract staff from their core responsibilities or undermine the day-to-day flexibility nonprofits need, particularly smaller organizations, which may find the burden overwhelming. Ironically, trying to demonstrate a program’s effectiveness can lead to damaging the organization on which the program depends.

For the Gold Standard movement, the promise is that generalizable insights about a social program’s effectiveness can enable funders and evaluators to advocate for replication and scaling, and thereby to reach more people and communities. But for a nonprofit faced with practical challenges to its daily operations—and whose primary allegiance is to its program participants and the communities it serves—this benefit can seem theoretical. The strain on staff resources, the burden of fielding inquiries from community members excluded from the program, and the mission shifts that often accompany RCTs can tarnish an organization’s reputation and hinder its ongoing work even after the study concludes. From the perspective of nonprofit leaders and staff, such trade-offs may seem less than compelling.

illustration of a ruler on a green background (Illustration by Matt Chase) 

3. The “Communities Need Organizations” Problem. RCTs can threaten the community-level benefits that nonprofit organizations provide. After all, nonprofits rarely exist solely to run programs in isolation from the communities around them; they are often community hubs, advocates for local needs, or catalysts for grassroots initiatives. Yet these kinds of broader community-building functions can be hard to measure using an RCT, which—by design—focuses as narrowly as possible on whether a specific intervention improves a defined outcome within a set time frame. It does not measure, and is not designed to measure, the kind of community-level work of nonprofits that often unfolds in complex and iterative ways.

The foundation executives we interviewed were often keenly aware of this problem, which led many of them to approach RCTs with caution. “I would challenge anyone to go out into some of the most heavily researched areas in [our city] and find a person that lives in the community that’s talked about what any research study, any randomized controlled trial or [other] research study, has really done for their day-to-day living,” one foundation program officer said bluntly. “I don’t think you would find one if your life depended on it.”

A no-less-pointed question about the resources dedicated to evaluation came from a manager at a midsize antipoverty nonprofit. Although her organization experienced some benefits from participating in an RCT, that did not ease her discomfort with being convinced that she needed an RCT before funders would see her organization as worth investing in. From her perspective, the time and money researchers and evaluators have spent doing RCTs and other kinds of studies have not led to real improvements in her community:

At what point do we break from this cycle and say, “All right, we know these things work. Let’s double down on these things to actually help people get out [of poverty]”? ... We’ve got to figure out that we know some things that work and then try to run with them.

Nonprofits are not only vessels that host social programs; they are dynamic, multifaceted institutions that strengthen the social fabric of the communities they serve. But an overreliance on RCT evidence may pressure nonprofits, over time, to funnel their efforts into only those activities that can be measured in this way, thereby overshadowing projects that build social capital, community cohesion, or longer-term improvements.

4. The “Rich Get Richer” Problem. RCTs tend to benefit the kinds of already well-resourced organizations whose way of working is easily adapted to RCT demands. Because setting up and successfully completing an RCT is expensive, complicated, and time-consuming, only organizations with established infrastructure, staff, and support networks are typically able to navigate these hurdles effectively. Such organizations are thus more likely to benefit from the legitimacy that comes with positive RCT evidence, further consolidating their advantage in future competitions for resources.

Indeed, a number of the people we interviewed raised questions about how the Gold Standard movement’s promotion of RCTs may presage a reshaping of the entire nonprofit sector because of this “rich get richer” problem. Should nonprofits that run the kinds of programs that lend themselves well to RCT evaluation—or that possess the organizational capacity to successfully conduct one—be held above those that do not? If legitimacy and funding benefits flow only to organizations amenable to RCT evaluation, does that devalue nonprofits whose work is not?

A program officer at a large, long-standing foundation passionately argued that organizations need to do a better job of understanding whether they’re actually prepared to run an RCT, and, indeed, whether they’re even capable of it:

Because they don’t have something that you can “randomize control!” There are some things that you just can’t: by the population they serve, by the topic they’re doing. … there’s a million reasons [why an RCT won’t work for some organizations], right? I mean, what are you guys going to offer instead that allows and equalizes that playing field so that your human service agencies don’t just get cannibalized?

This assessment of the potential of RCTs to “cannibalize” the many nonprofits that can’t engage with this evaluation method recognizes that much of the broader and more community-oriented work that nonprofits do around advocacy, the cultivation of civil society, and other forms of participation in societal governance simply does not align well with RCT evaluation. If government funding is tied to results from an RCT, this form of evaluation may leave good nonprofit organizations without financial support, and many communities without the nonprofits on which they rely.

Whether because their participant base is too small to achieve adequate statistical power or because their limited funding cannot cover all the required data management, smaller or less-advantaged nonprofits may struggle to undertake RCTs. Some may serve populations with needs that defy neat measurement, or may philosophically oppose random assignment if it means denying services to eligible participants. In these scenarios, lacking RCT evidence can become a strike against these nonprofits when competing for grants, even if their work is beneficial to the participants and communities that they—and only they—can serve.

5. The “Agility” Problem. Because of their length and complex structure, RCTs may actually hinder nonprofits’ agility in responding innovatively to new social problems. There are many different visions for what the nonprofit sector can or should be, but most would agree that responsiveness, innovation, and effectiveness are all key. Responsive nonprofits can meet emerging and diverse needs even during rapid socioeconomic shifts; innovative organizations use their front-row seat to changing social problems to pivot quickly and think creatively about what to do next; and effective organizations not only deliver strong programs but meet their overall organizational mission. While well-implemented, adequately powered RCTs can help us understand important things about effectiveness, they unfortunately reduce nonprofits’ ability to be responsive and innovative.

For one thing, their long time horizon—three to five years, in many cases—means that an RCT’s “final verdict” on program effectiveness often comes after the organization has already moved on to newer challenges. As one evaluator who has been in the business a very long time explained to us, organizations that strive to continually improve by responding to community needs will often find that, by the time they get data back from their RCT, the data’s value will be much reduced by changes the organization has already made to its program. This can be as true when the data are good news as when they indicate problems:

Like, you know, we just spent two years following up, writing a report, so now it’s three and a half years, and we come back and say, “You know those people you saw in 2014? They didn’t do any better than the people who went to the other agency or the other intervention.” And [the organization] is like, “I don’t care. That was 2014. I’ve moved on. I don’t do that. I still call it the same thing, it’s still the X program, but now we do this and that. And what I really want to know is whether that works.”

Ironically, RCTs pose their own counterfactual problem for nonprofit agility: Holding program components constant during the full length of the three-to-five-year study can force organizations to be less responsive to changing conditions and needs than they might otherwise have been. There is a built-in conflict between the design of an RCT—which requires everything to be held constant for the research—and the motivations of nonprofit staff, who want to take action when they see ways in which they can improve service delivery.

An evaluator from a different firm stated the point plainly:

If the [program] model is highly prescribed and the research is dependent on this highly prescribed model, it can limit program improvement. So we have had many meetings on our project where the program staff and the clinical director are like “We need to change how we [do something].” Then we have to spend a lot of time figuring out if that changes what we’re studying. … Sometimes bigger changes would be really helpful [for improving the program], but it can hurt the research.

Finally, while most agree that evaluation should happen more often than at a single point in time, RCTs are too complex and expensive to be done continually (or even more than once). When a nonprofit puts all its energy and funds into a high-stakes RCT evaluation, it gives up the kind of ongoing learning that could help it continually improve its ability to serve its participants and community.

Three Principles for Nonprofit Improvement

To better meet the challenge of creating greater nonprofit impact, we need to think about evaluation in more expansive and flexible ways. We should reject the idea that the causal effect of a specific program is the only—or even the best—way to understand an organization’s impact. Similarly, pursuit of program effectiveness should not come at the expense of responsiveness, innovation, and community connections. The United States has chosen nonprofits to deliver most of its government-funded social programs because nonprofits are believed to be able to move faster, work in concert with community needs, and be more adaptable than government agencies. When RCTs don’t support those goals, and even hinder them, what should we do instead?

illustration of a ruler on a blue background (Illustration by Matt Chase) 

1. Tailor Evaluation to Specific Organizational Strategies and Community Needs. When experts say nonprofits should go easy on RCTs, they’re not being controversial: Even the biggest RCT fans agree that many organizations don’t have the capacity to successfully field an RCT and may never have it. You need sophisticated internal systems, a particular kind of program, and a lot of resources. So, where does that leave the many nonprofits that don’t?

As Lehn Benjamin and David Campbell have argued,5 nonprofit evaluation practice needs to take the “programs need organizations” problem seriously, recognizing how nonprofits deliver on their missions in ways that don’t show up in measured program outcomes. Nonprofits also need to build evidence that is the “right fit” for where the organization is, as Mary Kay Gugerty and Dean Karlan have put it.6 Gugerty and Karlan argue that while RCT evidence is indeed the gold standard for demonstrating the causal impact of programs, most nonprofits would be better served by working to carefully shape and follow their theory of change, as well as to build up their management and data capacities.

Creating a theory of change as a road map for how an organization believes it will achieve its goals—unlike a bare-bones logic model—spells out the assumptions surrounding why a particular approach might succeed. Identifying smaller stepping-stones on the road to the ultimate goal also helps nonprofits track and improve anything that’s really critical—such as building trust with participants or standardizing a workflow—rather than focusing exclusively on one specific outcome that might be amenable to RCT evaluation.

In addition, nonprofit evaluation experts like Gugerty and Karlan, as well as Alnoor Ebrahim,7 suggest that basic information tracking—such as evaluating whether basic program elements are working smoothly—will offer nonprofits more important program insights than a big, expensive RCT study likely ever could. Your local soup kitchen, for example, might make bigger strides toward improvement by concentrating on collecting participant feedback—for example, on the content and timing of its meal service—and using that to refine day-to-day operations.

Nonprofits must recognize that big-picture social change is about more than a single program, involving everything an organization does and how it collaborates with the broader community. The best type of evidence for measuring organizational success could therefore be different for a group dispensing food aid than for one trying to make complex mental health systems more efficient. The important thing is to match evaluation methods to the organization’s objectives and the environment in which it operates. Improved data collection or simpler performance tracking will often carry bigger benefits, at lower cost, than an RCT ever would.

2. Center Participant Perspectives to Improve Nonprofit Performance. Most nonprofits report mainly to funders and regulators, which makes sense; they need funding to survive. But if an organization wants to design services that truly meet the needs of the people it’s serving, it has to stay dialed in to their perspective. Traditional RCTs can, however unintentionally, orient nonprofits away from these participant insights, pushing staff to pay more attention to “technical” matters—like fidelity to a program manual—than to the concerns of people being served by the program.

Plenty of research tells us that strong bonds between staff and participants often produce better outcomes than any specific intervention technique.8 Relationship building is vital for motivating participants to engage fully with a program; it also helps staff understand what adjustments would better meet a participant’s needs and what outside resources could bolster their progress. The closer participants feel to nonprofit staff—and the more they see the program as relevant to their lives—the more likely they are to stick with it and reap the benefits.

To leverage the documented importance of these relationships, more nonprofits and funders are looking at ways to bring participants center stage in evaluation.9 Feedback loops, surveys, interviews, or codesign workshops can illuminate user insights about how to make programs work better. Human-centered design, for example, is a popular framework that emphasizes treating nonprofit participants as active partners in program development and evaluation. It embraces multiple, iterative steps of refinement: trying out a service, gathering feedback, making improvements, and then checking in again. This approach contrasts starkly with the RCT’s arm’s-length isolation of the causal effect of a specific program model while holding everything else still.

Finally, when communities speak for themselves about what they want and how they define “success,” it helps balance a system that often puts raw data or external expert views above local knowledge. By doing evaluation “with” participants instead of “on” them, nonprofits scale up both trust and equity in meaningful ways.

3. Focus on Improvement Through Iteratively Addressing Problems (Instead of Standardizing Solutions). Nonprofits operate in dynamic environments as community needs evolve, economic realities shift, and new political pressures arise. Holding still for a long-term evaluation with a big final reveal can actually stop nonprofits from making the kinds of changes that could meaningfully improve participant experience and outcomes. Instead, nonprofits should create a culture of continuous improvement that uses data in real time, closely tracking whether something is working here and now, not just waiting for a thumbs-up at the end of a yearslong evaluation process. If signs suggest an idea isn’t hitting the mark, a nonprofit can quickly regroup and try something else.

Such rapid iteration mirrors the approach of tech startups that launch products in beta, gather user feedback, and refine swiftly through faster, smaller tests. Some nonprofits, for example, are already experimenting with “rapid cycle RCTs” (mini-experiments on short timelines), A/B testing (like comparing two versions of a web form to see which resonates more with users), or plan-do-study-act cycles that break changes into bite-size chunks to test and measure. These iterative methods let organizations adjust and improve quickly, but without jeopardizing their entire budget or staff morale.
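
To give a sense of how lightweight these iterative tests can be, here is a minimal sketch of the kind of A/B comparison mentioned above, written in Python with statsmodels. The scenario and the counts are hypothetical, invented purely to show the mechanics: two versions of an intake form, each shown to a few hundred visitors.

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts: version A of the intake form was completed by 96 of 400
# visitors; version B was completed by 127 of 410 visitors.
completions = [96, 127]   # completed forms for versions A and B
visitors = [400, 410]     # visitors shown each version

# Two-proportion z-test: is the gap in completion rates bigger than chance
# alone would plausibly produce?
z_stat, p_value = proportions_ztest(completions, visitors)

print(f"Completion rates: A = {96/400:.1%}, B = {127/410:.1%}")
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")
```

A comparison like this can be rerun every few weeks as the form changes, which is the kind of continuous, low-stakes learning loop described above rather than a single multiyear verdict.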

Many of these approaches are still evidence based, but they don’t require nonprofits to freeze the program design until a long, high-stakes RCT study concludes. When organizations feel safe to experiment, fail, and try again, they’re more likely to discover paths that meaningfully serve real people. They become nimble learning organizations rather than being locked down by their evaluation methodology.

An Improvement Orientation for Nonprofits

While nonprofits can certainly adopt an agile, participant-focused style of evaluation on their own, collaboration with funders is often essential. When grant and contract requirements are rigidly reliant only on high-stakes RCT findings, the risk of stifling nonprofit creativity and adaptation is high. Fortunately, many private foundations, along with some public agencies, have begun voicing doubts about funneling all their evaluation resources into high-stakes RCTs. They see that smaller, more iterative methods can still be high-quality, especially if they’re aligned with the nonprofit’s mission and goals.

This shift, however, is far from complete. Nonprofits still face a world of well-intentioned experts who emphasize the “gold standard” of evidence. But it’s important to remember that making a difference is not just about causal effects documented in a polished report. It’s also about how swiftly a nonprofit can adapt to local realities, how much ownership community members have in the process, and how prepared the staff are to keep learning.

In an improvement orientation, data are not inert final verdicts but constant conversation starters. And, yes, nonprofits may still invite outside researchers to run rigorous experiments in certain circumstances—especially in well-established programs aiming to expand significantly. But RCTs shouldn’t be the default for every question. Nonprofits are in the business of solving tough social problems, and true innovation requires staying nimble, seeking fresh insights, and being willing to shift direction—even in midstream—based on emergent information. RCTs can illuminate whether a program strategy has definite impact, and this can be useful. But in many situations, such end-of-the-line proof will be far less valuable to nonprofits than a constant loop of learning.

Our message is straightforward. Rigorously evaluate what works, but in a way that truly supports nonprofits and their communities. Tailored strategies, active participant involvement, and iterative problem-solving can transform organizations into ongoing stewards of public benefit.

RCTs may have their place, but nonprofits must have the breathing room—and the right tools—to address complex challenges as they actually unfold. In the end, that blend of effectiveness, adaptability, and genuine community engagement is more likely to produce lasting progress for everyone involved.
