If you follow the nonprofit press closely, I imagine that over the past year you’ve been treated to a healthy dose of the “scaling what works” rhetoric. Perhaps it’s gone something like this: “We know what works, and if we could just steer funds to those organizations with superior results, society would be much better off.” The message seems to be that our heads are in the sand, and if we could pull out long enough to see the light, all our problems would be solved.

To be clear, nonprofits that invest in measuring, improving, and ultimately proving their impact should be rewarded, and they should grow. But in order for “scaling what works” to actually work, we need a new and improved version that addresses two fundamental constraints:

“Past performance is no guarantee of future success”
The SEC mandates that mutual fund companies affix this label to their products; nonprofits ought to do the same, for two reasons. First, proving that a model works at one site, in one context, and at one point in time does not guarantee that it will work again. Consider that one recent, high-profile effort to replicate the positive results of 67 high-quality pharmaceutical studies achieved full success only 20 percent of the time. Given the far greater complexity of replicating social interventions in real-life settings, it is hard to imagine a much better success rate in our line of work.

Second, even if nonprofits demonstrate success in multiple sites or settings, few are truly clear about whether their models are replicable. Most evaluation studies devote little, if any, attention to the underlying organizational factors (such as culture and leader characteristics) and contextual factors (such as regulatory climate and the presence of high-capacity partners) that play a role in the model’s success. Without understanding the conditions under which a model worked, organizations or funders often require replicators to follow the original model with full fidelity, potentially precluding important adaptations and improvements that could increase the odds of success.

What are we to do? Evaluators and evaluated organizations should devote much more effort to studying and reporting on these underlying factors. As a field, this means recognizing and accepting the value of qualitative assessments in teasing out the essence of a model’s success. It also means tackling, head-on, complex questions around replication, including what characteristics best position outside organizations to successfully replicate a model, what adaptations these organizations should be able to make, and whether there is merit in transferring practices to existing providers instead of supplanting them.

We may learn, for example, that a youth mentoring organization in Alabama is succeeding primarily because its well-connected and charismatic executive director has secured high-quality corporate mentors, state government grants that enable more programming, and a devoted staff that works double time for the program’s youth. The best path to scale may therefore involve internal replication to new counties in Alabama where the executive director’s relationships are strong, or external replication to organizations in other states with similarly well-connected and charismatic leaders. (For more on replication, see Bridgespan’s “Getting Replication Right.”)

Missing the forest for the trees
The “what works” in “scaling what works” is increasingly defined as interventions that achieve statistical proof of their impact, often through well-designed experimental evaluations known as randomized controlled trials (RCTs). While RCTs are practical in many contexts, requiring them to earn a “what works” imprimatur leaves out many interventions, such as advocacy and neighborhood revitalization. And even in fields like human services, the primacy of RCT evaluation favors organizations that seek to move short-term indicators (such as improved attendance or getting a job) and penalizes those that aim for longer-term change with clients exposed to highly dynamic environments.

A better approach recognizes and accepts the value of a wider range of evaluative methods. If a skilled and independent evaluator concludes, after deep quantitative and qualitative research, that a certain approach to neighborhood revitalization is having an impact and is replicable, funders should be much more eager than they are today to scale this kind of “what works.” Why? Because complex interventions that achieve longer-term, root-cause change will ultimately have more impact in our society than more straightforward interventions that achieve shorter-term, fragmented change.

It will take a full-team effort to ensure that “scaling what works” actually works. Funders must insist on replicability assessments within the evaluations they fund and consider a wider range of ways to “prove” impact. Organizations with evidence-based models, and those looking to adopt them, must learn to welcome a thorough assessment of those underlying factors that greatly affect the success of replication.

What is your experience in “scaling what works”? Which funders and organizations are taking a pragmatic approach and achieving great results?

And since this is a blog post about improving performance, I hope you’ll consider helping me improve mine: what are the topics within performance measurement that you would be most interested in reading about over the next six months? Please leave a comment!

Read more stories by Matthew Forti.
