Sunset at the prehistoric monument of Stonehenge in England. Good things are built to last. (Photo by iStock/nicolamargaret)

As I argued in my 2022 article, “Beyond X Number Served,” nonprofits and donors should expand our thinking beyond the number of beneficiaries reached. We should hold ourselves accountable for two other key aspects of impact that can be hidden behind headline numbers or lost in the statistics of annual reports and grant applications: How deeply and well are people served? And how long does the impact actually last?

While I still believe real impact requires delivering breadth, depth, and durability simultaneously, measuring impact across time poses particular challenges that are worth thinking harder about. Time and again, I’ve heard people say that durability is “too hard to measure” or insist that “we are not responsible for impact years beyond initial program delivery” and “no one is asking us to measure it!” And yet, durability is where the rubber meets the road. In health care, expanded treatment doesn’t mean much if it doesn’t lead to measurable and sustained improvements in health outcomes. If we address hunger by providing meals, what we really want is progress toward stable, long-term food security for disadvantaged communities. And so on.

The time has come for nonprofit leaders and donors to launch deep and sustained efforts to tackle durability: the least discussed, least measured, but arguably most important of these three key metrics.

Where Is the Data?

Let’s zero in on the field I know best: employment. At Generation, the global employment nonprofit network that I lead, our goal is to train and place adults of all ages in new careers. But if our employed graduates fail to earn a living wage, or fall out of work a year later, what have we really achieved?

As a workforce field, we have collected shockingly little evidence that the hundreds of billions of dollars spent annually by governments, foundations, corporations, and individual learners on training and reskilling actually result in lasting improvements in income and well-being. What analysis we have gathered shows at best a low to mixed return. A 2017 analysis of 12 technical and vocational education and training (TVET) programs across eight countries examined employment impact 12 to 18 months post-program and found that these efforts, on average, increased employment by only two percentage points. Similarly, Mathematica conducted a 2023 review of 17 impact evaluations of TVET programs across low- and middle-income countries and concluded that only four demonstrated a statistically significant impact on employment beyond 12 months. What about lifting incomes? J-PAL’s 2023 review of 28 randomized evaluations of TVET programs, which likewise found that most programs increase employment only modestly, reported that just half increased graduates’ earnings at some point in time. But a big part of the problem is that we simply don’t know how durable our interventions have been; as the study concluded, “To date, there is not a very clear understanding of what influences whether an intervention works in the short run in comparison to the long run.”

Data Forward

Generating reliable data on durability requires tracking individual outcomes on an ongoing basis, long after a specific intervention has ended. This is hard: It means staying in close touch with program graduates to keep pace with inevitable changes in phone numbers and emails, and maintaining deep enough bonds that people will be motivated to report back on their status year after year. But doing this requires more than determination; it requires creativity. At Generation, for example, while data completion rates for our alumni surveys start out at 90 to 100 percent within the first year following program completion, they fall to around 60 percent at one to two years post-program and then settle to around 30 percent after two to five years. Maintaining even that comparatively high level of long-term response across our now 100,000+ global alumni is hugely valuable, but it’s not good enough. For example, there is a problem of positivity bias: Since employed alumni are likely more inclined to respond than those who are unemployed, we need much higher response rates to reliably speak to the trajectories of diverse learner profiles across the geographies we serve.
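To see why positivity bias matters, here is a minimal Python sketch, using purely illustrative numbers rather than Generation’s data or methodology, of how a response gap between employed and unemployed alumni inflates the employment rate a survey measures as overall response rates fall:

```python
# Minimal sketch of positivity bias in follow-up surveys. All figures are
# illustrative assumptions, not Generation data.

def observed_employment_rate(true_rate, respond_if_employed, respond_if_not):
    """Employment rate measured among survey respondents only."""
    employed = true_rate * respond_if_employed          # employed responders
    unemployed = (1 - true_rate) * respond_if_not       # unemployed responders
    return employed / (employed + unemployed)

# Assume a true employment rate of 70 percent among all alumni, with
# employed alumni more likely to answer, and the gap widening over time.
scenarios = {
    "year 1":    (0.95, 0.85),  # high response, small gap -> measures ~0.72
    "years 2-5": (0.40, 0.20),  # low response, large gap  -> measures ~0.82
}
for period, (p_employed, p_unemployed) in scenarios.items():
    rate = observed_employment_rate(0.70, p_employed, p_unemployed)
    print(f"{period}: observed rate {rate:.2f} vs. true rate 0.70")
```

The measured rate drifts upward precisely when response rates are lowest, which is why higher long-term response rates matter so much for reliable trajectories.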

To improve data completion at Generation, we are now moving toward multi-channel follow-up, which relies on a combination of emails, online surveys, SMS/WhatsApp messages, direct one-to-one follow-ups via text or phone call, and in-person meetings at alumni events. We are also exploring the use of interactive voice response for short surveys about job retention or wages.

This kind of extensive data collection does not have to be expensive: Measuring durability outcomes currently costs Generation just one percent of the total cost per learner. However, few philanthropic or government donors incentivize durability measurement, much less pay for it. Workforce funders, particularly governments, exemplify the persistent focus on expanding the number of people served, which means that funding rarely goes beyond covering program delivery costs, with little to no requirement or support for reporting on outcomes beyond the grant period.

And yet, wouldn’t investing in better long-term impact measurement also sharpen our ability to make operational improvements? It certainly did at Generation, when we began tracking an “impact ratio”: the percentage of annual vacancies for a target profession in a target city (for example, junior full stack developers in Guadalajara) that Generation graduates fill. In 18 locations across eight countries, up from nine locations in four countries one year ago, our graduates now hold more than 5 percent of entry-level jobs, a significant share of hiring. This approach enabled us to identify which professions had the greatest potential for growth, and to build an employer ecosystem to achieve it. What got measured indeed got managed.
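For readers who want the arithmetic made concrete, here is a minimal Python sketch of the impact-ratio calculation; the figures are hypothetical, not Generation’s actual numbers:

```python
# Minimal sketch of the "impact ratio": the share of annual entry-level
# vacancies for a target profession in a target city that a program's
# placements fill. All figures below are hypothetical.

def impact_ratio(graduates_placed: int, annual_vacancies: int) -> float:
    """Fraction of a city's annual entry-level vacancies filled by graduates."""
    return graduates_placed / annual_vacancies

placed = 60        # hypothetical graduates hired into the role this year
vacancies = 1_000  # hypothetical annual entry-level openings in the city
print(f"Impact ratio: {impact_ratio(placed, vacancies):.1%}")  # 6.0%
```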

More than data is required, however. To tighten the links between better measurement of durability and better management by service providers, we need common datasets. In the employment and training field, for example, living wage attainment should be the gold standard: the most objective measure a provider can use to assess the economic mobility of graduates over time. But outside the US, UK, Canada, and Australia, robust living wage benchmarks are not publicly (or regularly) available for most countries, let alone for a variety of household types and locations. As a result, Generation has had to develop our own living wage benchmarks for the countries in which we operate, sending local colleagues to gather the prices of goods like food, housing, and utilities and then combining this data with publicly available sources. If a freely available and robust source for global living wages existed, it would be a game-changer, enabling all organizations in our field to understand how their graduates fare in comparison.
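As an illustration of the kind of benchmark-building this describes, here is a minimal Python sketch; the cost categories, prices, and household assumptions are invented for the example and do not reflect Generation’s actual methodology:

```python
# Minimal sketch of constructing a local living wage benchmark from locally
# gathered monthly prices. All categories and figures are illustrative.

monthly_costs = {
    "food": 250.0,
    "housing": 400.0,
    "utilities": 80.0,
    "transport": 60.0,
    "other_essentials": 90.0,
}

def living_wage(costs: dict, earners_per_household: int = 2,
                savings_margin: float = 0.05) -> float:
    """Monthly living wage per earner: total household costs plus a small
    savings margin, divided across working adults in the household."""
    household_need = sum(costs.values()) * (1 + savings_margin)
    return household_need / earners_per_household

print(f"Living wage benchmark: {living_wage(monthly_costs):.2f} per month")
```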

The Path Forward

How might nonprofits and sectoral stakeholders gather more and better durability data? It starts with being willing to have the hard conversations and agreeing upon a universal standard that we collectively believe will better inform our programmatic decision-making. Because we insisted on data-gathering from the outset, Generation currently holds 40 million data points tracking the learner lifecycle, from application to five years post-graduation. But we can’t do it alone. We would welcome a debate within our field about what combination of metrics, spanning employment status, job quality, wages, career growth, savings, living wage trajectory, and personal well-being, would constitute the most important benchmarks for measuring the long-term impact of our collective efforts.

Governments and philanthropies can accelerate this journey by making durability a priority. Of course, grantees can and should have a wide array of delivery models and theories of change. But tracking impact against a universal durability data standard would not only be illuminating for funders, it would also generate insights for practitioners.

Success won’t come quickly and will doubtless require numerous experiments in every sector to assess what’s both doable and valuable. The charter school movement in the United States shows that a data trajectory is possible. While data gathering initially focused on enrollment and performance (relative to public school district peers), it has, over time, expanded to high school graduation rates, college acceptance rates, and college graduation. Now some segments of the field are even pushing data-gathering to include income earned in the first job post-graduation.

Whatever sector we work in, we are all pushing for change that improves individual well-being and addresses massive inequities—and we want that change to stick. We want durability. And the only sure way to know whether our program impact matches our aspirations is to roll up our sleeves and, with humility and patience, commit to following the durability path wherever it may lead.
