(Photo by iStock/halbergman)
For people who move money in the name of impact, the shutdown of USAID landed like a star collapsing into itself. The United States government’s flagship development agency—once the world’s biggest patron of good intentions—closed its doors in July 2025, pulling billions of dollars of support for lifesaving humanitarian efforts around the globe into the void. What disappeared was also the world’s largest learning experiment: about $30 billion spent not on projects themselves, but on understanding them—six decades of field-tested trial and error across health, education, farming, governance, and humanitarian response. Those evaluations lived in a public database, a kind of collective brain for the aid world. Now that brain is gone or hiding on some forgotten server.
History will judge not only what we tried but whether we learned anything from it, and this time, we can’t plead ignorance. Artificial intelligence has given us tools that expand the limits of human learning; there is no longer an excuse for not knowing what we already know.
Before the lights went out, my social enterprise, DevelopMetrics, turned those tools loose on the USAID archive—one last look at what half a century of development really taught us. If you allocate grants, run programs, or shape policy, this is the closest thing we have to a postmortem on how tens of billions of dollars in development aid actually behaved over the course of decades in the wild. It offers a model for future learning on a mass scale, and the results affirm some important guiding principles as the development ecosystem considers how to build going forward.
How the World’s Largest Learning System Worked—and Why It Struggled
For 60 years, USAID’s memory lived in PDF files. The agency commissioned over 100,000 evaluations of its projects, stacking up tens of millions of pages in the Development Experience Clearinghouse (DEC), its public evaluations repository. Learning was serious business: missions convened after-action reviews, and “Collaborating, Learning, and Adapting” became doctrine. But the system was bounded by three human limits:
- Cognitive load. No team could read enough to connect the dots across continents before the next funding cycle.
- Timing. Most evaluations happened at the end of projects—too short a window to capture whether results endured or systems truly changed.
- Incentives. Staff rotated, political winds shifted, and hard-won insights often expired before they reached a new project design.
Learning happened, just unevenly. The machinery was vast, but the data eventually outweighed the people meant to absorb it.
What Happens When You Can Finally Read Everything
For more than a decade at USAID, I lived inside these limits. As a country economist in Bangladesh, then across 11 Caribbean countries, and later in Washington as the senior economist for Asia, I helped allocate hundreds of millions of dollars. Despite the agency’s vast experience, I often found myself designing programs based on a skimmed World Bank report and whatever coffee I’d managed to drink that morning.
My moment of reckoning came when $100 million that had been budgeted for Pakistan was suddenly pulled back and reassigned to regional programming. The task became designing a regional trade program on a deadline that barely allowed for a coherent thought. I went searching for every trade intervention USAID had ever run, only to discover again what insiders already know: The lessons were there; they just were not accessible at the speed decisions were being made.
In 2021, I left USAID to pursue a PhD at the United Nations University–Maastricht, focused on a question I could no longer ignore: Could decades of qualitative evidence be made analyzable, giving organizations a memory equal to their ambitions? That PhD work grew into DevelopMetrics, which I founded to move the research from theory into practice, and from it emerged DELLM (the Development Evidence Large Learning Model): a domain-specific AI system trained to read, classify, and connect development evidence at a scale no human team could match.
USAID became one of our first clients, contracting DevelopMetrics to run DELLM across its evaluations, and the results spread quickly: DELLM surfaced 10 times more relevant evidence than human analysts, cut drafting timelines from months to hours, and saved millions in contractor spending. Embassies used it to design strategies, produce congressional reports, and retrieve lessons that had been invisible in the noise of the archive. By 2025, the agency was sponsoring DELLM’s FedRAMP authorization—a security step reserved for tools considered operationally essential.
When USAID’s closure was announced, running DELLM across the full archive became urgent—not as a technical exercise, but as an act of preservation.
How the Model Actually Read Everything
For this analysis, we fine-tuned DELLM on the full corpus of USAID evaluations in the DEC. A team of experts spent years manually coding thousands of excerpts—tagging interventions, outcomes, and lessons learned—to build a precise taxonomy of how development work is described in practice. That human-coded dataset became the scaffolding DELLM learned from.
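As an illustration of what that scaffolding looked like, a single human-coded training example might resemble the record below. The field names and label values are invented for this sketch, not the team’s actual coding scheme:

```python
# Hypothetical human-coded excerpt used for fine-tuning; all labels invented.
coded_example = {
    "excerpt": "Field advisers received one-on-one coaching on client farms...",
    "intervention": "on-the-job coaching",   # distinct from "training of trainers"
    "outcome": "sustained practice adoption",
    "lesson": "coached repetition outlasts one-off workshops",
    "sector": "agriculture",
    "country": "Guinea",
    "year": 2022,
}
```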
Running the model meant building a full pipeline. Each USAID report was broken into small, coherent text chunks. For each chunk, DELLM was asked a set of fixed, standardized questions:
- Does this passage contain an intervention? If so, which type?
- Is there a lesson learned, outcome, or implementation detail?
- How does this excerpt relate to recurring patterns across the corpus?
For each question, the model returned probability scores across all intervention categories and lesson types. Chunks above a confidence threshold were extracted and linked back to their source documents. The raw output looks less like prose and more like a spreadsheet for the world’s largest field experiment. Each of its thousands of rows (sketched in code after the list) contains:
- the original text excerpt,
- the intervention type,
- associated outcomes or lessons,
- metadata (sector, country, year),
- and a confidence score.
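To make that concrete, here is a minimal sketch of the extraction loop in Python. The chunk size, the confidence cutoff, the `classify` interface, and the row schema are all illustrative assumptions, not DELLM’s actual internals:

```python
# A minimal sketch of the extraction pipeline, not DELLM itself.
from dataclasses import dataclass
from typing import Protocol

CHUNK_WORDS = 300   # assumed chunk length
CONF_CUTOFF = 0.8   # assumed confidence threshold

class ChunkClassifier(Protocol):
    """Stand-in for the model: returns probability scores per category."""
    def classify(self, chunk: str) -> dict[str, dict[str, float]]: ...

@dataclass
class EvidenceRow:
    excerpt: str
    intervention_type: str
    lessons: list[str]
    sector: str
    country: str
    year: int
    confidence: float

def chunk_report(text: str, size: int = CHUNK_WORDS) -> list[str]:
    """Split a report into small, roughly coherent word windows."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def extract_rows(report: str, meta: dict, model: ChunkClassifier) -> list[EvidenceRow]:
    """Ask the fixed questions of every chunk; keep only confident hits."""
    rows = []
    for chunk in chunk_report(report):
        scores = model.classify(chunk)
        intervention, conf = max(scores["interventions"].items(), key=lambda kv: kv[1])
        if conf >= CONF_CUTOFF:
            rows.append(EvidenceRow(
                excerpt=chunk,
                intervention_type=intervention,
                lessons=[l for l, p in scores["lessons"].items() if p >= CONF_CUTOFF],
                sector=meta["sector"], country=meta["country"], year=meta["year"],
                confidence=conf,
            ))
    return rows
```

Because every row keeps its excerpt and metadata, each extracted lesson stays traceable to the report it came from.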
Once that dataset was assembled, we ran a second set of prompts to cluster the most recurrent lessons across decades and sectors. The result was a ranked empirical map of what USAID had been learning repeatedly for 60 years. The full technical approach, including how the model was trained, coded, and validated, is detailed in a peer-reviewed methods paper published in Land Use Policy.
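The article describes that second pass as prompt-driven. As a rough, fully local stand-in, the sketch below groups extracted lesson texts with TF-IDF vectors and k-means, then ranks clusters by how often they recur; the cluster count and every name here are illustrative:

```python
# Local stand-in for the prompt-driven clustering pass; parameters illustrative.
from collections import Counter

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def rank_recurrent_lessons(lesson_texts: list[str], n_clusters: int = 25):
    """Group similar lesson excerpts, then rank clusters by recurrence."""
    vectors = TfidfVectorizer(stop_words="english").fit_transform(lesson_texts)
    labels = KMeans(n_clusters=n_clusters, n_init="auto",
                    random_state=0).fit_predict(vectors)
    ranked = []
    for cluster, count in Counter(labels).most_common():
        examples = [t for t, l in zip(lesson_texts, labels) if l == cluster][:3]
        ranked.append((cluster, count, examples))  # most recurrent first
    return ranked
```

Whatever the exact mechanics, the output is the same kind of object: a frequency-ordered map of the lessons the agency kept relearning.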
DELLM isn’t a crystal ball; it’s a tireless, methodical reader. It can distinguish “training of trainers” from “on-the-job coaching,” or a steering committee from a service-level agreement, because it has been taught, by humans, to see the contours of how development actually works. When the model processed the archive end-to-end, what emerged were patterns that kept resurfacing across countries, decades, and sectors—old truths rediscovered but rarely retained. Because every model prediction was anchored in thousands of human-coded examples and then validated against held-out samples, the clusters reflect patterns the expert team would have found had it been given infinite time.
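That held-out validation can be pictured as a straightforward comparison between model predictions and expert codes the model never saw during fine-tuning. This sketch reuses the assumed `classify` interface from the pipeline example above; the metric choice is ours, not the paper’s:

```python
# Hypothetical held-out check: score model labels against unseen human codes.
from sklearn.metrics import classification_report

def validate_on_holdout(model, holdout: list[tuple[str, str]]) -> str:
    """holdout: (chunk, expert_label) pairs excluded from fine-tuning."""
    y_true = [label for _, label in holdout]
    y_pred = [max(model.classify(chunk)["interventions"].items(),
                  key=lambda kv: kv[1])[0]
              for chunk, _ in holdout]
    return classification_report(y_true, y_pred)  # precision/recall per label
```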
What follows are the five most frequent lessons that emerged from that process—the development practices that consistently stuck, and the blind spots that repeatedly unraveled progress. Our expert coders tagged hundreds of narrow, specific lessons; DELLM then clustered those into broader patterns that recurred across decades and sectors. Each lesson here is therefore a synthesis of many individual findings. The specific project examples that appear under each lesson surfaced the same way: DELLM repeatedly pulled them into the relevant clusters as representative cases, not because they were “top performers,” but because they consistently embodied the pattern underneath the lesson.
1. Bring Delivery Closer to Households
The difference between a policy that works and one that doesn’t often comes down to geography. Programs perform best when decisions, follow-ups, and problem-solving happen where people actually live—the farm, the school, the clinic—and not in air-conditioned meeting rooms a hundred miles away.
In Rwanda, the Enhancing Participatory Governance and Accountability project (2017-2020) set up village agriculture committees that became regular spaces where farmers could question local officials about issues like fertilizer quantities, seed delivery, and budget priorities. As a result, local government staff reported that they shifted from “top-down planning” to engaging directly with communities about what they needed most—such as a maize dryer—and then incorporating those priorities into district agricultural budgets. The project blurred the line between policy and practice, creating a feedback loop in which farmers articulated concrete needs and authorities adjusted plans in response.
Projects that kept delivery distant—designed around facilities instead of people—found their systems underused; those that invested in local touchpoints and predictable contact turned access into actual use.
2. Practice Changes Practice
For decades, “capacity building” meant a roomful of participants, a PowerPoint deck, and a certificate at the end. Yet again and again, USAID’s evaluations showed that learning evaporated as soon as people went home. Behavior changes only when skills are practiced in the real world, reinforced by peers, and observed by someone who can say, “Try it again.”
In Guinea’s Siguiri Agricultural Development Activity (2020-2023), the big workshop was just the opening act. The real work happened later, when field advisers were coached one-on-one in the heat and dust of their clients’ farms. They were tested, given action plans, and watched until new routines stuck. In Uganda’s MotherCare project (1991-1993), midwives “graduated” only after performing each emergency procedure correctly multiple times under supervision, a kind of clinical apprenticeship that replaced lectures with muscle memory.
Projects that stopped at the workshop stage often watched new skills evaporate; those that built in coached repetition and peer learning saw behavior actually change.
3. Design for Scale, Not for Pilots
The development world loves pilots. They’re neat, fundable, and small enough to photograph. The problem is that many pilots can’t survive outside the lab conditions that created them—when the extra budget disappears, so does the success. Lasting programs design for scale from day one. They name owners, secure budgets, and test the routines that will keep them running.
In Angola’s ProAgro Program (2004-2007), the team refused to play the pilot game. Instead of funding a few showcase centers, they asked local cooperatives to co-finance their own service hubs. The project’s contribution was capped; the co-ops trained managers in bookkeeping and negotiated supplier contracts that could keep the centers alive after donor money ended. By the close, multiple hubs were self-financing, unions were marketing produce, and the system no longer needed an external spark to keep it going.
Many pilots failed not because they were bad ideas, but because no one owned them. ProAgro turned ownership into design, proving that scale isn’t a phase; it’s a mindset.
4. Cocreation Beats Consultation
Consultation is polite. Cocreation is binding. Projects last when the people who must run them share real power—defining the problem, the rules, and the responsibilities from the start.
An energy-governance project across Central America (2004-2020) was built around community-defined management systems for solar and water-pumping infrastructure—with residents choosing committees, setting tariffs, and training local technicians. In Madagascar’s ASOTRY program (2014-2025), villages selected their own activities and monitored their own progress, turning decision-making into a shared practice rather than a consultation ritual. In both cases, systems endured because the people responsible for running them helped write the rules that governed them.
Projects that limited participation to consultation created committees that vanished with their budgets. Those that shared authority in writing left behind institutions that kept running.
5. Strengthen the Middle Layer
Big strategies make headlines; quiet supervisors make systems work. The “middle layer” of teachers, nurses, agronomists, cooperative leaders, and others responsible for daily implementation is where policy meets the public. Ignore it, and reforms collapse under their own weight.
Vietnam’s Clean Air Green Cities Project (2017-2020) proved the point outside the usual health or education mold. The project equipped schools and youth groups with 130 low-cost air-quality sensors, then trained teachers and student mentors to interpret the data and run awareness drives. Within months, those mid-tier actors had organized “clean air days,” convinced households to ditch coal stoves, and turned abstract data into 17,000 small actions. The ministry didn’t need to manage it; the middle layer did.
When the middle layer is supported, systems hold; when it isn’t, everything else is just theory.
What Organizations Should Do Next
Every large organization believes it learns. It holds workshops, writes reports, circulates memos, and then moves on. The problem isn’t the absence of learning; it’s the half-life of memory. Insights fade as people rotate out, incentives shift, and politics change. The archive fills up, but the organization forgets.
That’s why using AI here isn’t about replacing people with machines; it’s about giving organizations a second brain that doesn’t forget so easily. AI models can’t feel urgency or pride or fear of budget cuts, but they can hold knowledge steady long enough for humans to face it. They can flag what’s been tried before, what worked, what didn’t, and what’s already sitting in the files.
Some governments have appointed AI “ministers” to clean up procurement; the development world could use an AI Learning Officer, a role charged with making sure evidence shows up at the table when it matters. Not as a consultant or a data dashboard, but as a standing function with the power to ask one question no one likes to hear: “What did we learn last time?”
The evidence from six decades of USAID’s work isn’t asking for novelty. It’s asking for discipline—the courage to build, repeat, and refine what we already know works. With USAID’s closure, the archive may be gone, but the responsibility to remember—and to learn—now belongs to everyone else.
