Recent milestones in generative AI have sent nonprofits, social enterprises, and funders alike scrambling to understand how these innovations can be harnessed for global good. Along with this enthusiasm, there is also warranted concern that AI will greatly widen the digital divide and fail to improve the lives of 90 percent of the people on our planet. The current focus on funding AI intelligently and strategically in the social sector is critical, and it will help ensure that funding has the greatest impact.
So how can the social sector meet the current moment?
AI is already good at a lot of things. Plenty of social impact organizations are using AI right now, with positive results. Great resources exist for developing a useful understanding of the current landscape and how existing AI tech can serve your mission, including this report from Stanford HAI and Project Evident and this AI Treasure Map for Nonprofits from Tech Matters.
While some tech-for-good companies are creating AI and thriving—Digital Green, Khan Academy, and Jacaranda Health, among many—most social sector organizations are not ready to build AI solutions. But even organizations that don’t have AI on their radar need to be thinking about how to address one of the biggest challenges to harnessing AI to solve social sector problems: insufficient data.
Garbage In, Garbage Out
Data is the fuel that drives AI. AI is a machine, and that machine is only as effective as the data fed into it. The biggest barrier to developing robust AI for any problem is the lack of quality training datasets. A quality training dataset must be:
- Big enough, so that it represents all of the scenarios to be modeled;
- Up-to-date enough, meaning there is a way to actually update the training dataset when the conditions being modeled change;
- Accurate enough, meaning that each piece of data in the training dataset is accurate and reflects the context and situation at hand, in order to train the algorithm appropriately.
If these principles (adapted from Kevin Starr’s “Big Enough, Simple Enough, Cheap Enough” investment principles) aren’t met, AI solutions are unlikely to result in outputs that are reliable for action.
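As an illustrative sketch, these three criteria can be expressed as simple checks on a candidate training dataset. All thresholds, field names, and the audit procedure below are hypothetical; real values would depend entirely on the problem being modeled:

```python
from datetime import datetime, timedelta

# Hypothetical thresholds -- real values depend on the problem being modeled.
MIN_RECORDS = 10_000            # "big enough"
MAX_AGE = timedelta(days=365)   # "up-to-date enough"
MIN_LABEL_ACCURACY = 0.95       # "accurate enough" (share of audited labels correct)

def assess_training_data(records, audited_sample):
    """Report which of the three quality criteria a dataset meets.

    records        -- list of dicts, each with a 'collected_at' datetime
    audited_sample -- list of (label, verified_label) pairs from a manual audit
    """
    big_enough = len(records) >= MIN_RECORDS

    # Freshness: the newest record must fall within the allowed age window.
    newest = max(r["collected_at"] for r in records)
    up_to_date = datetime.now() - newest <= MAX_AGE

    # Accuracy: estimated from a manually verified sample of labels.
    correct = sum(1 for label, truth in audited_sample if label == truth)
    accurate = correct / len(audited_sample) >= MIN_LABEL_ACCURACY

    return {"big_enough": big_enough, "up_to_date": up_to_date, "accurate": accurate}
```

A dataset failing any one of these checks is a signal to invest in more collection, fresher collection, or better labeling before building models on it.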
The fact is, right now, the robust, recent, and accurate data required to enable AI to address many pressing problems in the social sector either does not exist or is distributed across many silos. As a result, the social sector severely lacks training datasets that are big enough, up-to-date enough, and accurate enough.
In order to realize the benefits of AI for all, social sector entities—nonprofit organizations, social enterprises, funders, and impacted communities—must commit to systematic, ethical, and sustained data-gathering as a fundamental component of our work, along with investing in shared data infrastructure in order to aggregate the data needed to achieve an accurate shared picture of the circumstances we seek to improve.
One way to think about this is that in order to build cars we first need to invest in the roads.
Here’s how we get there.
Fund the Groundwork: Gather—and Pay For—Better Data
Every time we reach a new paradigm in technology that we hope will improve lives equitably, we encounter the same problem: a lack of required infrastructure in low-income countries. Today, a major component of that infrastructure asymmetry is simply a lack of the volume and types of data required to make AI work.
If you’re ambitious about social change, you have to collect data, and you almost certainly need to collect it better. Ethically gathering reliable, consistent, and accurate data costs money, but it’s the only way to achieve an ecosystem in which most of the world can reap the benefits of emerging technology.
We’ve seen rapid advances in AI development in the private sector, enabled by available data at scale. In contrast, most social sector enterprises struggle with a lack of adequate data to reflect our work, our successes, and the magnitude of the problems we’re working to solve. Even when we gather data for program monitoring and evaluation, we often do so in one-off, unsustainable ways, to fulfill grant reporting or other compliance obligations. The challenges of ethical data collection—in terms of consent, ownership, and safety for vulnerable communities—have made data collection in the social sector even harder to scale.
When Nexleaf started in 2009, its co-founders aspired to bring the value of consistent sensor data to low-resource health systems. As technologists who understood the power of data to drive human action and anticipated the future of machine learning models, Nexleaf recognized robust data as a crucial building block of the infrastructure required to innovate. As a social enterprise, Nexleaf developed a sensor and data platform that ministries of health could leverage to access a real-time map of their vaccine supply chains, target investments, and rapidly repair failing vaccine refrigerators and transport trucks.
Gathering data can no longer be seen as checking a box to complete a project over the short term; data should instead be viewed as an essential component of any intervention, and indeed as a line item in the budget. Data that can be collected responsibly through ongoing longitudinal program operations (as opposed to project-specific monitoring and evaluation exercises) can also create future opportunities to drive greater impact. In order to build a complex machine, we first need to invest in the nuts and bolts.
Share That Data: Interoperable Datasets Spark Innovation
Collecting and managing the large datasets underpinning AI is expensive. That’s why the social sector must contribute to and finance shared data infrastructure that maintains sufficient data resources to maximize accuracy and usefulness.
This is one area in which the social sector has the opportunity to approach our AI revolution in a radically different way from the for-profit world. Even those in competition with one another—for example, different purveyors of agricultural planning tools in low-income countries—can achieve co-benefits and amplify their impact by contributing to a shared data infrastructure.
Right now, even the data that does exist is too siloed. Take the example of vaccination records. Some countries have their own national digital health systems, while others rely upon one or more app vendors to keep these records, and all this data is stored separately. In the event of a cross-border measles outbreak, how useful could an algorithm that recommends locations for emergency vaccination campaigns be if it could draw only on one country’s siloed records?
By contrast, crisis helplines are leading the way on interoperability. AI applications have high potential to help counselors working in emergency mental health do more with less, making up for the immense shortage of personnel trained to address mental health crises. Several entities are sharing technology—and often anonymized data—across multiple organizations in similar fields. With funding and pro-bono support from Google.org, The Trevor Project pioneered the use of a simulator for volunteers training to become counselors for its LGBTQ+ helpline. This increased the amount of role-playing available to trainees (chatting with an AI-powered persona) while reducing the time and stress on human trainers. The work led to a spinoff, ReflexAI, which is making this capability available to many more helplines, including those supporting veterans’ mental health and people struggling with addiction. Similarly, Tech Matters’ Aselo project, a crisis response contact center platform, is being entrusted with data from multiple national child helplines to develop open-source AI algorithms that categorize conversational content, a capability needed for a handful of AI-powered initiatives to improve services and data quality. Beyond that, Tech Matters is working with Child Helpline International (the association of the world’s child helplines) to bring together data from many countries (excluding personal information) to gain deeper insights into the challenges faced by children.
Nexleaf Analytics is already deploying shared data infrastructure for health systems. Nexleaf has built data management systems that operate across an entire country’s health system, along with standardized data interfaces (APIs), so the platform can aggregate data from multiple vaccine refrigerator models operating across countries in Africa and Asia, together with trip data from transport vehicles and cold boxes. Countries get a holistic picture of their end-to-end supply chain for vaccine distribution. Because users can skip the hard work of collecting, cleaning, and labeling the data, they can devote their limited resources to putting this data to work managing their health systems. Further, vaccine cold storage equipment manufacturers can focus on their core competencies, assured that the data from deployed equipment will help their customer countries manage their cold storage assets.
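A minimal sketch of what this kind of aggregation can look like in practice: each vendor's payload format is mapped into one shared record schema, so that downstream analyses and models only ever see a single format. The vendor names, payload fields, and shared schema below are invented for illustration and are not Nexleaf's actual interfaces:

```python
# Hypothetical example: normalizing temperature readings from two
# invented refrigerator vendors into one shared schema.

def from_vendor_a(raw):
    # Vendor A reports Celsius under "temp_c", keyed by device serial.
    return {"device_id": raw["serial"], "temp_c": raw["temp_c"], "ts": raw["timestamp"]}

def from_vendor_b(raw):
    # Vendor B reports Fahrenheit under "fahrenheit", keyed by asset tag.
    return {"device_id": raw["asset_tag"],
            "temp_c": (raw["fahrenheit"] - 32) * 5 / 9,
            "ts": raw["time"]}

ADAPTERS = {"vendor_a": from_vendor_a, "vendor_b": from_vendor_b}

def aggregate(readings):
    """Normalize mixed-vendor readings into the shared schema."""
    return [ADAPTERS[r["vendor"]](r["payload"]) for r in readings]
```

The design choice matters: adding a new equipment vendor means writing one adapter, not changing every dashboard and model built on the shared schema.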
Sharing data needs more than technical infrastructure; it also requires social and legal infrastructure. Tech Matters recently launched the Better Deal for Data, a lightweight data governance initiative, to develop a simple set of commitments for social sector organizations to adopt. First, it commits adopting organizations to safeguarding the data of the communities they serve, as well as promising to not sell that data to for-profit companies. Second, it is intended to make it far easier to combine data from many organizations into single datasets to advance the development of knowledge and AI models for social good.
Countries and policy leaders are already moving in this direction. Kenya’s National AI Strategy lays out three central pillars: AI digital infrastructure, data, and AI research innovation. The second pillar is centered on addressing the challenge of insufficient data in order to ensure that AI models can be appropriately adapted to the Kenyan context. The ultimate aim of this pillar is to achieve “enhanced dataset quality, useability, shareability and sovereignty” by creating a robust governance framework; developing secure data sharing, access, and interoperability protocols; and incentivizing the creation of open, high-quality AI datasets.
With radical interoperability done right, we can unlock the potential of the data for every application without compromising the sovereignty, ownership, and privacy of individuals, entities, or countries. It’s sharing in a way that promotes individual and group rights rather than exploiting users who have often unknowingly agreed to share their data by clicking a box next to the terms of service. The path to creating interoperable datasets requires particular expertise, and social sector leaders must invest in conscientious and deliberate data ecosystem design. Platforms like Data Commons allow data interoperability across fragmented sources without data cleaning or joining; these types of standards-driven approaches will enable stronger data-driven insights.
By bringing siloed data together, we can get better answers to solve the problems we face as we seek to achieve global goals.
Build Together: How AI Applications Succeed
Building AI is still very hard and expensive, and taking a “go it alone” approach will waste resources. Many social enterprises are already engaged in AI efforts that aren’t necessarily duplicative but could benefit from being brought under the same umbrella. And funders need to lead the way by establishing and investing in common infrastructure and hubs for collaboration.
A single AI collaborative could, for example, bring together relevant datasets, researchers, nonprofits, governments, and more to advance AI across a focused set of use cases in a particular field. By pooling data, identifying gaps, and sharing or even collectively building models, collaborative work on AI solutions can enable action across an entire ecosystem. For example, bioacoustics and camera trap data and models can aid in species detection and population estimation, rainforest preservation, anti-poaching, and ocean noise reduction. Organizations in this field have already partnered to build common AI-powered tools from large shared datasets, such as Wildlife Insights and Arbimon. Wildlife Insights, for example, was built by a coalition of conservation nonprofits that created shared infrastructure allowing any organization to upload camera trap photos, access species identification models, conduct data analysis, and collectively build both a more comprehensive view of species globally and better AI models.
Any field with common objectives and a lot of data spread across many organizations could benefit from joining together to support shared AI solutions. Let’s do the hard math once, instead of trying to build similar models from scratch over and over again, and focus on the true challenges of turning AI outputs and insights into action.
Conclusion
To maximize the potential of AI for social good, multilateral organizations, foundations, and nonprofits need to engage with one another. And donors need to lead the way by establishing and funding sustainable data-gathering endeavors, as well as shared data infrastructures, including:
- tech infrastructure, such as cloud systems, digital connectivity, interoperable software, and data collection tools; and
- soft infrastructure, such as skills building, routines, ethical norms, and financing.
Simultaneously, by bringing together organizations with similar objectives into collective efforts to advance AI-enabled solutions, more organizations can benefit from AI.
In a Harvard Business Review interview, Bryan Catanzaro, VP of Applied Deep Learning Research at NVIDIA, articulated a prevailing belief that “the most useful data to AI is always going to be the most secret data.”
Perhaps this makes sense for profit-motivated companies. But in the social sector, we can pivot away from a competitive approach to data that results in walled-off data silos. Indeed, to solve the intractable problems we face, we must turn this model on its head. The most useful data for AI in the social impact sector will be the data that is pooled, so that it is big enough, up-to-date enough, and accurate enough to be up to the task of solving humanity’s biggest problems.
Taking concrete action now to gather better data, bring data together, and collaborate for AI innovation will set the social sector up to reap the benefits of AI’s emergent power for the greater good.
Read more stories by Nithya Ramanathan & Jim Fruchterman.
