Using Community Data for Data Justice


When the Coalition of Communities of Color (CCC) began a multi-year collaboration with the Oregon Health Authority (OHA), they worked together to modernize a critical public health information source: the Oregon Student Health Survey. This survey, disseminated annually across Oregon, was designed to track health trends and inform policy decisions affecting thousands of young people and families.

But there was a problem. Year after year, the survey illuminated inequities, showing, for example, that students of color experienced higher rates of bullying or mental health challenges. Yet it provided no insight into why these inequities existed, how they were experienced, or what communities wanted done about them. The data revealed gaps but offered no pathways to close them.

Transforming Data for Equity and Justice

Data impacts every part of our lives. This article series, sponsored by the de Beaumont Foundation and the Robert Wood Johnson Foundation, explores the harms caused by data and discusses how data equity and justice can improve people’s health and well-being and drive long-lasting social change.


Working alongside other culturally specific organizations within their coalition and researchers of color in their region, CCC set out to demonstrate what better data could look like for the Oregon Student Health Survey. They worked with high school teachers who had deep relationships with students and met with students to understand which questions mattered most to them. Simple, straightforward questions like “How are you doing?” and “What supports do you need?” revealed issues that the state’s standardized surveys had completely missed. The process generated rich, contextual data showing not just that systems were failing, but how they were failing and how students wanted their needs to be met. It also demonstrated that working with people who have lived experience of the issues being researched generates better questions and, therefore, better data about those issues.

And the improvements resulting from better data were tangible. OHA created a Youth Data Council, involving young people directly in designing aspects of the next version of the Student Health Survey. CCC documented the survey modernization process in a detailed community brief. For the first time, the Oregon Student Health Survey included three open-ended questions, yielding more than 4,000 qualitative responses. OHA published a groundbreaking analysis of what students actually wanted to say when given the chance.

As CCC and OHA’s experiences illustrate, it is possible and necessary to pair data generated on community terms with institutional data. When you have that diversity of evidence, you can create better policies that are more grounded in context, experience, and community knowledge.

But despite these successes, the work continues to be resisted. Why? Because it challenges “gold standard” dominant quantitative approaches. For instance, dominant institutions like OHA are still overwhelmingly focused on maintaining longitudinal quantitative datasets that track changes in student health metrics over many years. Any changes to question types or approaches are dismissed because they would disrupt these trend lines. The desire to keep data “reliable” for making claims over time has become more important than actually changing the realities the data tracks.

The Limitations of Dominant Data

In the United States, dominant institutions typically rely on what we call dominant data—information generated by powerful institutions like governments, for-profit companies, and universities. Dominant data is collected on the terms of these institutions to support decision-making about large populations and the allocation of significant resources. This data is typically expressed as statistics—quantified, aggregated information that institutions treat as objective and authoritative. Dominant data includes two primary data types:

Administrative data is information collected when individuals interact with systems to access programs or services. When a student enrolls in school, they provide demographic information, their test scores are recorded, and their attendance is tracked. When someone applies for public benefits, interacts with the criminal justice system, or uses public transportation, data is generated. These records accumulate into massive datasets that institutions use to understand populations and track outcomes from a quantitative perspective.
Population-level survey data is gathered through large-scale surveys designed to monitor trends across broad populations. The US census, the American Community Survey, and state health surveys are tools built to provide a bird’s-eye view of social conditions, typically through standardized, multiple-choice questions developed by researchers and experts.

Both types of data serve important functions. They can highlight population-level inequities and help institutions track certain changes over time.

The problem isn’t that these forms of data exist. The issue is that this data has achieved a position of supremacy in our institutions and is treated as the primary or only form of trusted evidence for decision-making. In other words, dominant data is presumed to be objective, scientific, and the most authoritative way to understand reality. Meanwhile, other forms of knowing are dismissed as anecdotal or insufficient. This is particularly insidious when decision-makers demand statistical proof to “validate” the experiences of marginalized communities before they can be trusted and acted on. The history of how statistics came to hold this authoritative position is deeply intertwined with racism and eugenics, a troubling legacy that continues to shape how we understand and value data today. The methodologies themselves often carry what scholars call “white logic”—embedding racist assumptions into the very frameworks used to generate and interpret data. This overreliance on statistics creates several critical limitations:

It centers white experiences as the default. Most dominant data collection tools and methodologies work best for white populations. Sampling strategies, question design, and analytical frameworks can fail to capture the experiences of communities of color accurately or with nuance. Further, when data about marginalized communities is collected, it’s frequently suppressed or omitted entirely because of a “small numbers problem”: institutions claim there aren’t enough people from certain communities in their datasets to report findings reliably, and entire populations are erased from official records as a result.

It validates deficit-based thinking. When data is presented through the lens of inequities between groups, showing how communities of color fare worse than white communities across various metrics, it reinforces narratives that certain populations are problems to be solved rather than people experiencing the impacts of unjust systems. We track how many youth were bullied without understanding the role of racism, transphobia, or other forms of systemic oppression. We measure without providing meaningful pathways to change. Overdependence on statistical significance as the arbiter of truth has been critiqued for costing us jobs, justice, and lives by elevating narrow quantitative standards over human realities.

It lacks context. Dominant data excels at identifying what is happening within the metrics institutions choose to track (e.g., inequities in health outcomes, educational achievement, housing stability) but provides little insight into why these inequities exist or how to address them. Without understanding the historical, structural, and cultural contexts that shape people’s lives, institutions cannot develop responsive solutions.

It operates on institutional terms, not community terms. The questions asked, the categories used, and the timeframes imposed are all determined by the institutions collecting the data, not by the communities being documented. As a result, the data primarily answers questions that matter to institutions, not questions that matter to communities.

It marginalizes and erases. Communities that don’t fit neatly into institutional categories, that are deemed too small to count, or whose experiences can’t be captured through standardized instruments are either erased from datasets or aggregated into a broad racial category that obscures the different inequities experienced by the groups within it. Their realities don’t inform policy because they’re not seen by the data systems that drive decisions.

We often hear about the desire for “data-informed” or “data-driven” decision-making. In reality, decision-makers in dominant institutions don’t rely primarily on dominant data and statistical information to make decisions. They make decisions based on politics, power, and priorities, and then use data to justify those decisions. If the only data considered trustworthy comes from dominant sources and reinforces dominant perspectives, the cycle continues.

Community Data as an Antidote

But what if there’s another way? What if the knowledge already held by communities, generated through their everyday experiences, relationships, and organizing, could be elevated as trusted evidence for decision-making?

This is community data: evidence generated by communities about their everyday lives, reflecting community-centric forms of knowing, being, doing, and dreaming. Community data can take many forms, including numbers, words, art, music, sound, and maps. It encompasses the multiple and diverse ways communities share, express, and articulate their lived experiences and desires.

Critically, community data requires the systematic collection, interpretation, and use of information that is produced on community terms. It provides contextual information about human experiences from the past, present, and desired futures.

Community data is action-oriented and values-based. It is not intended to be neutral, as data can never be neutral. Instead, it is designed to highlight unfair realities and identify the changes needed to support larger portions of the population, embodying the Curb-Cut Effect, where solutions designed for the most vulnerable end up benefiting everyone.

Unlike dominant data, which is normally gathered impersonally or transactionally, community data emerges from relationships. It surfaces through conversations between caseworkers and families, focus groups led by community members, participatory research processes, community organizing efforts, and the daily work of culturally specific organizations that maintain deep connections with the people they serve.

Community data provides what dominant data cannot: the context to understand why inequities exist, the nuance to see how systems are failing specific populations, and the solutions from those with lived experience. When a housing caseworker visits families in shelters, when community health workers document barriers to accessing care, when youth organizers map their experiences of their neighborhoods: This is community data. It reveals not just problems but pathways forward, grounded in what communities themselves identify as needed and desired.

Community data is not simply qualitative data or stories that supplement quantitative findings. While community data is often, but not always, qualitative in nature, what distinguishes it is not its format but its foundation. Community data is built on five essential principles:

It is based on relationships. Trust is essential. Community data is captured through sustained, meaningful connections and not extracted through one-time interactions.
It respects community-defined boundaries. Communities determine what information should and should not be collected, how data should be stored and shared, and what protocols protect people from harm.
It activates a diversity of data actors. Everyone from program coordinators to outreach workers to direct service providers holds community data through their relationships and daily work.
It challenges dominant systems. By introducing community perspectives into institutional decision-making, community data pushes systems to be responsive, accountable, and transformative.
It ensures tangible benefits for community members. Community data must lead to meaningful improvements, such as better services, increased funding, and policy changes that reflect community desires.

When we say community data is an antidote to overreliance on dominant data, we mean that community data addresses the specific limitations that make dominant data insufficient for justice work. It provides the missing context, centers marginalized voices, operates on community terms, highlights strengths and desires rather than deficits, and makes visible what dominant systems have made invisible.

When both dominant data and community data are valued and relied upon, they can together create a fuller, truer picture of what communities need and want. But currently, the infrastructure, resources, and legitimacy afforded to dominant data far outweigh those available for community data. This imbalance must change.

Building Infrastructure for Community Data

Recognizing that community data requires intentional infrastructure and sustained investment, CCC is working to build a community-led digital platform designed to house, organize, and share community data for environmental justice.

The vision came from a simple but powerful realization: Community-based organizations already hold tremendous amounts of community data, but it typically exists in fragmented, disorganized forms (e.g., meeting notes, verbal accounts, observations, registration forms). This information is invaluable, but without systems to organize, interpret, and use it strategically, its power remains largely untapped.

The data ecosystem is being built to address this challenge. With the support of the Robert Wood Johnson Foundation and the de Beaumont Foundation since 2023, CCC has been working with over 20 community-based partners to envision what we’re calling Portland’s newest digital community garden.

It will enable multiple community organizations working on environmental justice issues to input their data into a shared system, where it can be aggregated around specific topics, analyzed collaboratively, and packaged in formats useful for various audiences, from government decision-makers to funders to community members themselves.

Importantly, this system is designed with two core functions in mind:

To support community power-building. The data ecosystem creates a trusted system for knowledge generation that helps communities track their own knowledge, evidence, and desires for a more inclusive future. It allows organizations to see patterns across their work, connect with other groups facing similar challenges, and build collective power through shared understanding.

To influence upstream decision-making. When policy makers and funders are considering how to allocate resources for environmental health, they should have access to systematically organized community data that reveals what communities are experiencing, what they’ve tried, what’s working, and what they need. Instead of using only dominant data sources, decision-makers would have robust community evidence to inform their choices.

The process of building this system has itself yielded important insights:

We’ve learned that questions of data ownership must be clearly defined and addressed from the outset. Currently, when community-based organizations enter contracts with government agencies to collect community data, that data often legally belongs to the government, even when communities lead the entire process. CCC is actively working with government partners to create mechanisms that allow community organizations to retain ownership of the data they collect.
We’ve learned that building community data infrastructure requires different kinds of expertise than building dominant data systems. We need front-end developers who understand community needs, data governance models inspired by Indigenous data sovereignty frameworks like those developed by Māori communities, and technical assistance that respects community timelines and processes rather than imposing institutional urgency.
Most importantly, we’ve learned that this work requires sustained, substantial investment. Community organizations are already stretched thin, responding to continuous cycles of crisis in their communities. They cannot take on the additional labor of systematizing their community data without dedicated funding for capacity building, technical support, and ongoing maintenance of data systems.

This data ecosystem represents one model of what’s possible when we commit to building infrastructure for community data. Our hope is that it becomes a replicable approach that can be adapted elsewhere, not necessarily as an entire data system, but as an inspiration for how institutions might support community organizations in collecting, organizing, and wielding their own data for liberation.

How You Can Advance Community Data

Community data is all around us. It’s in the conversations senior center staff have with elders, in the observations teachers make about their students’ needs, in the feedback gathered at community events, in the knowledge held by outreach teams. Whether you’re working in government, philanthropy, academia, or community organizations, you can:

Interrogate the limits of dominant data. Unpack assumptions about what counts as trusted evidence. Whenever you encounter dominant data being used as the sole basis for discussions or decisions, ask: What biases shaped these data? Whose perspectives are missing? What else do we need to know to address the inequities these data highlight?
Identify and uplift community data in your context. Begin identifying what community data your organization or partners already collect, even informally, and make a plan to formalize its collection and use.
Spotlight community data in decision-making spaces. Include community data in reports and presentations alongside dominant data. Explicitly name it as community data when you use or refer to it and explain why it matters. Funding decisions, policy choices, and strategic directions must be informed by community data, not just dominant data.
Invest in the creation and use of community data. Advocate for funding that allows community organizations to build their capacity to collect community data, retain ownership of it, and have decision-making power regarding how it’s interpreted and used.

We must transform what counts as trusted evidence. We need a future where most decisions that affect communities are made based on both dominant data and community data, where the two are in communication, where each is valued for what it offers, and where community data is adequately resourced and protected.

The work ahead is not easy, but it is necessary. And it starts with recognizing that the people most impacted by unjust systems hold the knowledge necessary to transform those systems. We must value, resource, and rely on community data.

Read more stories by Mira Mohsini & Andres Lopez.
