Will the Next Evaluation Breakthrough Come from Online Shopping?

Back in 2006, a modest DVD-by-mail company called Netflix offered a $1 million grand prize to any programming team that could improve its ability to recommend films that matched the interests of its customers. Meanwhile, Amazon was betting its future on Prime, a subscription service built on its own recommendation tool. For both companies, the concept was simple: Develop an algorithm that uses the opinions and actions of customers as predictive data about which products other, like-minded customers would want. The algorithm, they hoped, would act like a highly precise, 21st-century version of a dynamic focus group, continuously revealing the detailed wisdom of the crowd—and boosting customers’ consumption, satisfaction, and loyalty.

Today, Netflix credits its algorithms for 80 percent of the hours customers stream, as well as a rock-bottom cancellation rate that saves the company $1 billion a year. Amazon says recommendations account for a stunning 35 percent of its revenue. This is great for them, but it’s also promising news for social change practitioners. That’s because these systems are built on a model called “collaborative filtering,” an approach that’s jumped the rails from commerce to civil society, where it’s shown the potential to surface quicker, cheaper, better data about notoriously hard-to-measure social change.

Collaborative filtering

Qualitative data analysis is grueling. Instead of hard numbers, line graphs, and percentage points, evidence often lies in the linguistic testimony of community members, narrative examples from partners, and other diverse observations. The voluminous and unstructured nature of this data makes it difficult to analyze quickly and accurately. Skilled evaluators scour the evidence manually or with the aid of software, identifying patterns and coding responses, parsing keywords and expressions. But this approach can be error-prone—bias creeps in, assumptions are made. Which anecdotes or observations verifiably represent the change that may or may not be taking place? How can you validate these insights? How often do we read an organization’s report that attests to complex transformation, only to be disappointed or unsatisfied that the offered proof lies in list-like or potentially cherry-picked quotes and examples?

Collaborative filtering, instead, strives to tap into the collective intelligence of the studied community. While Netflix and Amazon use this approach to create behavior-ranked predictions about product desires, the social sector can use it to surface the highest peer-ranked insights about what’s really happening in that community.
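To make the commercial, predictive side concrete, here is a minimal sketch of one common variant, user-based collaborative filtering. It is an illustration only, not a description of how Netflix or Amazon actually build their systems. The customer names, film labels, and ratings are hypothetical, and cosine similarity is just one of several ways to decide who counts as "like-minded."

```python
from math import sqrt

# Hypothetical ratings (1-5 stars): customer -> {film: rating}.
ratings = {
    "ana":   {"film_a": 5, "film_b": 4, "film_c": 1},
    "ben":   {"film_a": 4, "film_b": 5, "film_c": 2, "film_d": 5},
    "chloe": {"film_a": 1, "film_b": 2, "film_c": 5, "film_d": 1},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the films both customers rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def predict_rating(ratings, customer, item):
    """Similarity-weighted average of other customers' ratings for `item`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == customer or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[customer], other_ratings)
        weighted_sum += sim * other_ratings[item]
        weight_total += sim
    # None means no other customer has rated the item yet.
    return weighted_sum / weight_total if weight_total else None

# Ana has not rated film_d; the estimate leans toward Ben, whose tastes match hers.
prediction = predict_rating(ratings, "ana", "film_d")
print(f"Predicted rating for ana on film_d: {prediction:.2f}")
```

Because Ana's past ratings track Ben's far more closely than Chloe's, Ben's five-star rating of the unseen film carries more weight in the estimate than Chloe's one star.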

Let’s put this into context. Imagine you are the mayor of a town and want to know the priority concerns of your citizens. Rather than conducting a traditional survey, where answer options and data analysis are left to the research team, you want the community to collectively identify and evaluate emerging insights from the data. So you develop a survey that asks Citizen B to read a statement by Citizen A and then rate it on a scale of one to five stars, with five being the most challenging. Let’s say Citizen A thinks the new interstate construction is the biggest challenge facing the town. Citizen B rates Citizen A’s statement about the new construction as a four-star challenge, and then records her own statement that a recent string of home break-ins is a bigger, five-star challenge. Her statement then goes to another set of citizens, who rate whether break-ins are a priority concern and have the chance to provide their own statements, if they’d like. At the end of the week, the town’s website posts a ranking of the highest- and lowest-rated citizens’ statements. You now have a priority list of concerns within your community, validated through peer-to-peer evaluation.
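The ranking step in the mayor example can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the method DevCAFE or DocSCALE use: the statements and star ratings are invented, statements are ranked by their plain average rating, and the `min_ratings` threshold is a hypothetical safeguard so that a statement rated by only one or two people does not top the list.

```python
from statistics import mean

# Hypothetical peer ratings (1-5 stars) collected for each citizen statement.
peer_ratings = {
    "Interstate construction": [4, 3, 4, 5],
    "Home break-ins": [5, 5, 4, 5, 4],
    "Downtown parking": [2, 3, 2],
}

def rank_statements(peer_ratings, min_ratings=3):
    """Return (statement, mean rating, rating count), highest mean first."""
    scored = [
        (statement, mean(stars), len(stars))
        for statement, stars in peer_ratings.items()
        if len(stars) >= min_ratings  # skip statements with too few ratings
    ]
    return sorted(scored, key=lambda row: row[1], reverse=True)

ranking = rank_statements(peer_ratings)
for statement, avg, count in ranking:
    print(f"{avg:.2f} stars from {count} ratings: {statement}")
```

Reporting the number of ratings alongside each average lets a reader see how much peer validation sits behind every item on the list.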

A field tool for rapid analysis

So how can the social sector actually use this in the field at scale? In 2013, a research team at UC Berkeley—headed by Professor Ken Goldberg and one of us, Brandie Nonnecke, a postdoc at CITRIS and the Banatao Institute—began experimenting with collaborative filtering as a data collection tool for social good. The result was DevCAFE, an open-source mobile platform that enables rapid and scalable collection and analysis of quantitative and qualitative data in the field.

A Ugandan woman uses DevCAFE to provide feedback on the effectiveness of a family planning training program. (Photo by Brandie Nonnecke)

DevCAFE has since been tested in numerous settings to evaluate its effectiveness. In Uganda, the team set out to source recommendations to improve a women’s training program on family planning. Women answered a few standard quantitative questions, and then had a chance to provide—in their own words—an idea for improvement. The other women then rated each of these ideas, and the end result was a filtered list of prioritized and community-validated suggestions. The top three included adding a water well, including men at training sessions, and improving access to family planning materials. The team successfully implemented all three.

Tackling complex evaluation

But what if you wanted to use collaborative filtering not as a predictive or prescriptive aid (like Netflix or DevCAFE, respectively), but as an impact assessment method for efforts that involve lots of qualitative data, slow changes over long periods of time, and an unpredictable array of possible impacts?

These thorny evaluation challenges are applicable to a wide variety of fields, from projects focused on social norms to influencer efforts. But they are particularly salient for media—a field where, despite the presence of Nielsen and other consumer information tracking services, social impact measurement remains a huge challenge. Has news coverage of the Black Lives Matter movement, for example, changed the way people think and act—and if so, how, how much, and what does that change look like on the streets? Some interesting tools are available, including: platforms like MIT’s Media Cloud that explore the way ideas and stories spread; tools like NewsLynx that track and tag online reaction to news stories; and international survey platforms like Open Data Kit. And in the world of international development, there have certainly been standard-setting (and expensive) evaluations, such as the recent World Bank study that examined the impact of MTV’s drama series Shuga on sexual health in Nigeria and the classic studies in Brazil linking the spread of soap operas to lower fertility rates. Yet few deal with media data in ways that mimic a focus group, with its strengths of participation, authenticity of voice, and peer-to-peer validation of qualitative observations.

Collaborative filtering: media evaluation in action

In 2016, inspired by DevCAFE, the documentary film organization ITVS—where the other of us, Eric Martin, works—set out to adapt collaborative filtering to test whether it could shine a light on where, why, and how change happens in media. Through our ITVS platform DocSCALE, we began using collaborative filtering in India as part of an independently evaluated global documentary film and social change program called Women and Girls Lead Global.

A group of 1,338 men and women in gender-based violence prevention programs—all from poor communities in Maharashtra and Rajasthan—participated in the program. In response to the DocSCALE mobile phone survey, 1,001 participants recorded their own statements describing changes they observed in the interactions between men and women in their household and/or community. The group also provided 2,668 peer-to-peer ratings on these observations (participants could rate more than one observation). Alongside the DocSCALE survey, an evaluation team performed extensive, face-to-face research, using baseline-endline surveys, focus groups, interviews, and other methods to measure impact. It provided detailed findings that could be compared against the less traditional collaborative filtering results.
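For a sense of scale, the participation figures above can be turned into two simple ratios. The counts come from the text; the ratios are derived here for illustration, and the variable names are ours. The per-participant figure is an average over the whole group, not a statement about how many ratings any individual gave.

```python
# Counts reported for the DocSCALE survey in India.
participants = 1338        # people in the program group
statement_authors = 1001   # participants who recorded a statement
peer_ratings = 2668        # peer-to-peer ratings collected

# Share of the group that recorded a statement of their own.
share_with_statement = statement_authors / participants

# Average number of ratings per participant across the whole group.
ratings_per_participant = peer_ratings / participants

print(f"Recorded a statement: {share_with_statement:.1%}")
print(f"Ratings per participant (average): {ratings_per_participant:.2f}")
```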

During our analysis, we found that DocSCALE captured most of the same changes that the more extensive, traditional evaluation surfaced. It also captured those changes not by seeking responses to a long list of measures and questions (as the traditional evaluation did), but by inviting respondents to record simple open-ended statements and letting peers rate the wide range of statements generated. Two kinds of statements were most highly rated for accuracy by fellow members of those communities: 1) statements that described more “polite” behavior of men and boys toward women, and 2) statements that described more “helping with household chores.”

Survey participants in India with simple feature phones use the ITVS DocSCALE platform to record observations on male-female interactions. (Photo by Abhishek Srivastava)

The evidence collaborative filtering captured was much more personally detailed than data from the traditional surveys—respondents described young men helping with “brooming and cleaning,” talking about school with their sisters, and teaching a young woman to ride a motorcycle. But these were more than simple anecdotes; the peer-rating system showed how strongly the community members believed these highly rated statements were changes they also saw in their households and community.

Further testing of the platform is needed, and we expect it to happen. Some see DocSCALE as an evaluation tool, others as an aid for monitoring or as a feedback mechanism that lifts grassroots voices to the surface and provides community-generated course-correction data along the way.

The future of evaluation and why the social sector needs programmers

Of course, these experiments are nascent, their results are incomplete, and we’ve barely tapped the potential of collaborative filtering as a tool for social change. But several types of organizations might particularly benefit from collaborative filtering:

Participatory-minded organizations looking to reduce the distance between the evaluator and the evaluated. By involving participants in the evaluation process—not as a courtesy, but as a fundamental part of the design—collaborative filtering provides greater opportunities for feedback. Smartphone versions like the one developed for DevCAFE can give participants an instant, visual sense of where their perspectives fit in with their community.
Complex social change organizations that deal with lots of qualitative data. Anecdotes, case studies, and qualitative trend-searching can be the bane of organizations trying to show their impact. Collaborative filtering helps add quantitative value to qualitative data. Shoring up a participant’s quote with hard numbers and ratings by fellow participants, for example, provides a stronger indication that the quote speaks for a community.
Organizations that serve solo practitioners who can’t afford impact evaluations. In the field of media, hundreds of socially committed filmmakers tell inspiring stories every year, execute inspiring engagement campaigns, and then struggle to measure impact. A future collaborative filtering tool could help solo practitioners (not just in media) conduct their own rapid evaluations without expert analysis or big budgets.

As these groups look for ways to improve their assessments, we recommend that they consider:

Incorporating collaborative filtering into survey design. The models for enabling collaborative filtering in DevCAFE and DocSCALE are open source. Those who work with online companies to conduct surveys should ask those companies to incorporate collaborative filtering into their platforms so that they can better enable the emergence of participant-driven and validated insights.
Advocating for more emphasis on feedback and qualitative data. Some of the most valuable data for complex projects and mid-flight feedback do not fit into the pure experimental model of a randomized controlled trial. The value of peer-rated, participatory data needs champions if it is to develop into a more useful and widely used asset. The white papers for DocSCALE and DevCAFE can help serve as resources for those looking to incorporate collaborative filtering into their work.

Collaborative filtering saves for-profit companies billions and makes our lives easier by streamlining and structuring vast amounts of data. Repurposing this technology for the public good makes sense. By empowering participants to collaboratively filter data, we can reduce bias and surface more nuanced and accurate insights to improve social sector effectiveness.

Read more stories by Brandie Nonnecke & Eric Martin.
