It is often said that “what gets measured gets done.” This common phrase implicitly frames measurement not just as a tool for capturing information about systems but also as an intervention itself. This belief in the transformative power of measurement may partially explain the huge sums of money spent each year by governmental, nongovernmental, and private sector organizations in developing, maintaining, and publicizing measures.
We aren’t referring to the private use of measurement by organizations to evaluate and improve their own internal performance, or randomized controlled trials to measure the effectiveness of a new social service program or data mining to better understand customer behavior. That is a vast and important subject that has been widely explored.
We are focused on the broader use of measures to report on, and hopefully drive, large-scale social change. Consider one of the best-known examples, the Consumer Price Index, or some of the lesser-known ones, such as the Corruption Perceptions Index, the Sustainable Governance Indicators, and the National Health Security Preparedness Index.1 But it’s not just indexes. High school graduation rates, for instance, are an example of individual measures used to hold schools and districts accountable and drive improvement.
As much attention as there is now on using measures to foster social change, it is likely to increase in the future. That’s because our ability to track, measure, and analyze all sorts of things is growing by the day. Low-cost sensors and microprocessors, wireless connectivity, and mobile devices, all connected to the Internet, make it easier and easier to collect data. And ever more powerful and less expensive computing power and data storage make it easier to analyze this growing mountain of data. At the same time, there is a growing push by the government, philanthropists, policy makers, and social change agents to measure and report results, and use the results to drive decision making and foster behavior change.
But for all the attention on and use of data collection and measurement, there is remarkably little discussion or research on how—and under what circumstances—measures actually “work.” The science of measurement remains largely silent on the topic, focusing instead on issues of validity and reliability, and treating the behavioral impacts of measurement as “reactivity to measurement”—a source of measurement error that must be minimized.2
This, then, is a case where theory lags behind practice. Measurement theory focuses mostly on how to find measures that accurately represent systems. This is critical. But funders, government officials, and social innovators also need guidance on using measures to improve systems.
We wrestled with this question of how measurement supports system improvement and innovation while selecting measures for the Robert Wood Johnson Foundation’s (RWJF) efforts to advance what it calls a Culture of Health. The purpose of this article is to share what we learned in that process and to start a dialogue about how one can better use measurement to promote large-scale system change.3
What We Learned About Measures and Health
The RWJF’s Culture of Health is an ambitious 20-year, social-change-based vision premised on the understanding that improving some of the most persistent health-related problems in the United States—including high health-care costs producing only mediocre health outcomes—requires new thinking to address the physical, social, economic, environmental, and cultural values that shape how diverse sectors allocate resources related to health and well-being.
To help operationalize this vision, the foundation worked in partnership with RAND to review relevant literature and engage a range of stakeholder communities in developing a framework for action that provides structure and detail in how the vision of a Culture of Health might be achieved nationally and in communities. The framework consists of four action areas: making health a shared value, fostering cross-sector collaboration to improve well-being, creating healthier and more equitable communities, and strengthening the integration of health services and systems.
Each of these four action areas has three drivers of change, which in turn have two or three national-level measures. There are 41 measures in all, but a refined set of 35 will be released soon. Some of these measures address health outcomes (such as preventable hospitalizations and disability associated with chronic conditions), while others address consumer access to and experience with health care (such as health insurance coverage and access to alcohol, drug, or mental health services).
Most measures, however, address factors that can help catalyze improvements in health and well-being. Many seek to underscore the importance of nontraditional partners in promoting health and well-being. For instance, a measure of enrollment in early childhood education recognizes that children who attend preschool are more likely to stay in school, go on to hold jobs, and earn more money, all of which are linked to better health. Several measures seek to capture how people think about health (for example, a survey-based measure of the value placed on investments in community health) and the extent to which they are engaged in community affairs (such as voter participation).
Together, the action areas, drivers, and measures illustrate (but do not list exhaustively) the priority areas that need consistent action, investment, and attention to create the systems, cultural, and social changes required for building a Culture of Health in America. Unlike a model—which implies something formulaic, fixed, and final—a framework speaks to a built-in fluidity. Similarly, the measures are not intended to prescribe specific actions but to stimulate discussion, catalyze partnerships, and promote policies. The measures are publicly reported on the Culture of Health website.
To the extent that existing measurement theory does provide insight on measures’ capacity to change the world, it is through accountability-oriented measurement. We first review this line of thinking and then propose an alternative view, which we call “catalytic measurement.”
Perhaps the best-understood use of measurement for accountability is “performance-based accountability systems” (PBASs), which link measures to rewards and punishments. In some cases, the measures and incentives are linked to desired outcomes, such as reductions in greenhouse gas emissions or increases in student test scores. Where outcomes are difficult to observe, measures can be linked to short-term actions believed to bring about longer-term outcomes. In health, for example, bonus payments to doctors are sometimes linked to clinical quality measures, such as the proportion of patients in a certain risk group who receive a specific type of evidence-based care. Widespread changes in clinical practice, it is hoped, can translate into significant system-level improvements in cost and quality.
The theory of change behind PBASs is simple—it assumes that people and organizations prefer getting rewards to punishments, and that they will adjust their behavior accordingly. Thus, provided that the incentives are well aligned with system goals, PBASs should lead to improved performance.
But that simple theory relies on several optimistic assumptions about the context in which the measures operate. First, there must be some way to deliver the incentive to the targeted individuals and organizations. This is easiest when there is an explicit division of labor and a hierarchical relationship (such as that between a funder and a grantee). It can also work in a market environment, where authorities provide information that consumers use to decide where and whether to purchase a product or service. For example, several states have developed quality rating and improvement systems that make data on measures of day-care center quality accessible to consumers, who can then factor them into decisions about where to send their children.4
Increasingly, however, social innovation involves complex systems that feature interactions among individuals, networks, and organizations in multiple sectors and professions. This makes it difficult to design an incentive system that creates the desired behavior and system change.
PBASs also assume that there is a body of academic or practical knowledge to identify which actions to encourage or discourage. Fields such as clinical medicine identify performance measures using a large body of evidence linking specific therapeutics and procedures to health outcomes (for example, giving aspirin to heart-attack patients reduces the odds of another heart attack). Yet, important aspects of large-scale social innovations, such as the Culture of Health, involve areas where the evidence base is weak or emergent.
For instance, there is a considerable body of evidence linking a range of social and economic determinants of health to the prevalence of healthy behaviors and, in turn, to health outcomes such as diabetes, hypertension, and heart disease. Yet, evidence of how to generate and sustain improvements in the conditions shaping the nation’s health and well-being is much harder to come by, in part due to the complex mix of individual, economic, and institutional causes at work, the complexity of decentralized systems of governance, and deep-seated differences in how people perceive and value health.5
Even in well-established fields with strong bodies of evidence, linking measures to incentives can have unanticipated or undesirable consequences. In 1956, the inaugural issue of the prestigious journal Administrative Science Quarterly included an article titled “The Dysfunctional Consequences of Performance Measurements” that examined how use of performance measures attached to consequences had skewed decisions in ways detrimental to overall organizational performance.
Recent behavioral economics research suggests that some of these unanticipated consequences emerge from the fact that incentives often trigger psychological, social, and organizational mechanisms that counteract them. For example, Dan Ariely of Duke University and colleagues from Carnegie Mellon University and the University of Toronto asked a group of subjects to play games emphasizing creativity, memory, and motor skills. They divided players into groups getting low, medium, and high monetary incentives and found that the highest reward levels actually had detrimental effects on performance.6
The authors argue that incentive-based mechanisms can trigger psychological responses that work against them. They reduce intrinsic motivation by framing performance as a reward system and by signaling that the incentivized task is so undesirable that one must be paid to do it, and they reduce trust by signaling (perhaps unintentionally) that the PBAS designer does not trust others to do the job well. The authors found that incentives work somewhat better for concrete, less conceptual tasks. However, most social innovations by their very nature require individuals and organizations to work collaboratively to tackle new problems for which there are no prefabricated solutions.
Accountability-oriented measures linked to incentives can work, but under somewhat limited circumstances. Moreover, social-change agents often use measures without incentives in the expectation that they can improve systems. Thus, to develop a measurement strategy for fostering and monitoring progress toward a Culture of Health, we needed to identify new ways of thinking and talking about measurement that identify some of the non-incentive mechanisms that might link measurement with change, and that relate these to bodies of social science evidence.
We identified four such mechanisms: setting goals, reframing issues, creating common terms of debate, and shifting venues.7 Each of these four mechanisms relies more on catalyzing creative and collaborative action within and across sectors than on holding actors to account for pre-specified processes and targets. We call this line of thinking “catalytic measurement.”
Setting Goals | The simple fact that a measure is created and announced publicly may create an informal expectation that it represents a worthy goal—especially when the measure is propounded by a highly visible or respected organization. Nonprofits, for instance, routinely publicize and track fundraising against overall goals assuming that the simple act of stating or publicizing the goal will cause people to donate more money to the cause. Organizations trying to eliminate the pay gap between men and women regularly publicize the measure, expecting that doing so will both illuminate the problem and encourage businesses to increase women’s pay.
Using performance measures this way is extremely common, and there is considerable anecdotal evidence that it works. But we found no research showing a causal link between publicizing measures and resulting behavior change. There is, however, evidence from laboratory and naturalistic field studies at the individual and group levels showing how goal setting might work. Professors Edwin Locke of the University of Maryland and Gary Latham of the University of Toronto have done a great deal of research in this area. They conclude that goals can help improve the performance of individuals working in systems by directing their attention toward the most relevant activities, enhancing their motivation and persistence, and activating relevant knowledge they already possess.8
To better understand how this works, consider the “family health-care costs” measure—one of the national Culture of Health measures—which seeks to direct attention first toward the problem of health-care costs for families and to motivate change agents to undertake and sustain meaningful action to lower the burden. As of this writing, the measure does not include specific targets (a specific cost burden level), but the goals implied by the measure are purposefully directional in nature (to lower cost burdens).
As with incentives, goal setting might be less effective where innovation and creativity are most needed—when the tasks are highly complex, or where there is an absence of know-how about how to complete them. This suggests that goals that address relatively simple and easily executed tasks, such as paper recycling, might be more likely to succeed than those that involve the adoption of novel technologies or practices.
Nonetheless, the literature suggests that goals for complex tasks can be made more effective if they include less ambitious proximal goals (interim goals that represent small steps toward more ambitious goals) and learning goals (those that focus on acquiring skills needed to accomplish key tasks). In the Culture of Health measurement set, rather than use a broad measure like socioeconomic status, we used affordable housing (specifically, percentage of the population spending 50 percent or more of income on housing), which is somewhat narrower and more actionable than socioeconomic status writ large.
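To make the narrowness of such a measure concrete, here is a minimal sketch of how a housing cost-burden share could be computed. It assumes the share is measured against household income and uses households rather than individuals as the unit of analysis. The `Household` class, its field names, the function name, and the sample figures are all hypothetical illustrations, not the data or method behind the actual Culture of Health measure.

```python
from dataclasses import dataclass


@dataclass
class Household:
    """Hypothetical record; field names are illustrative assumptions."""
    income: float        # annual household income, assumed positive
    housing_cost: float  # annual spending on housing


def severe_cost_burden_share(households, threshold=0.5):
    """Return the percentage of households whose housing cost is at
    least `threshold` (as a fraction) of their income."""
    if not households:
        raise ValueError("need at least one household")
    burdened = sum(
        1 for h in households
        if h.housing_cost >= threshold * h.income
    )
    return 100 * burdened / len(households)


# Made-up sample: two of the four households spend 50 percent or more.
sample = [
    Household(income=40_000, housing_cost=22_000),
    Household(income=60_000, housing_cost=15_000),
    Household(income=30_000, housing_cost=15_000),
    Household(income=80_000, housing_cost=20_000),
]
share = severe_cost_burden_share(sample)
print(f"{share:.1f} percent of households are severely cost burdened")
```

A single ratio and a single threshold are all the measure needs, which is part of what makes it easier to act on than a composite construct such as socioeconomic status.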
Framing Issues | Measures are simplifications that focus attention on some aspects of a system and not on others. While this “narrowing” tendency of measures can have negative consequences, a certain amount of narrowing and focus can help people better comprehend complex systems like health. Another way to think about this is that measures—and sets of measures—can help “frame” issues. Frames are the mental shortcuts (often attached to emotive associations) that individuals and organizations use to identify which aspects of an issue are most worthy of attention, how those aspects are related, and the “good-bad” valuations associated with them.9
Framing is one of the main ways in which the Culture of Health measures might have an impact. Indeed, a primary goal of the Culture of Health is to inspire newer ways of thinking about health, including the notion that health is affected by factors well beyond what is conventionally regarded as the “health sector” in the United States—such as hospitals and clinics.
The measure on “valuing the physical and social environment influence on health interdependence,” for example, seeks to track progress in one important aspect of that reframing. The measure comes from a question on RWJF’s National Survey of Health Attitudes. As of 2015, the survey found that only 34 percent of adults believed that one’s surroundings (both other people’s behaviors and community conditions) have an impact on an individual’s health. We hope that this measure will illuminate that connection in people’s minds and encourage them to see the ways in which paying attention to traditionally non-health factors in their communities can affect health and well-being.10
The entire ensemble of Culture of Health measures is designed to influence the range of factors that individuals, organizations, and policy makers see as relevant to health. Some individuals not familiar with discussions on the social, economic, and cultural determinants of health may be surprised to see that libraries, voter and volunteer participation, community policing, and housing affordability are among the Culture of Health measures. The inclusion of those measures was meant to focus audiences on the frame that health and well-being is determined by more than what happens within clinics and hospitals.
Creating Common Terms | Framing, in turn, provides the foundation for the third mechanism through which measures can promote change: creating common terms of discourse that make it easier for individuals and organizations to see things or actions as the same, or at least similar enough for comparison.
In the field of automobile accident investigations, for example, simple but powerful measurement categories such as “recognition error,” “decision error,” and “performance error” provide common terms of reference that allow stakeholders engaged in auto safety to identify common themes across the tens of thousands of auto-related deaths each year, and to have intelligent discussions about how and where to target scarce resources.11 This, in turn, can promote cooperative action toward a common goal, an important factor when tackling large social and system changes such as those encompassed by the Culture of Health vision.
Of course, measures can go too far in simplifying the components of complex systems into comparable parts, stripping away too much contextual information. Critics of student achievement tests, for example, complain that scores miss important aspects of learning and mask important differences among schools and communities. But such simplifications allow people to see the problems with specific schools, communities, or other entities as similar to others and therefore perhaps generated by issues within the larger system. This may increase individuals’ willingness to engage in collective action by highlighting commonality of interests, increasing the perceived scale of the problem, and helping them connect via social networks with individuals in other communities.
For instance, there is evidence that the publication of measures of clinical quality of care encourages physicians to seek out improvement ideas from peer clinics that score better than they do. There are also many examples of performance measures and data playing a role in debates about large-scale system change. Economic indicators, such as the Consumer Price Index, shape discussions about strengths and weaknesses in current economic policy, and often provide a basis for new policy formulation. Commensuration may also trigger competition by making it easier for consumers to “comparison shop” among rival providers of the same good. This is the intent of various consumer rankings, such as the U.S. News & World Report college rankings, which influence an institution’s reputation among peer institutions and admissions statistics.12
In a Culture of Health, where shared value of health is central, the measures provide a common way to talk about health drivers across diverse communities. For example, the measure on “disability-adjusted life years related to chronic disease” provides a common way to talk about a wide range of health conditions (such as diabetes and chronic respiratory problems) affecting individuals in all communities. The same can be said of the measure on adverse childhood experiences, which calls attention to similarities in the long-term effects of a range of childhood traumas (neglect or physical, verbal, or sexual abuse) on the health and well-being of affected individuals across populations.
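For readers unfamiliar with the metric, disability-adjusted life years achieve this commonality by expressing death and illness in a single unit, years of healthy life lost. The article does not spell out the formula; the standard definition used in burden-of-disease studies is:

```latex
\[
\mathrm{DALY} = \mathrm{YLL} + \mathrm{YLD}
\]
```

Here YLL denotes years of life lost to premature death and YLD denotes years lived with disability, weighted by the severity of the condition. Because every condition is converted into the same unit, diabetes and chronic respiratory disease can be compared and summed within one measure.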
Shifting Venues | How an issue is framed and discussed among actors can have an impact on how and where it is acted upon, that is, on the venue. Political scientists Frank Baumgartner and Bryan Jones explore this dynamic in their work on the interaction between “policy images” and “policy venues.”13 Policy images are like frames, in that they help define the dominant beliefs about the causes and solutions of problems and types of approaches that are considered “good” or “bad.” Venues are the institutional “places” where key policy decisions are made (such as congressional committees, city halls, courts, and corporate boardrooms) and the rules of the game used to make those decisions (for example, majority rule, precedent, and application of benefit-cost criteria).
To understand venues, consider how Baumgartner and Jones trace the history of nuclear power plant regulation in the United States. The original agency regulating nuclear power plants, the Atomic Energy Commission (AEC), was friendly to the industry and granted preferred access to “corporate, political, and technocratic elites advocating nuclear power.” Once technical AEC staff started disseminating data raising questions about safety, critics inserted themselves into public debates on nuclear safety, effectively breaching the old policy monopoly. This led to the replacement of the AEC by the Nuclear Regulatory Commission, a new venue with a stronger safety-oriented mandate under scrutiny by a range of parties mobilized to check the spread of nuclear power. Thus, there was a positive feedback loop where changes in images or frames supported changes in venues, which, in turn, supported further changes in frames.
To understand how venues apply measures, consider the impact that the U.S. News & World Report law school rankings have had on law schools. From in-depth interviews with law school administrators and faculty, sociologist Michael Sauder found that raising the visibility of the rankings among prospective students and others helped change the internal balance of institutional power within law schools, leading to larger budgets for activities directly related to increasing schools’ ranking, and changes in job descriptions and creation of organizational structures (venues) with responsibility for managing the school’s profile in the rankings.14
Several of the Culture of Health measures we selected are explicitly designed to bring new actors into discussion about health and new venues into decision making related to health and well-being. For instance, on the surface, a measure on youth exposure to advertising for unhealthy foods (based on data from Nielsen Media Research) seeks to estimate the prevalence of unhealthy media messages aimed at kids. However, an additional purpose of the measure is to promote stronger collaborations between the health, media, and food and beverage industries, whose cooperation is needed to show improvements in scores on the measure.
Similarly, a measure on US corporate giving to community and economic development, K-12 education, and higher education (business leadership in health) is designed to encourage corporate involvement in health and well-being. And a measure of the percentage of hospitals that have a formal alliance with health care and insurance organizations, state and local government, and community organizations (hospital partnerships) is designed to encourage collaboration between hospitals and various social service and community organizations.
Importance of the Social Periphery of Measurement
Our goal in proposing these four mechanisms is to encourage deeper discussion about how best to design and select measures, especially where attaching them to incentives is not an option or does not make sense. If we are correct, the most important implication of this argument is that developers of measures need to spend as much time considering the social, cultural, and organizational contexts in which measures will be used as on the technical qualities of the measures themselves.
While providing detailed recommendations about which types of measures belong in which contexts is beyond the scope of this article, we propose the following conclusions. In situations where goals, cause-effect relationships, and roles are clear, performance-based accountability systems may provide an effective way to track and incentivize progress and to ensure that key actors remain focused on relevant activities—though even in these situations, failure to pay attention to context can lead to unanticipated and undesirable consequences for performance.
However, in new or emerging initiatives such as those often promulgated by philanthropy, in diffuse organizational contexts, or where the interest is in broader social and systems change like the Culture of Health, it is often difficult to identify clear targets. In these situations, it seems more appropriate to use measures not to narrow the focus but to trigger conversations, attract new change agents, encourage new partnerships, and foster joint exploration: in short, to start conversations, not settle arguments.
Unfortunately, the social and institutional context of measurement and information has for the most part been relegated to the periphery of measurement theory and practice, which tends instead to focus on issues of accuracy, validity, and reliability. Clearly, the standard measurement criteria of validity and reliability must remain central to any discussion of performance measurement—after all, changing systems based on a false view of reality is clearly risky.
However, the failure to attend to what writers John Seely Brown and Paul Duguid call the “social periphery” of measures and information15 may limit the potential impact of the measures that we as a society invest so much time and energy in creating and maintaining. One of our goals is to start a discussion about how issues of the impact of social context can be moved from the periphery into the core of measurement practice. We look forward to a robust—and catalytic—discussion.