The Power of Experiments: Decision Making in a Data-Driven World
Michael Luca & Max H. Bazerman
232 pages, The MIT Press, 2020
Filled with charming stories of experiments offering novel solutions to pressing social questions, The Power of Experiments: Decision Making in a Data-Driven World is an enjoyable read that celebrates the power of experimentation to create social change. Authors Michael Luca and Max H. Bazerman, both Harvard Business School professors, champion the use of randomized controlled trials (RCTs) to test the effects of online experiences on behaviors and choices. They invite readers to join “the experimental revolution” by illustrating the significant effects of small, experimentally determined design changes, such as tweaks to an advertisement or a bureaucratic directive.
In The Power of Experiments, Luca and Bazerman explain how economists have teamed with psychologists to develop the experimental tools of behavioral economics, with the objective of replacing intuition and guesswork with evidence-based decision-making. They highlight success stories from experimentation in the government and tech sectors, and they predict that the experimental approach will soon become common in both for-profit and nonprofit organizations. Experiments, they write, give organizations “a new tool to test ideas and to understand the impact of the products and services they are providing.”
Despite the engaging stories, I found this book unsettling because Luca and Bazerman are never explicit about the book’s big secret: that the problems addressed by the experimental methods they advance are, with only rare exceptions, small-scale. No doubt, experiments with audience behaviors can save advertising dollars, encourage default choices, and show which font sizes and background colors lead to more clicks. But these experiments provide only very limited guidance for solving major social problems.
Even if the authors themselves are realistic about where their work fits into the world of data analysis, the book’s testimonials reinforce the sense of overstatement. For example, University of Chicago professor John A. List’s assertion that the book is a “masterpiece” meant for anyone interested in “understanding policy, behavioral economics, technology, and life itself” suggests a deep mismatch between the praise and the book’s actual insights. While it is informative to know that a smiley face on an energy-saving message to consumers can influence how they set their thermostat, or that changing the default on participating in a retirement plan can be lucrative, these victories are small in scope—individual, rather than societal. The methodology hardly qualifies for List’s description of the book as part of “the deepest revolution in the social sciences in the past twenty-five years.”
To be fair, Luca and Bazerman do include two examples that demonstrate how experiments are “transforming how businesses and governments make decisions.” The most well-known of them begins their book: the Behavioral Insights Team (BIT), founded in 2010 by a group of social scientists and British civil servants “to improve policy and government through the use of behavioral science.” Under the leadership of academic-turned-policy maker David Halpern, the group persuaded Queen Elizabeth II’s tax collectors to allow them to experiment with tweaking the letter that routinely went out to errant taxpayers. After several years of trials comparing responses to different sample letters, they found that the simple addition of two sentences—“Nine out of ten people in the [United Kingdom] pay their tax on time. You are currently in the very small minority of people who have not paid us yet.”—resulted in the collection of millions of pounds more than the original letter. As news of this triumph spread, BIT quickly became “the talk of the policy town,” according to Luca and Bazerman. Using only a “nudge” to collect substantial amounts of previously unpaid taxes provided a “proof of concept” for the “nudge strategy,” even for the most skeptical politicians.
After this success, BIT expanded, with offices in London, Manchester, Singapore, New York, Wellington, and Sydney. It also helped spread the nudge concept, and the central role of experiments in decision-making, around the world. In 2015, the United States set up a Social and Behavioral Sciences Team within the White House Office of Science and Technology Policy. Australia, Canada, Mexico, Finland, Italy, India, and the World Bank had all set up behavioral insight units by 2018.
The second large-scale behavioral economics victory highlighted by Luca and Bazerman is automatic retirement savings enrollment. Nobel Prize-winning economist Richard Thaler, the primary architect of behavioral economics, and researcher Shlomo Benartzi advanced the idea of companies offering employees the opportunity to enroll automatically in a 401(k) retirement savings plan, rather than opting in by filling out reams of paperwork and choosing from a confusing array of investment options. This change, according to estimates, has resulted in millions of people saving nearly $30 billion for their retirement—although it’s also true that people who were nudged to save more ended up going into more debt as well. In a 2019 interview with author Stephen J. Dubner at Freakonomics, Thaler explained that the initiative was a success because he and Benartzi were able to persuade employers to make retirement plans much simpler, “so the choice architecture is simpler.” But he also acknowledged that this achievement was possible “because the fix was easy. Give me a problem where I can arrange things so that by doing nothing, people make the right choice—that’s an easy problem.”
Indeed, most problems in the real world are more complicated—a point demonstrated by Luca and Bazerman’s example of the work of University of Pennsylvania (UPenn) professors Katherine Milkman and Angela Duckworth. The pair enlisted behavioral economics to compete for a $100 million MacArthur Foundation award for a promising solution to a major social challenge. Their proposed initiative—Behavior Change for Good (BCFG)—aimed to create lasting positive behavioral change. The prestige of the prospective MacArthur award helped them to recruit a team of advisors that included Nobel laureates and MacArthur “genius grant” award winners. Duckworth made her aspirations explicit in a promotional video: “What if we could make meaningful progress on every major problem of the 21st century with a single solution?”
When BCFG did not win the MacArthur prize, UPenn committed several million dollars to the initiative. BCFG’s first project was to increase long-term, lasting participation in exercise, which seemed like a safe bet: There was a growing literature on the malleability of Americans’ exercise habits, most Americans were aware that they were not active enough, and most wanted to improve their habits.
So, working through a web-based platform with their new partner, 24 Hour Fitness, they enrolled 63,000 of the gym chain’s members in a massive RCT that was advertised as a free “really cool behavior-change program designed by a team of brilliant scientists.” The intervention included reminders to go to the gym, text messages with motivational tips, and a variety of recommendations contributed by the team of 27 scientists.
Despite the wealth of research and expertise, the results were deeply disappointing. During the 28 days of intervention, 50 to 75 percent of the enrollees did increase their participation in exercise. But none stuck with it—none made any change that lasted beyond the 28 days, despite the 53 versions of the intervention that were tested. Luca and Bazerman quote Duckworth, who bluntly summarized the results: “Behavior changes are really *#$@ing hard.”
It’s bad news that it’s so hard to get people to make behavior changes that require doing something rather than nothing. The worse news is that problems that can be solved by individual behavior change are not the society-wide, systemic ones.
This example exposes the book’s big weakness. Luca and Bazerman emphasize that their findings are credible because behavioral experiments are randomized. However, they do not warn readers that the tools of behavioral economics are severely limited in their application. As Nobel Prize-winning economist Angus Deaton has observed, experiments that are constrained enough to be considered scientifically rigorous are likely to be too narrow to provide useful guidance for large-scale interventions.
What is most crucial to understanding the limits of behavioral economics is that its tools fit one clearly definable subset of problems and solutions, but not others. As behavioral economists, Luca and Bazerman would have performed a great public service by making this distinction clear in their book. The experiments that are the province of behavioral economists, and that are lauded in this book, are useful primarily to those who would address simple problems with simple solutions.
Thirty years ago, Harvard University’s Jerome Kagan, a luminary of child development studies, pointed out that resources and attention for young children go primarily to circumscribed, well-defined, and relatively uncontroversial programs that have been shown, at least in experimental circumstances, to improve young children’s prospects by training their mothers to read to, talk to, and play with them more consistently. He noted, however, that improving the quality of housing, education, and health of children living in poverty may be more effective in the long run. Kagan suggested that resources are allocated this way because the latter strategy lacks experimental proof and because “it is considerably more expensive, more contentious, and more disruptive of the status quo.”
The subset of problems appropriately addressed by behavioral experiments is the one whose solutions focus on individuals, rather than systems. To understand the most powerful factors that determine children’s adult outcomes, you can’t use the tools of behavioral economics, which control for and eliminate complexity. You would instead turn to big data methods that embrace complex, interrelated causality and multilevel, multidirectional, nested factors over time.
The overarching issue—which Luca and Bazerman neglect—is that “evidence-based” does not have to mean “experiment-based.” The Power of Experiments reflects the failure to distinguish between the kinds of evidence needed to certify effective drugs and the more complex and broader array of evidence needed to guide social policy. The fact that only randomized experiments can certify which medications or vaccines are safe and effective does not mean that randomized behavioral experiments are the best guides to social action.
