Sage Bionetworks has pioneered the use of mobile apps to collect more detailed and informative data from participants. (Photograph courtesy of Sage Bionetworks)
The sequence of the human genome, completed in 2001, was supposed to quickly reveal the secrets of health and disease. Instead, it showed that human bodies are more complicated than anyone realized. Disease is usually caused not by one bad gene, but by subtle variations in dozens or hundreds of genes working with and against each other in vast networks.
This discovery delivered a reality check to genome scientist Eric Schadt. Pharmaceutical giant Merck had spent hundreds of millions of dollars on an effort to create a massive data set of human gene activity, led by Schadt’s colleague and former boss Steven Friend, but both could now see that it fell woefully short.
To deliver on the full promise of genomics, scientists would need not just large amounts of data but unprecedented amounts—perhaps 1,000 times more than what even the richest drug company could collect. In addition, they would need new methods, infrastructure, and hires to sift through it all.
“These diseases are too complicated, the amounts of data too vast, for any one company to generate,” Schadt says. “No single company has the right number of smart people to figure it all out.”
This new era of biomedical big data and machine learning would require a new culture of research. Scientists would need to collaborate on a new scale and to work more closely with patients, the fountainhead of all this rich information.
But this was not, and still is not, how biomedical research gets done. Academic researchers are like sole proprietors, building their own teams, raising their own money, and seeking their own recognition. Biotech and pharmaceutical companies, too, jealously guard their secrets, for fear of being beaten in the market.
Friend and Schadt saw the need for radical change. It was time to introduce the ideas of the open-source movement, which had been so powerful in software development, into biomedical science. Research needed a new kind of organization to raise the banner for collaboration and open science. To earn the trust of scientists, the organization had to be a neutral convener with no profit motive and no intellectual property to protect.
So, in 2009 the pair launched Sage Bionetworks, a tiny 501(c)(3) with the mission to transform biomedical research. The nonprofit began by building practical tools for open science, such as databases and platforms for scientists to share, analyze, and discuss big data. But the human side of collaboration needed to be rethought, too, with new rules, practices, and credit-apportioning systems to encourage sharing.
Their revolutionary model was overdue, judging from community interest. Sage became a magnet for researchers who were frustrated by the slow pace and backward incentives of traditional science, and attracted by the chance to experiment with new ways of doing research and seed a new culture for the future. Although the biomedical field as a whole is slow to change, there is no longer any doubt about the effectiveness of open science.
“Sage is leading the way,” says Nancy Barrand, senior advisor for program development at the Robert Wood Johnson Foundation (RWJF). “It’s not so much what they need to do. It’s that the rest of the field needs to catch up to them.”
Winning Consent
The premier challenge for Sage was to convince researchers that data sharing and collaboration offered a superior model. Its first step was to build the data commons, the pool that could win over new collaborators.
At that point, the database that Friend and Schadt had built at Merck was one of the biggest sources of information charting how human genes switch on and off, so it was an attractive lure. Merck allowed them to make it public, calculating that the likely value created by people contributing additional data and working together would ultimately be greater than what would be generated by keeping the data private. Other qualified scientists could access it, but they would need to pay to play, by contributing their own data.
Providing seed data was just the start; the process needed to work better, too. They wanted to make Sage a demonstration zone—a practical laboratory for methods in open science. “You can tell people to share data, but you have to put together the standards and methods so that the data is usable,” Sage President Lara Mangravite says.
For scientists to trust shared data, each collaborator needs to understand where it came from, what its history was, how it was collected, what software generated the models behind it, and so on—information that is often missing or disorganized. Early on, Sage built a software and informatics platform called Synapse to store this essential metadata, make it accessible, and record information about it.
With this promise of a strong, capable informatics platform, they put out the word to scientists. They soon got involved in a problem in colon cancer research. Six major groups around the world were using genetic information to subtype tumors—but not working together on the problem. In 2014, four papers were published that each defined a different set of subtypes, which could not be compared against one another or combined. The field “came to a screeching halt,” says Brian Bot, principal scientist at Sage. “There was no consensus on what those subtypes were, nor what to do with them.”
In 2015, Sage began helping the groups to pool their data, run each of the six teams’ interpretive systems on all the data together, and come to agreement. It worked, and established a consensus schema for other colon cancer researchers to use. “This is the picture postcard of what we think is possible when people are willing to work cooperatively,” Mangravite says.
Partnerships such as this one demonstrated that encouraging open science would be both a technological and a cultural challenge. “What we learned along the way is that just making a data resource doesn’t change science,” says John Wilbanks, chief commons officer at Sage. “You’re confronted with things like collaboration, agency, and ethics. Each time you try to build the commons, one of those things stops you.”
Wilbanks, who had previously been involved in open science through the Creative Commons system, joined Sage staff in 2012 as the organization began ramping up work on the social engineering of open science—setting up systems to nurture trust and shift incentives away from data hoarding. “We had to develop collaborative tools for scientists who don’t work in the same place to work together,” says Wilbanks. “Scientists aren’t trained to do that.”
Large-scale collaboration also necessitates new policies, such as ways to guarantee the privacy of study participants, including terms of use that clarify how researchers can and cannot use information. Sage laid out methods for researchers to be preapproved for data access, rather than go through a time-consuming request process every time some new tranche of data becomes available. About 70 percent of people want their data to be broadly used in research, but the paperwork for most studies is not set up to allow sharing.
To solve this problem, Wilbanks developed a new informed-consent protocol that gives research participants more information and more power over how their data is shared. Sage’s version is freely available, and dozens of studies now use it, Wilbanks says, including the federal precision medicine project All of Us, which aims to collect comprehensive data on a million or more people.
“The infrastructure around governance, and legal issues around informed consent, all of that they really helped model for the field of open research,” says Paul Tarini, senior program officer at the RWJF.
Sage also became involved in hosting DREAM challenges: open-science competitions in which groups vie to build the best interpretive algorithm for a data set, competing for the prize of publication in a high-profile journal such as Nature Biotechnology. The competitions, launched by IBM researcher Gustavo Stolovitzky in 2006, are another way to demonstrate the potential power of open data.
The Future of Sharing
Sage has now hosted or facilitated several dozen large cooperative research projects, mostly in cancer but increasingly also in neurodegenerative diseases such as Alzheimer’s. One major effort in cancer, which involved eight high-profile research institutes, recently published its first analysis of genomic data from 19,000 patients, all of which is available for researchers.
The organization also recently launched its first research study, mPower, exploring how smartphones can be used for science. For the study, people with Parkinson’s disease do routine movement tests before and after their daily drugs. Among other things, the smartphone’s accelerometer measures how their movements change, and its microphone captures changes in voice. Because people sign up and participate through the app, rather than going to a clinic, the study can get granular information from a diverse population, which is difficult to do in traditional studies. More than 10,000 people signed up. Sage recently made the first tranche of data public, available to qualified researchers. As of June, more than 100 researchers had applied to use it. “It’s a great result,” Tarini says.
The mPower study was one of the first to use ResearchKit, the Apple platform that enables biomedical researchers to build apps; Sage developed tools for ResearchKit, such as a secure way to transfer data off of a device, store it, and anonymize it. Cofounder Friend recently left the nonprofit to join Apple, but he remains Sage’s chair of the board of directors.
More broadly, it is hard to say whether Sage has managed to shift the culture of research. Even within the organization, not every project is entirely open science. Some data releases are delayed; with a few partners, the data is never made public. This is not ideal, Mangravite says, but often there is another payoff, such as support to develop a new open-source tool.
But signs suggest that its vision of open science may be catching on. Earlier this year, for example, the Bill & Melinda Gates Foundation announced that the groups it funds must make their data open. Powerhouses such as agencies of the US National Institutes of Health now require collaboration in some projects. And All of Us plans to make its data widely available for scientists—another sign of increasing popularity of open research.
Most important, there is growing evidence that open science can be an effective way to work. “The hardest part was convincing individuals that they’re going to benefit, and it’s good for science,” says Thomas Hudson, head of translational oncology at AbbVie and former leader of several large collaborative biomedical projects. “For the most part, people are realizing it.”
Read more stories by Kat McGowan.
