According to recent United Nations estimates, there are globally about 258 million international migrants, meaning people who live in a country other than the one in which they were born; this represents an increase of 49 percent since 2000. Of those, 26 million people have been forcibly displaced across borders, having migrated either as refugees or asylum seekers. An additional 40 million or so people are internally displaced due to conflict and violence, and millions more are displaced each year because of natural disasters. It is sobering, then, to consider that, according to many observers, global warming is likely to make the situation worse.

Migration flows of all kinds (for work, family reunification, or political or environmental reasons) create a range of both opportunities and challenges for nation states and international actors. But the issues associated with refugees and asylum seekers are particularly complex. Despite the high stakes and increased attention to the issue, our understanding of the full dimensions and root causes of refugee movements remains limited. Refugee flows arise in response to not only push factors like wars and economic insecurity, but also powerful pull factors in recipient countries, including economic opportunities and perceived goods like greater tolerance and rule of law. In addition, more objectively measurable variables like border barriers, topography, and even the weather play an important role in determining the number and pattern of refugee flows. These push and pull factors interact in complex and often unpredictable ways. Further complicating matters, some experts argue that push-pull research on migration is dogged by a number of conceptual and methodological limitations.

To mitigate negative impacts and anticipate opportunities arising from high levels of global migration, we need a better understanding of the various factors contributing to the international movement of people and how they work together.

Data, specifically the widely dispersed data sets that exist across governments, the private sector, and civil society, can help alleviate today’s information shortcomings. Several recent initiatives show the potential of using data to address some of the underlying informational gaps. In particular, there is an important role for a new form of data-driven problem-solving and policymaking, which we call “data collaboratives.” Data collaboratives offer the potential for inter-sectoral collaboration, and for the merging and augmentation of otherwise siloed data sets. While public and private actors are increasingly experimenting with various types of data in a variety of sectors and geographies (including sharing disease data to accelerate disease treatments and leveraging private bus data to improve urban planning), we are only beginning to understand the potential of data collaboration in the context of migration and refugee issues.

Migration as a Data and Information Problem

The IOM Migration Data Portal is an important step toward getting a more comprehensive picture of today’s migration situation worldwide.

In fields such as climate change, politics, and finance, there is growing recognition that some of the most intractable problems of our era are information problems. While the datafication of what we have hitherto considered social, political, or cultural issues poses its own risks (notably to individual privacy), it also offers new scope for insight and analysis—and, ultimately, targeted policy responses. It is our contention that migration and refugee flows are susceptible to the same forms of analysis. Yet unlike many of these other areas, where data generally exist in copious quantities, migration analysis has suffered from a set of data challenges, including (among others):

  • The complexity and number of variables involved in migration, and the hard-to-predict ways in which they interact. As noted, migration is a multi-faceted phenomenon. While the International Organization for Migration’s (IOM’s) Migration Data Portal seeks to provide a one-stop shop for a wide variety of data on migration, national data remain fragmented, and their availability and liquidity limited, particularly across the Global South. Making informed decisions requires that we pull data from various sources, across geographies and sectors.
  • Refugees often come from countries with weak institutions and limited official statistical capacity. Many of these countries are wracked by conflict or unrest, a fact that further limits the ability to collect or rely on official data. In addition, collecting data on refugees from source countries could also be harmful, given that the same countries may persecute them. All of this makes it difficult to make informed, evidence-based decisions about migration.
  • In addition, there are significant migratory movements across economically disadvantaged and technologically underdeveloped countries, which data-gathering systems do not capture due to limited statistical capacities. This also limits the availability of reliable information, particularly electronic information.
  • For all the reasons outlined above, migration-related data is often of poor quality and fragmented. Data that does exist often contains errors or is out of date. In addition, it is often not disaggregated by important socio-economic variables like sex. Usually, too, systems collect and store it in a fragmented manner, raising issues related to interoperability and limiting the ability of policymakers to pull together the broad, diverse information necessary for making migration-related decisions.
  • Finally, data related to forced migration in particular is often limited by the wariness refugees and asylum seekers harbor toward officials and government institutions. Refugees, who may be fleeing a hostile or repressive government, are often resistant to entering official data channels. Likewise, they may fear law-enforcement agencies, and thus be unwilling to cooperate and share information. While such problems are not insurmountable, they do suggest the need for aid agencies and policymakers to approach the problem of data shortages with sensitivity and with a responsibility not to increase the burden of suffering on migratory populations.
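To make the interoperability problem concrete, here is a minimal sketch of what pulling together two fragmented sources can involve. Everything in it is an assumption for illustration: the field names, the two schemas, the placeholder country codes ("AAA", "BBB"), and the counts are invented and do not come from any real data set.

```python
from collections import defaultdict

# Hypothetical records from two sources that describe the same movements
# with different schemas. All values are synthetic placeholders.
source_a = [
    {"iso3": "AAA", "year": 2017, "sex": "F", "count": 120},
    {"iso3": "AAA", "year": 2017, "sex": "M", "count": 100},
]
source_b = [
    {"country_code": "aaa", "yr": "2017", "total": 230},  # no sex breakdown
    {"country_code": "bbb", "yr": "2017", "total": 75},
]


def normalize_a(rec):
    """Map a source-A record onto the shared schema."""
    return {
        "iso3": rec["iso3"].upper(),
        "year": int(rec["year"]),
        "sex": rec["sex"],
        "count": int(rec["count"]),
        "source": "A",
    }


def normalize_b(rec):
    """Map a source-B record onto the shared schema (sex is unknown)."""
    return {
        "iso3": rec["country_code"].upper(),
        "year": int(rec["yr"]),
        "sex": None,
        "count": int(rec["total"]),
        "source": "B",
    }


def combine(records):
    """Group by (country, year); keep per-source totals side by side and
    flag whether any source offers a sex-disaggregated figure."""
    grouped = defaultdict(lambda: {"totals": defaultdict(int),
                                   "disaggregated": False})
    for r in records:
        entry = grouped[(r["iso3"], r["year"])]
        entry["totals"][r["source"]] += r["count"]
        if r["sex"] is not None:
            entry["disaggregated"] = True
    # Convert to plain dicts so the result is easy to inspect or serialize.
    return {key: {"totals": dict(val["totals"]),
                  "disaggregated": val["disaggregated"]}
            for key, val in grouped.items()}


records = [normalize_a(r) for r in source_a] + [normalize_b(r) for r in source_b]
summary = combine(records)
```

The sketch deliberately keeps the two sources' totals side by side rather than averaging them: when sources disagree, the discrepancy is itself information that an analyst would likely want to see.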

Efforts in Overcoming Migration Data Shortages

These data challenges are real, and they limit the ability of policymakers and civil society to mount effective, targeted responses. Nonetheless, recent years have seen growing efforts on the part of governments, aid agencies, and others to collect and make better use of data. A number of international agencies and public-sector and civil society actors, including the IOM, World Bank, Internal Displacement Monitoring Centre (IDMC), United Nations High Commissioner for Refugees (UNHCR), and Organisation for Economic Co-operation and Development (OECD), have been working to collect data on and analyze drivers of migration and other indicators of migratory patterns. Worthwhile initiatives that have recently emerged include:

  • IOM’s Migration Data Portal, produced by the Global Migration Data Analysis Centre (GMDAC), seeks to serve as an “access point to timely, comprehensive migration statistics and reliable information about migration data globally.”
  • IOM’s Displacement Tracking Matrix (DTM) is a system for tracking and sharing information on displaced communities and related emergency situations. It allows agencies to infer migration patterns via locatable mobile phone call records, IP addresses, or geotagged social media activity drawn from private-sector data sources. (This is a good example of the data collaborative we discuss in the next section.)
  • Jetson, a project launched by UNHCR’s Innovation Service in 2017, seeks to leverage data and data science to predict movements of displaced people in Sub-Saharan Africa, particularly in the Horn of Africa.
  • UNHCR and UN Global Pulse are also collaborating on a project testing the potential value of social media analytics to examine the attitudes and opinions among European citizens about migrants and refugees in the wake of several terrorist attacks across Europe, using data drawn from Crimson Hexagon’s social media monitoring tool.
  • UNHCR and the World Bank recently announced the creation of a joint data center to improve global statistics on forced displacement, which aims to make predictions about potential displacement through data mining, AI, and machine learning.

These examples suggest that there are ways to remedy, or at least mitigate, some of the informational challenges that underlie societal problems related to migrant flows. One lesson emerging from all these efforts is that, in many cases, the issue is not so much that data doesn’t exist, but that it exists in dispersed, proprietary, incompatible, or otherwise hard-to-access forms. This is where data collaboratives come in, potentially complementing or supporting the above efforts.

Four Value Propositions of Data Collaboratives

Data collaboratives are an emergent form of public-private partnership that allows for collaboration around new data sources across sectors and geographies. While there is no single way of leveraging new data sources, we have identified four ways data collaboratives can help address the migration data and policy challenge (distilled from our own research and the insights shared during a workshop on big data and migration):

1. Improve situational analysis related to migrant and refugee movements. In various countries around the world, governments and agencies are using satellite imagery and social media data, along with other forms of data, to improve institutional awareness of and response to shifting migration patterns. For instance, the satellite imagery company DigitalGlobe recently entered into a data collaborative arrangement with UNHCR to provide “timely, accurate information” related to Sudanese refugees, especially those entering a large camp across the border in Ethiopia.

Similarly, the Laboratory of Mobility Studies at the University of Tartu conducted a study on transnationals from Estonia using mobile positioning data and other GIS data sources to gain a better understanding of people’s spatial migration from Estonia. The lab also collaborated with Eurostat on a similar initiative aimed at leveraging mobile positioning data to gain a better grasp of tourist movement in the country.

The Qatar Computing Research Institute (QCRI), in collaboration with the University of Washington (UW), is leveraging advertising data drawn from social media platforms to gain a deeper understanding of migration and migrant integration. Their “Studying Migrant Assimilation Through Facebook Interests” project examines these issues by quantifying the assimilation of Arab migrants (mostly Syrian refugees) in Germany, as demonstrated by demographic data provided by Facebook to advertisers. QCRI and UW have a similar initiative using data from LinkedIn’s advertising platform to analyze the migration of highly skilled individuals. QCRI’s work on data from social media entities, specifically social media advertising information, has demonstrated the potential value of leveraging non-traditional data sources to gain a deeper understanding of human behavior and movements, including but not limited to migration.

Meanwhile, the Governance Lab, together with UNICEF and other partners, and with support from the World Bank, is creating a data collaborative focused on internally displaced people (IDPs) in Somalia. The initiative is leveraging satellite imagery data and other private-sector datasets to bolster our situational analysis of where and how displaced people, particularly children, move about the country. The project’s hypothesis is that by addressing the IDP information challenge in Somalia, humanitarian agencies can more effectively and efficiently target aid and resource distribution.

2. Generate new knowledge on drivers of migration, and enable the transfer of existing knowledge across sectors. Researchers around the world have increasingly used web data, including social media data and search query data, to gain a better understanding of the drivers of human behavior. Efforts include social media analysis to help UNICEF understand opinions and behavior related to anti-vaccine activity, and search query analysis to uncover the root causes of suicide in Korea. The sharing and analysis of private-sector web data can create new areas of knowledge that humanitarian organizations and other institutions can put into action.

The Data Challenge on Integration of Migrants in Cities is a European Commission (Joint Research Centre) initiative seeking insight into the integration and concentration of migrants in cities. It provides access to high-spatial-resolution census data that shows the concentration of migrants in cities across eight European Union member states, and encourages participating teams to examine patterns and develop new insights that can aid in the integration and understanding of migrant communities across the EU. Project proposals arising from the challenge include an initiative aimed at identifying the drivers of successful social and economic integration of migrants, and developing research-based solutions for immigration policies. Another project is leveraging data and information from labor surveys and Gallup’s World Poll to create descriptive analyses of local labor market changes that have arisen as a result of migration.

3. Inform prediction and forecasting related to migration and refugees. A June 2017 study from the Pew Research Center showed how different search patterns in certain regions—such as Arabic-language searches arising from Turkey, including the keyword “Greece”—correlated with changes in migration flows to Europe. A similar, previous analysis leveraged geo-located Yahoo! search query data to better understand and predict migration flows, especially focusing on the “pendularity”—back-and-forth movements—of migrants. A more systematized (and responsible) approach to cross-sector data sharing and analysis, including but not limited to search query data, could provide these types of predictive insights, and inform more anticipatory and effective responses to shifts in migration.
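As a rough illustration of the kind of analysis described above, the following sketch checks whether a search-volume series tends to lead an arrivals series by some number of periods. It is a minimal example under stated assumptions, not the method used in the studies mentioned: the function names are our own, the two series are synthetic (arrivals are constructed to echo searches two periods later), and a real analysis would also need to handle seasonality, confounders, and privacy safeguards.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlations(searches, arrivals, max_lag):
    """Correlate searches at period t with arrivals at period t + lag,
    for every lag from 0 to max_lag."""
    results = {}
    for lag in range(max_lag + 1):
        xs = searches[: len(searches) - lag]  # drop the last `lag` periods
        ys = arrivals[lag:]                   # drop the first `lag` periods
        results[lag] = pearson(xs, ys)
    return results


def best_lag(searches, arrivals, max_lag):
    """Return the lag at which the two series are most strongly correlated."""
    correlations = lagged_correlations(searches, arrivals, max_lag)
    return max(correlations, key=correlations.get)


# Synthetic example: arrivals are built as 10x the search volume
# observed two periods earlier. These are not real figures.
searches = [10, 12, 30, 55, 40, 20, 15, 12, 11, 10]
arrivals = [5, 5, 100, 120, 300, 550, 400, 200, 150, 120]

correlations = lagged_correlations(searches, arrivals, max_lag=3)
lead = best_lag(searches, arrivals, max_lag=3)
```

A strong correlation at a positive lag suggests only that search activity may be a useful early signal; it does not by itself establish that the searches reflect intent to migrate.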

Other efforts are underway to model migration flows. For example, the University of Southampton, in collaboration with Flowminder, is leveraging anonymized census micro-data to map and model global migration flows, particularly in low- and middle-income countries; the goal is to improve global resilience to infectious diseases. A similar project is using census-derived data and other information to examine human migration in malaria-endemic countries. These and other recent works have shown that census-derived migration data, when combined with flow estimates or other data such as satellite imagery, can provide useful insights into the movement of populations within and across borders. In addition, the data can provide a significant degree of granularity, for instance providing insight into the relative size of movement between administrative units and across temporal scales.

4. Enable more targeted impact assessment and evaluation of migration interventions and responses. The SoBigData Exploratory migration studies project, funded by the European Union’s Horizon 2020 research and innovation program, is using data collaboration to answer questions related to evaluating migration policy in Europe and migration more generally. With a greater focus on individual experiences, the Demal Te Niew project draws on diverse datasets and data journalism to gain insight into the experience of migrants returning to Senegal from Italy, and how they are affected by diverse migration policies and interventions.

Moving Forward: Stewardship and Responsibility

The public value of cross-sector data collaboration in the realm of migration is starting to become apparent in diverse and ongoing cross-sector data-sharing experiments. Needless to say, much remains to be done. Access to more migration-related data could help governments create a strategy and the necessary infrastructure to handle both the challenges and opportunities from an influx of migrants. It could similarly help humanitarian organizations create more-targeted programs. However, in order for such initiatives to succeed, stakeholders need more than just more data; they also need an effective framework within which to collect, process, analyze, share, and act on the data.

As it stands, decision-makers on both the supply and demand sides of data collaboratives lack a clear and actionable understanding of if, when, and how to establish new partnerships around data exchange, and, importantly, how to do so in a responsible manner that does not create privacy and other risks. Without a clear understanding of both the risks and rewards of partnering around data across sectors, data holders are likely to remain risk averse and restrict the flow of data, thus minimizing the positive secondary usage and societal impacts of data.

We suggest three necessary steps to transition from a series of innovative yet ad hoc data collaborative projects to a broader framework of action. Together, these three steps provide the foundations that could allow all stakeholders—in government, civil society, and the private sector—to design more targeted, effective policy interventions.

First, map and document data collaboratives in the migration space. Companies, governments, and users need better proof-of-practice to understand the value and potential impact of data collaboratives. Mapping and documenting existing initiatives in a structured manner can highlight what works in forming these new partnerships (their value propositions, technical arrangements, and legal frameworks) as well as strategies for measuring impact. Such evidence may also enable the creation of a decision tree to determine what kind of data collaborative is most appropriate for different value propositions and/or policy questions.

Second, identify and nurture “data stewards” and connect them through a network. As with data collaboration in general, some leading innovators are already establishing concrete roles, responsibilities, and practices for determining if, when, and how to share corporate data for the public good. To professionalize the private-sector supply side of migration-focused data collaboratives and increase the resilience of such efforts, we recommend two related actions:

  • Increasing knowledge-sharing and networking among existing data stewards regarding emerging methods and tools to share data for good.
  • Better articulating specific problems, needs, and data gaps (by the governments, international organizations, and NGOs that represent the demand side of data collaboratives) to help the private sector better understand when and how their data could make a difference.

Finally, develop data responsibility frameworks for collaborating on and sharing data. As described above, data collaboratives, especially those involving vulnerable communities, are not free of risk. To help both the supply and demand sides of data collaboratives navigate these risks more effectively, the research, policy, and technology communities should develop new methodologies, tools, and frameworks to enable the responsible and systematic sharing of data. Building on emerging data responsibility frameworks, operational guidance for data responsibility could decrease the transaction costs, time, and energy currently needed to establish data collaboratives between the private and public sectors, and do so in a way that does not create undue risks to the intended beneficiaries of these public-private data-sharing arrangements.