by Andrea Pellandra and Lauren Herby, UNHCR’s Global Data Service Data Curation Team
In order to convey its prominence as one of the world’s most valuable resources, data has been described as “the new oil”. The exponential growth in digital means of data collection, larger computational capacity to process huge datasets and the falling cost of digital storage have created new opportunities for UNHCR, the UN Refugee Agency, to use data to improve the way we protect, assist and provide solutions to those who have been forced to flee.
In its newly released Data Transformation Strategy, UNHCR has made the commitment to “ensure that quality and coherent data related to refugees and other persons of concern is systematically, responsibly and efficiently managed by UNHCR and its partners, and shared openly and responsibly both internally and externally”. For UNHCR to meet this commitment and for any analysis to lead to meaningful insight, there must first and foremost be trust that the data at its core is collected according to rigorous standards and methodologies and is fit to serve its intended purpose. Afterwards, the real potential of data lies in its analysis and use, and in how it is transformed into reliable and relevant information to support evidence-based action.
Increasing access to data
UNHCR collects individual and household-level microdata to inform its programming and fulfill its mandate. This includes, for example, data on affected populations’ needs, their socioeconomic situation, and their vulnerability status. The same data may be anonymized and shared with a wider group of actors for other purposes, such as analysis and research. By increasing the opportunities for analysis, opening up data will deepen our knowledge, increase the evidence base to inform our decision-making, and, in turn, exponentially increase the value of the data over time.
Different approaches to sharing and use are applied to different types of data in order to assist individuals and ensure their protection. While “public use” allows for data to be freely accessed, modified and re-shared, “licensed use” regulates access and permits limited use and re-sharing. The first approach is usually applied to aggregate data, i.e. data, such as population data, that has been combined along one or more criteria and does not contain information about specific individuals. The second approach is usually followed for anonymized microdata, that is, unit-level information processed in a way that removes personally identifiable information to reduce the risk of re-identification of the individuals to whom it relates.
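To make the distinction concrete, the following is a minimal sketch in Python, not UNHCR's actual pipeline. The field names, the sample rows and the list of direct identifiers are all made up for illustration. It shows the same unit-level records turned into an aggregate table on the one hand, and into microdata with direct identifiers removed on the other.

```python
from collections import Counter

# Hypothetical list of fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "phone"}

def aggregate_by(records, field):
    """Combine unit-level records into counts along one criterion."""
    return dict(Counter(record[field] for record in records))

def strip_direct_identifiers(records):
    """Drop directly identifying fields; only a first step towards anonymization."""
    return [
        {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}
        for record in records
    ]

# Made-up sample rows, for illustration only.
sample = [
    {"name": "Person A", "phone": "000-0001", "country_of_asylum": "Jordan", "age": 34},
    {"name": "Person B", "phone": "000-0002", "country_of_asylum": "Jordan", "age": 27},
    {"name": "Person C", "phone": "000-0003", "country_of_asylum": "Lebanon", "age": 41},
]

totals = aggregate_by(sample, "country_of_asylum")   # no individual rows remain
microdata = strip_direct_identifiers(sample)         # one row per person remains
```

The aggregate table no longer contains any row about a specific individual, which is why it can usually be released for public use. The stripped microdata still has one row per person, so removing names and phone numbers is only a starting point and further disclosure control is normally needed before licensed release.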
UNHCR is demonstrating its commitment to open data through efforts to develop new, and revamp existing, data sharing and joint analysis platforms, such as the Operational Data Portal, which disseminates information, reports and key aggregate figures on operational data in order to support coordination with humanitarian partners. The new version of the Refugee Population Statistics Database, which contains aggregate time-series population data since 1951, was recently launched, and now includes a mobile application that will help make the data more accessible to a larger number of practitioners.
Introducing UNHCR’s Microdata Library
Data that UNHCR has shared openly and publicly to date has mainly been macro-level or aggregated data. Access to identifiable microdata has so far been ad hoc or regulated by data sharing agreements with partners, as well as some research institutions. However, UNHCR is now taking its open data efforts considerably further.
In 2016, the World Bank approached the UNHCR regional office in Jordan to request access to its data for an economic and social analysis of displaced Syrians. The regional office saw the added value and was keen to share. There was, however, no mechanism to efficiently and securely facilitate the process. As a result, the Development Data Group at the World Bank assisted UNHCR in developing its capacity and mechanism to effectively prepare and anonymize data and share it externally. This included the development of the Microdata Library (MDL), UNHCR’s latest data catalog launched in January 2020.
The MDL is a platform for UNHCR to make anonymized microdata available to organizations, partners, or individual researchers who demonstrate a legitimate need to access it. It adds to a growing number of microdata libraries including those from the World Bank, the International Labour Organization (ILO) and the Food and Agriculture Organization (FAO).
Thanks to the financial support of the newly launched World Bank – UNHCR Joint Data Center on Forced Displacement, UNHCR’s commitment to open data is now firmly supported by a Data Curation team with six staff covering all seven UNHCR regions. They act as focal points to discover and obtain data, are trained to perform data cleaning, documentation, and statistical disclosure control (SDC), and are tasked with managing requests for data access.
Secure access to microdata
The successful implementation of the MDL requires a global team effort. Firstly, SDC is a balance between removing risk and retaining the utility of data. While special software packages are available to facilitate the work, data analysis still requires human insight from both trained data curators familiar with SDC and those with a substantive knowledge of the data such as the teams who collected it – the data providers. Data curators and data providers work together to answer questions about variables too sensitive to share, key variables, and rare observations that may pose a high risk of re-identification in the dataset. Secondly, the release of the data requires permission from the UNHCR data controller and risk owner (normally country teams) as well as advice from the data protection officer regarding compliance with UNHCR’s Data Protection Policy.
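One common way to look for the rare observations mentioned above is to count how many records share each combination of key variables, and to flag combinations that occur fewer than k times. The following is a minimal sketch of that idea in Python. It is an illustration only, not the procedure or software used by the Data Curation team, and the variable names, sample rows and threshold are assumptions.

```python
from collections import Counter

def risky_records(records, key_variables, k=3):
    """Return records whose key-variable combination is shared by fewer than k records."""
    def combination(record):
        return tuple(record[variable] for variable in key_variables)

    counts = Counter(combination(record) for record in records)
    return [record for record in records if counts[combination(record)] < k]

# Made-up survey rows, for illustration only.
survey = [
    {"age_group": "18-29", "sex": "F", "district": "A", "household_size": 4},
    {"age_group": "18-29", "sex": "F", "district": "A", "household_size": 5},
    {"age_group": "18-29", "sex": "F", "district": "A", "household_size": 3},
    {"age_group": "60+", "sex": "M", "district": "B", "household_size": 1},
]

# Key variables are chosen by curators and data providers; these three are hypothetical.
flagged = risky_records(survey, ["age_group", "sex", "district"], k=3)
```

Here the single record from district B is flagged because no other record shares its combination of age group, sex and district. Deciding what to do with flagged records (for example recoding, grouping categories or suppressing values) is where the balance between risk and utility comes in, and it is a judgment call that the code does not make.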
It is hoped that in the future all individual requests for microdata will be channeled to the MDL, reducing the burden on the operations of receiving these requests. In return, we may see more global research using UNHCR’s microdata, such as, for instance, the cash-transfer impact study undertaken by the American University of Beirut.
This week, the MDL is publishing 32 new datasets, and now contains anonymized microdata from 59 surveys with additional ones to be released quarterly. Dataset contributions to the MDL have come from UNHCR’s country operations, regional offices, and thematic sections, including health, shelter, education and livelihoods, to name a few. In addition to UNHCR’s data, the MDL includes data from UNHCR’s partners or from data collection exercises carried out jointly between UNHCR and other organizations.