In today’s rapidly evolving digital world, personal data protection has become a critical issue. With the exponential growth of big data, large amounts of personal information are being collected, stored, and processed, making privacy a major concern for individuals, organizations, and governments alike. To safeguard personal data, it is vital to adopt Privacy Enhancing Technologies (PETs), which are a range of innovative technical solutions designed to enhance privacy and protect personal data. These technologies use encryption and other methods to secure personal information and prevent unauthorized access. As technology continues to advance, so does the potential for privacy technologies to protect personal data in a variety of ways.
A hackathon provided the opportunity to explore privacy enhancing technologies in a real-life scenario
The United Nations’ Privacy Enhancing Technologies Lab[1] and UNHCR, the UN Refugee Agency, hosted a data science hackathon during the 7th International Conference on Big Data and Data Science for Official Statistics, which took place in Yogyakarta, Indonesia, from 11 to 14 November 2022. The competition was devised to increase awareness of PETs and of their potential to allow data access for tackling important societal questions. To make the competition as close to real-life challenges as possible, UNHCR provided a dataset from its Microdata Library. Participants were asked to analyze survey data collected from refugees in Kenya during the COVID-19 pandemic to understand the main factors contributing to their social and economic vulnerability. Around 300 teams from over 30 countries, representing national statistical organizations (NSOs), data science start-ups and academic research centers, took part in the 72-hour hackathon.
During the hackathon, participants used machine learning to estimate unknown values in the dataset and were evaluated according to the accuracy of their predictions. An additional challenge was that participants did not receive the complete dataset: they could not view the sensitive variables directly and had to interact with them through privacy-preserving methods.
The participants of the 7th International Conference on Big Data and Data Science for Official Statistics in Yogyakarta, Indonesia. © BPS-Statistics Indonesia
Federico Sanson (on the right) participated in a panel at the International Conference on Big Data. © BPS-Statistics Indonesia
Privacy enhancing technologies are vital to safeguard personal data in UNHCR
When sharing data in sensitive humanitarian contexts, it is essential that the privacy of individuals and the confidentiality of their information is preserved. UNHCR carries out surveys and other data collections on a regular basis and makes anonymized datasets available on the Microdata Library, where datasets are curated and can be downloaded by researchers, partners, and other stakeholders.
The underlying data may contain personally identifiable information, so before sharing a dataset UNHCR carries out an anonymization process, which uses statistical disclosure control techniques to reduce the risk that any individual can be reidentified. In some cases, however, datasets may be too sensitive to share, which means that a lot of analysis from which much public good could be derived is in practice not possible. PETs are a range of novel data-processing techniques that might make such sharing possible.
The PET used during the hackathon was based on the concept of differential privacy. The idea behind differential privacy is that if the effect of substituting any single record in the database is small enough, the query result cannot be used to infer much about any single observation. The goal is to give each individual roughly the same privacy that would result from having their data removed from the original dataset.
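To make the idea concrete, here is a minimal sketch of one common way to achieve differential privacy, the Laplace mechanism applied to a counting query. The article does not say which mechanism the hackathon tool used, so this is an illustration only; the function names, the toy records and the privacy parameter `epsilon` are assumptions introduced for the example.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise; the difference of two Exp(1) draws is Laplace(0, 1)."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def laplace_pdf(x, center, scale):
    """Density of a Laplace distribution centered on `center`."""
    return math.exp(-abs(x - center) / scale) / (2.0 * scale)


def private_count(records, predicate, epsilon, rng):
    """Noisy count. Substituting one record changes a count by at most 1,
    so the noise scale is 1 / epsilon (illustrative choice of mechanism)."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def max_likelihood_ratio(count_a, count_b, epsilon, outputs):
    """Largest ratio between the output densities of two neighboring datasets."""
    scale = 1.0 / epsilon
    return max(
        laplace_pdf(x, count_a, scale) / laplace_pdf(x, count_b, scale)
        for x in outputs
    )


# Two toy datasets that differ by a single substituted record (made-up values).
original = ["F", "M", "F", "F", "M"]      # 3 records match the predicate
substituted = ["F", "M", "F", "M", "M"]   # 2 records match the predicate

epsilon = 0.5
rng = random.Random(0)
noisy_answer = private_count(original, lambda s: s == "F", epsilon, rng)

# For any possible output, the two datasets are almost equally likely to have
# produced it: the density ratio is bounded by exp(epsilon).
grid = [x / 10.0 for x in range(-100, 101)]
ratio = max_likelihood_ratio(3, 2, epsilon, grid)
```

The bounded ratio is what "cannot be used to infer much about any single observation" means in practice: whichever answer comes back, it is nearly as consistent with the substituted dataset as with the original one.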
As the participants did not receive the full dataset, they could query, or ask, the PET tool for certain information, such as the number of male or female respondents or the mean of their household expenditure. The tool returned an approximately correct answer, because it added ‘statistical noise’ to protect privacy. Each interaction with the data had a cost, though. The magnitude of this cost was determined by how much noise the participants were willing to have added to their queries: the noisier the query, the cheaper it was.
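The query-and-cost interaction described above can be sketched as follows. This is not the hackathon's actual tool: the class `NoisyQueryTool`, its methods, the field names and the rule that a query costs exactly its `epsilon` are all hypothetical, chosen only so that a smaller `epsilon` means more noise and a lower cost.

```python
import random


class NoisyQueryTool:
    """Hypothetical stand-in for a PET query tool with a per-query cost."""

    def __init__(self, records, seed=None):
        self._records = list(records)
        self._rng = random.Random(seed)
        self.total_cost = 0.0

    def _charge(self, epsilon):
        # Assumed cost rule: each query costs its epsilon.
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        self.total_cost += epsilon

    def _laplace(self, scale):
        # Difference of two Exp(1) draws is Laplace(0, 1).
        return scale * (self._rng.expovariate(1.0) - self._rng.expovariate(1.0))

    def count(self, predicate, epsilon):
        """Noisy number of records matching `predicate`."""
        self._charge(epsilon)
        true_count = sum(1 for r in self._records if predicate(r))
        return true_count + self._laplace(1.0 / epsilon)

    def mean(self, field, lower, upper, epsilon):
        """Noisy mean of `field`, with values clamped to [lower, upper].

        The number of records is treated as public, so substituting one
        record moves the mean by at most (upper - lower) / n.
        """
        self._charge(epsilon)
        n = len(self._records)
        clamped = [min(max(r[field], lower), upper) for r in self._records]
        sensitivity = (upper - lower) / n
        return sum(clamped) / n + self._laplace(sensitivity / epsilon)


# Toy records with made-up field names and values.
records = [
    {"sex": "F", "expenditure": 120.0},
    {"sex": "M", "expenditure": 80.0},
    {"sex": "F", "expenditure": 200.0},
    {"sex": "M", "expenditure": 150.0},
]
tool = NoisyQueryTool(records, seed=42)

cheap_but_noisy = tool.count(lambda r: r["sex"] == "F", epsilon=0.1)
costly_but_precise = tool.mean("expenditure", 0.0, 500.0, epsilon=0.5)
spent = tool.total_cost  # sum of the epsilons used so far
```

Under these assumptions a team has to decide where precision matters: a rough count may be good enough at a low cost, while a statistic that drives the model may justify spending more.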
As the figure on the left shows, the ‘X’ data was available to the participants, while the ‘Y’ data was stored on a different server. The Train Y data could be accessed only through the PETs frameworks provided. The Test Y data could not be accessed at all, and the final goal was to estimate it. Participants had to train a machine learning model using Train X (accessible) and Train Y (accessible only through the PETs frameworks). They then had to feed Test X (accessible) into the trained model to predict Test Y (not accessible at all). Final scores were determined by a trade-off between the accuracy of the predictions and the total cost of all queries a team made.
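One way to picture this workflow is a model whose training only ever touches Train Y through noisy aggregates. The sketch below fits a straight line from exact statistics of the public X values and two noisy sums involving the hidden Y values. It is a toy illustration, not the method any team used: the function names, the clamping bounds, the even split of `epsilon` across the two queries and the synthetic data are all assumptions, and in the real setup the noisy sums would be computed on the server holding Y.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise; the difference of two Exp(1) draws is Laplace(0, 1)."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def fit_line_from_noisy_sums(train_x, train_y, y_lower, y_upper, epsilon, rng):
    """Fit y = intercept + slope * x using noisy sums of the hidden y values.

    Assumes train_x is public and not constant. Two noisy queries are made,
    each with half of the total epsilon.
    """
    n = len(train_x)
    clamped = [min(max(y, y_lower), y_upper) for y in train_y]
    y_range = y_upper - y_lower
    max_abs_x = max(abs(x) for x in train_x)
    eps_each = epsilon / 2.0

    # Substituting one y changes sum(y) by at most y_range,
    # and sum(x * y) by at most max_abs_x * y_range.
    sum_y = sum(clamped) + laplace_noise(y_range / eps_each, rng)
    sum_xy = sum(x * y for x, y in zip(train_x, clamped)) + laplace_noise(
        max_abs_x * y_range / eps_each, rng
    )

    # Statistics of the public X data need no noise.
    sum_x = sum(train_x)
    sum_xx = sum(x * x for x in train_x)

    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope


def predict(intercept, slope, test_x):
    """Apply the fitted line to the accessible Test X values."""
    return [intercept + slope * x for x in test_x]


# Synthetic data standing in for Train X, Train Y and Test X.
train_x = [float(i) for i in range(50)]
train_y = [2.0 + 3.0 * x for x in train_x]
test_x = [50.0, 51.0, 52.0]

rng = random.Random(7)
intercept, slope = fit_line_from_noisy_sums(train_x, train_y, 0.0, 200.0, 1.0, rng)
test_y_estimate = predict(intercept, slope, test_x)
```

With a smaller `epsilon` the two sums become noisier and the fitted line drifts further from the truth, which mirrors the trade-off between prediction accuracy and query cost that determined the final scores.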
Privacy enhancing technologies help to create a more secure digital environment
The hackathon’s PETs proved effective in delivering useful results while preserving the privacy of the dataset. The participants also reached a good level of accuracy in their predictions, with the winning team reaching an accuracy of around 80 per cent. Lastly, the participants had no difficulty understanding and working with UNHCR’s dataset, even though they were not necessarily familiar with forced displacement data. This shows the high quality of the datasets available on UNHCR’s Microdata Library and the clarity of their documentation and metadata.
As big data continues to grow in significance and usage in the humanitarian context, privacy enhancing technologies are becoming increasingly important for ensuring the security and privacy of personal information. For UNHCR, the insights gained from the hackathon will inform ongoing work to explore how PETs can enable the safe sharing of sensitive data and improve decision-making without compromising individuals’ privacy. Finally, the use of these technologies is crucial for creating a more secure digital environment, particularly for vulnerable populations such as the people we serve.
[1] The United Nations’ Privacy Enhancing Technologies Lab is a collection of national statistics organizations and technology experts collaborating to modernize the way data are shared and statistics are produced.