A robot? Not exactly.

Machine learning (ML) and Artificial Intelligence (AI) are two buzzwords, particularly in the realm of data innovation. Artificial Intelligence is the ability of machines to mimic the cognitive processes of humans. The word 'artificial' comes from the idea that machines are not intelligent per se: behind them, there are humans programming them to perform certain tasks. Nevertheless, depending on the complexity of their programming, some machines are more 'intelligent' than others. Some machines only need to be programmed once and will then continue to perform their tasks, or even take on more complex ones, on their own. For data enthusiasts and innovators working in the humanitarian sector, AI expands the possibilities of processing data in a more accurate and timely way: data that could help Senior Management make decisions more quickly, or better prepare our teams on the ground for eventual contingencies.

According to TechTarget, a robot is a machine designed to execute one or more tasks automatically with speed and precision. Some robots only need simple programming to perform specific repetitive tasks and do not necessarily require AI embedded in them; this is the case for a robot on an assembly line. Conversely, not all AI is applied in a robot: sometimes AI runs on a computer or a mobile device. And sometimes, once AI is programmed, it has the ability to 'learn' from the original programming and then carry out tasks on its own. An example of this is Siri on your iPhone. Siri is a form of applied AI that is capable of 'learning' voice patterns and converting them into dictation. It recognises a language and a local accent, and then performs a task, like looking up the weather conditions in a particular city. Siri synthesises millions of data points coming from different words, languages, and even different accents around the world, becoming 'more intelligent' and recognising more patterns every time. Siri then uses Machine Learning (ML) techniques to process all this data and responds in a matter of seconds, even if the same question is asked in different ways with a different tone (how's the weather today? Is it going to rain? Is it cold?), to compute an answer: bring an umbrella.
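The idea of mapping differently phrased questions to a single answer can be illustrated with a toy sketch in Python. To be clear, this is not how Siri actually works (Apple's pipeline involves speech recognition and far more sophisticated models); the keyword set and function name below are invented purely for illustration.

```python
# Toy sketch (not Apple's method): map differently phrased
# questions onto a single "weather" intent via keyword overlap.
WEATHER_KEYWORDS = {"weather", "rain", "cold", "umbrella", "forecast", "temperature"}

def detect_intent(utterance: str) -> str:
    """Return a coarse intent label based on keyword overlap."""
    tokens = set(utterance.lower().replace("?", "").split())
    if tokens & WEATHER_KEYWORDS:
        return "weather_query"
    return "unknown"

# All three phrasings of the same question land on the same intent.
for question in ["How's the weather today?", "Is it going to rain?", "Is it cold?"]:
    assert detect_intent(question) == "weather_query"
```

A real system would replace the keyword set with a trained classifier, but the principle is the same: many surface forms, one underlying intent.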

Applications of machine learning

In the world of marketing, machine learning has been used to process large amounts of information to inform decisions on how to design new products and improve services for customers. In the humanitarian sector, however, AI applications are a new area of exploration. AI and ML can allow humanitarians, innovators, and data specialists to compile, process, and visualise huge amounts of data in a matter of seconds. Many humanitarian emergencies are complex, and first responders often have only partial information when they must act quickly. To have a full picture of a complex situation, many different pieces and elements should be analysed. Sadly, humans have neither the time nor the resources to compile all the different information in the short timeframe needed to respond. Every so often, decisions are made with partial evidence in order to act quickly and save lives. And this is precisely where machines can help.

For example, UNHCR staff and partners currently spend time, money, and human resources analysing the issue of local integration from different angles and perspectives: social, economic, legal, and cultural. This is done to answer questions about the appropriateness and feasibility of integrating UNHCR's persons of concern into local communities.

Big Data: challenges and opportunities in the humanitarian context

Depending on the context, and in order to have a full picture of a specific situation, humanitarians frequently use proxies: data points that are not directly relevant by themselves, but that provide sampled insights into issues that are otherwise unknown to them. Often these insights are found in traditional forms of data: secondary data, census information, surveys, focus group discussion notes, interview recordings, household visits, or key informant interviews. However, additional insights can also be found in other forms of data, the non-traditional datasets: radio broadcasts, earth observations and geospatial data, call centre/call data records, remote sensing, wearables, downloads, news outlets, and social media, just to mention a few.

The amount of data produced by these non-traditional sources is huge and usually 'heavy' in four respects: 1) it occupies large disk/server space (volume); 2) it is produced at short intervals, often every second (velocity); 3) it comes in different formats, like voice recordings or free text (variety); and 4) it is often produced from a single, occasionally biased, perspective (verification). This is why these non-traditional data sources are also known as big data sources, with the four "Vs" being the primary attributes of big data.

For example, in social media, Twitter produces an enormous amount of data in a matter of seconds: it is calculated that approximately 200 billion tweets are produced in a year (around 6,000 tweets per second). The energy and time that our UNHCR colleagues, particularly those working in communications, would need to collect, compile, analyse, and visualise results to answer specific questions would add to their already burdensome work. Some of them have done it manually, compiling meaningful insights by hand. Compiling social media data is important for humanitarian organisations like UNHCR in order to understand persons of concern's most urgent needs and to establish two-way communication with them. But to scale up this process, and most importantly, to quantify it with a certain degree of statistical significance, humanitarians can rely on machines: to sample, compile, and catalogue data in real time.
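One standard way a machine can sample a high-velocity stream like Twitter's without storing every item is reservoir sampling, which keeps a uniform random sample of fixed size no matter how long the stream runs. A minimal sketch (the tweet strings here are placeholders, not real data):

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of size k from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)           # item i survives with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir

# Sample 100 tweets from a simulated stream of 6,000 (one second of Twitter).
sample = reservoir_sample((f"tweet_{n}" for n in range(6000)), k=100, seed=42)
```

The memory cost stays constant at k items, which is what makes the approach viable for streams that never stop.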

Training a machine to detect xenophobia

In 2015, the UNHCR Innovation Service partnered with UN Global Pulse, the United Nations initiative for big data analytics, to find additional insights into a rapidly evolving setting: the Mediterranean situation. Originally intended to analyse intentions for predicting movements, the teams turned to Twitter data to identify patterns that could help provide insights into cross-border movements. The teams used machine learning to "find", "read", "compile", and "catalogue" tweets from specific geographical locations and in particular languages (e.g. Arabic, Farsi, English, French, Greek, German), attempting to find movement intentions or comments on service provision that might incentivise movement. Although some comments were relevant, the sample of tweets found was not large enough to provide statistically sound evidence.
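The "find, read, compile, catalogue" step can be pictured, very roughly, as a filter that keeps tweets in the target languages and flags movement-related terms. The keyword set, field names, and function below are hypothetical illustrations, not the project's actual pipeline:

```python
# Hypothetical sketch of cataloguing tweets by language and topic keywords.
MOVEMENT_TERMS = {"border", "crossing", "boat", "route", "smuggler"}

def catalogue(tweets, languages=frozenset({"en", "ar", "fa"})):
    """Group matching tweets by language.

    Each tweet is a dict with a 'lang' code and a 'text' field;
    a tweet matches if its language is targeted and its text
    contains at least one movement-related term.
    """
    catalogued = {}
    for tweet in tweets:
        words = set(tweet["text"].lower().split())
        if tweet["lang"] in languages and words & MOVEMENT_TERMS:
            catalogued.setdefault(tweet["lang"], []).append(tweet["text"])
    return catalogued
```

For example, `catalogue([{"lang": "en", "text": "Crossing the border tonight"}])` would file that tweet under `"en"`, while tweets in untargeted languages or without the flagged terms are dropped. A production system would use trained language and topic models rather than a fixed word list.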

However, the machine found anomalies: spikes of comments that were particularly exacerbated during the terrorist incidents in Europe. Every time a new incident happened (Munich, Paris, Berlin, to name some of the key events), posts with a negative sentiment towards refugees appeared in different parts of the world. Sometimes these posts even associated refugees with the incidents. The teams then re-trained the machine with a human rights-based lens: to find comments that would trigger intense dislike or hatred against people perceived as outsiders, strangers, or foreigners to a group, community, or nation, based on their presumed or real descent, national, ethnic or social origin, race, colour, religion, gender, sexual orientation, or other grounds. Manifestations of xenophobia include acts of direct discrimination, hostility or violence and incitement to hatred. Xenophobic acts are intentional, as the goal is to humiliate, denigrate and/or hurt the person(s) and the "associated" group of people (OHCHR). The team 'taught' a machine to 'learn' how to read, compile, categorise, anonymise, and aggregate different types of Twitter posts, in different languages and across cities, and to quantify both xenophobic and integration-friendly comments.
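The re-trained step (labelling posts and then publishing only anonymised, text-free counts) can be shown in a deliberately simplified form. The seed word lists below are invented for illustration; the actual project used trained models, not fixed lexicons:

```python
# Toy sketch of labelling posts and aggregating anonymised counts.
# Word lists are invented examples, not the project's real vocabulary.
HOSTILE = {"hate", "invaders", "deport", "threat"}
FRIENDLY = {"welcome", "support", "neighbours", "together"}

def label_post(text):
    """Label a post by comparing friendly vs hostile word matches."""
    words = set(text.lower().split())
    score = len(words & FRIENDLY) - len(words & HOSTILE)
    if score > 0:
        return "integration-friendly"
    if score < 0:
        return "hostile"
    return "neutral"

def aggregate(posts):
    """Anonymised aggregate: only counts per label are kept, no text."""
    counts = {}
    for post in posts:
        label = label_post(post)
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Publishing only the aggregated counts, never the posts themselves, is what makes it possible to quantify sentiment while keeping individual authors anonymous.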

We drafted a White Paper titled, "Social Media and Forced Displacement: Big Data Analytics & Machine-Learning," to share the process and quantitative results of experimenting with machine learning to understand the scale of this sentiment in the region. The conclusions of the paper offer insights from a single data source (Twitter): one piece of the puzzle on what host communities think about persons of concern, like refugees, arriving in their countries. They could be used as evidence by humanitarian organisations preparing an advocacy campaign or drafting policy recommendations to better counter xenophobia. For UNHCR teams, they could help direct community-based protection initiatives by identifying the main issues refugees encounter when arriving in a new country.

The promise of machine learning: more questions than answers

By using machine learning, both teams gained a snapshot of evidence on questions related to integration, for just one region. However, in data science, where data is king, insights always produce more questions. After analysing some of the results of the experiment, the teams reflected on the following questions: A) how can we use AI for advocacy purposes in other regions? B) how can we help other agencies and organisations use these tools to understand complex contexts where social media is not prevalent, or where there is no electricity or connectivity? Also, as more walls go up, C) how can we leverage AI to analyse big data and create a counter-narrative to hate speech? And finally, D) how can we promote integration and counter xenophobia in a digital world? If you have an answer to any of these questions, or would like to experiment with us to respond to them, feel free to reach out. We have some 'robots' that could help with some of the tasks.

This essay was originally posted in the recently released report: UNHCR Innovation Service: Year in Review 2017. This report highlights and showcases some of the innovative approaches the organization is taking to address complex refugee challenges and discover new opportunities. You can view the full Year in Review microsite and download the publication here.