From tweets to intelligence: using AI to combat propaganda and disinformation

Dec 10, 2022

In today’s internet age, everyone is a content creator. But as an untold amount of false and misleading content makes its way online alongside that which is truthful, fact-checkers are turning to technology to keep up.

The current crisis of online disinformation stems, in large part, from a lack of transparency, accountability and democratic control over decisions about how our online spaces of communication are moderated. Globally, this issue is rooted in the centralised control of these spaces by large tech conglomerates, whose advertising-based business models incentivise them to build algorithms that “prioritise sensational, eye-catching and controversial content.” But when misleading and harmful content is posted in languages other than English, the problem grows worse. Social media platforms’ content moderation in Arabic is plagued by a poor understanding of local context and by algorithmic tools that are ill-suited to the Arabic language and its numerous, rich dialects.

With roughly seven in ten Arabs consuming their news online each day, there is a clear crisis of truth in the region. While much of this false content is relatively benign, taking the form of information pollution and misinformation, malinformation and disinformation have led to human rights abuses, contributed to a range of hate crimes and made online spaces less safe for women, among other deleterious effects.

In response to this growing problem, a cohort of initiatives dedicated to identifying online disinformation and raising public awareness of how to tackle it has cropped up in the region. Across the Arab world, there are currently 45 active fact-checking organisations working to build a healthier, safer media environment. But the huge operational constraints they face and the colossal amount of disinformation online make their work an almost Sisyphean task. Here, digital tools that scan and process large volumes of text can make a dent in the fact-checking workload, freeing up vital human resources to focus on the important claims that need checking.
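One way to picture how such tools free up human time is as triage: score every incoming post, then pass only the highest-priority items to human fact-checkers, up to their daily capacity. The minimal Python sketch below assumes that some upstream model (hypothetical here) has already given each post a check-worthiness score between 0 and 1; `Post`, `triage`, `budget` and `threshold` are illustrative names, not part of Dalil or any real tool.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Post:
    text: str
    score: float  # hypothetical check-worthiness score in [0, 1] from an upstream model


def triage(posts, budget, threshold=0.5):
    """Return at most `budget` posts scoring at least `threshold`, highest score first."""
    candidates = [p for p in posts if p.score >= threshold]
    # nlargest avoids sorting the whole stream when only a few items are needed
    return heapq.nlargest(budget, candidates, key=lambda p: p.score)


stream = [
    Post("Claim A", 0.91),
    Post("Claim B", 0.12),
    Post("Claim C", 0.67),
    Post("Claim D", 0.88),
]
for post in triage(stream, budget=2):
    print(post.text, post.score)  # prints "Claim A 0.91", then "Claim D 0.88"
```

The threshold keeps low-scoring noise out of the queue even on quiet days, while the budget caps the queue on busy ones; both values would need tuning against a newsroom's real capacity.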

Outside the region, the power of Artificial Intelligence (AI) is being harnessed to better arm fact-checkers in their fight against disinformation. Using methods like Natural Language Processing (NLP) and machine learning, technologists ‘train’ software to recognise semantic structures and other patterns associated with propaganda, misinformation and disinformation. In this regard, English-speaking media environments have a significant head start. The majority of applications of AI and NLP have been developed for use with the English language, meaning that models are already well adapted to understanding English’s specific linguistic nuances. Far fewer models have been developed for non-Western languages, including Arabic, which is known for its linguistic complexity and diverse dialects.

The team behind Dalil is seeking to rectify this by bringing the power of AI to MENA fact-checkers. While in practice this is an extremely technically complex and nuanced process, it can be boiled down to teaching Arabic to an existing NLP technique.

AraBERT, the NLP model at the core of the Dalil team’s work, is based on the BERT model created by Google in 2018. BERT, which Google now uses to process almost every English-language search query, triggered a revolution in NLP. Having been pre-trained on a large text corpus, BERT gives Google Search the ability to understand not just the meaning of individual words, but the intent behind a user’s search.

Understanding intent requires grasping the relationships between words and phrases and how they are used. In online disinformation, falsehood is often signalled less by the words themselves than by how they are arranged and the context in which they appear, which makes BERT a very promising model for the task.
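Why word order matters can be shown with a toy comparison. A bag-of-words representation, which simply counts words, cannot tell “police attacked protesters” from “protesters attacked police”, even though the two make opposite claims. Counting adjacent word pairs (bigrams) recovers some local order; models such as BERT go further by relating every token to every other token in the passage. The sketch below covers only the two simple representations, and the sentences are invented for illustration.

```python
from collections import Counter


def bag_of_words(text):
    """Count lower-cased whitespace tokens; word order is thrown away."""
    return Counter(text.lower().split())


def bigrams(text):
    """Count adjacent token pairs, which keeps some local word order."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


a = "police attacked protesters"
b = "protesters attacked police"

print(bag_of_words(a) == bag_of_words(b))  # True: same words, so indistinguishable
print(bigrams(a) == bigrams(b))  # False: pair order reveals who did what
```

Bigrams only see neighbouring words, so they still miss longer-range context (negation a few words away, sarcasm, who is quoting whom), which is the gap contextual models like BERT are meant to close.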

With AraBERT, the Dalil team has sought to apply the BERT model to online disinformation in Arabic by training it on a dataset of Arabic tweets containing disinformation. Most recently, as part of a competition on AI-driven propaganda detection in Arabic, this included a dataset of 504 tweets labelled based on their use of different propaganda techniques, such as loaded language, name calling and exaggeration.
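Because a single tweet can use several propaganda techniques at once, a task like this is naturally framed as multi-label classification, with a separate yes/no output per technique. The plain-Python sketch below illustrates that framing; it is not Dalil’s code. It assumes a label set of only the three techniques named above (the competition’s full set is larger), treats `hypothetical_logits` as stand-ins for the raw scores a fine-tuned classifier head might produce, and uses binary cross-entropy, a loss commonly used to train such heads.

```python
import math

# Assumption: only the three techniques named in the article; the real label set is larger.
TECHNIQUES = ["loaded_language", "name_calling", "exaggeration"]


def encode(labels):
    """Multi-hot target: 1 for each technique annotated on the tweet, else 0."""
    return [1 if t in labels else 0 for t in TECHNIQUES]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_loss(logits, targets):
    """Mean binary cross-entropy, treating each technique as an independent yes/no."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(targets)


def decode(logits, threshold=0.5):
    """Predict every technique whose probability clears the threshold (possibly none)."""
    return [t for t, z in zip(TECHNIQUES, logits) if sigmoid(z) >= threshold]


target = encode({"name_calling", "exaggeration"})
print(target)  # [0, 1, 1]
hypothetical_logits = [2.0, -1.5, 0.3]  # stand-in for a classifier's raw outputs
print(decode(hypothetical_logits))  # ['loaded_language', 'exaggeration']
print(round(bce_loss(hypothetical_logits, target), 3))
```

With only a few hundred labelled tweets spread across many techniques, some labels have very few positive examples, which is one reason more labelled data helps the model generalise.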

While AraBERT showed promising initial results, the model requires more training using a larger dataset to be able to better generalise and identify online disinformation.

“The lack of research into Arabic machine learning models is, to say the least, concerning; however, our implemented model is cutting-edge and does a good job of capturing the data's complexity,” Elie Badine, Dalil’s lead AI engineer, said. “The optimisation approach we will now take involves collecting and manually labelling more Arabic language propaganda-related tweets, which will give us a more representative data set for tuning the AI model.”

A crucial part of training AraBERT to better understand online disinformation falls to Dalil’s fact-checking partners in the MENA region. By using the platform for their fact-checking workflow, including monitoring the news and selecting and verifying news items to be checked, fact-checkers themselves create the dataset needed to train AraBERT. Through this process, the model will learn over time what makes a particular news item ‘worth’ checking and what patterns are shared by news items assessed as false.
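The idea that everyday fact-checking work doubles as training data can be sketched as follows. The record fields and labels here are hypothetical, not Dalil’s actual schema: every monitored item yields a check-worthiness example (was it selected for checking?), and every item with a completed verdict also yields a veracity example (was it judged false?).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkflowRecord:
    """One news item as it passes through a hypothetical fact-checking workflow."""
    text: str
    selected: bool  # did a fact-checker choose to verify it?
    verdict: Optional[str] = None  # e.g. "false" or "true"; None if not (yet) checked


def to_training_examples(records):
    """Turn workflow logs into two labelled datasets of (text, label) pairs."""
    # Check-worthiness: every monitored item, labelled 1 if it was selected.
    worthiness = [(r.text, int(r.selected)) for r in records]
    # Veracity: only items with a finished verdict, labelled 1 if judged false.
    veracity = [
        (r.text, int(r.verdict == "false"))
        for r in records
        if r.selected and r.verdict is not None
    ]
    return worthiness, veracity


records = [
    WorkflowRecord("item 1", selected=True, verdict="false"),
    WorkflowRecord("item 2", selected=False),
    WorkflowRecord("item 3", selected=True, verdict="true"),
]
worthiness, veracity = to_training_examples(records)
print(worthiness)  # [('item 1', 1), ('item 2', 0), ('item 3', 1)]
print(veracity)  # [('item 1', 1), ('item 3', 0)]
```

Items still under review are left out of the veracity set rather than guessed at, so unfinished checks never become misleading labels.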

While AraBERT will never replace fact-checkers themselves, with sufficient training the model will be able to fill a hitherto absent and essential function, helping fact-checkers sift through seemingly infinite online content and more easily separate fact from fiction.

For the MENA region, which has been neglected for far too long in the fight against disinformation, Dalil is a chance for technologists and fact-checkers to band together and turn the tide against the tsunami of false, misleading and harmful content fanning the flames of distrust around the world.