Fake News Detection – Control Over Disinformation

It is often difficult to judge whether an article we read online is based on facts, especially when it covers polarizing current events. Because it is impossible to manually verify every news story that is published, much research has gone into the automatic detection of misinformation. But can we trust the algorithms used for that? Suzana Bašić, Marcio Fuckner, Pascal Wiggers and others investigated this question in the Explainable Fake News Detection project at the Amsterdam University of Applied Sciences (HvA). Their research

First, the team of researchers looked at the data used to train computer models for misinformation detection. Since training and running complex algorithms requires a lot of data, datasets are often created automatically. This can introduce biases into the data and, in turn, into the predictive models. For example, the research team discovered that models often learn which news sources can or cannot be trusted instead of learning whether an individual article contains incorrect information.
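The article does not say how the team removed the source bias. One common way to keep a model from simply memorizing sources is to split the data so that no news source appears in both the training and the test set. The sketch below illustrates that idea only; the function name, the dictionary keys and the toy records are assumptions made for this example.

```python
import random
from collections import defaultdict


def split_by_source(articles, test_fraction=0.25, seed=0):
    """Split articles so that no source ends up in both train and test.

    `articles` is a list of dicts with the (hypothetical) keys
    "source", "text" and "label".
    """
    by_source = defaultdict(list)
    for article in articles:
        by_source[article["source"]].append(article)

    # Shuffle the sources, not the articles, so whole sources move together.
    sources = sorted(by_source)
    random.Random(seed).shuffle(sources)

    n_test = max(1, round(len(sources) * test_fraction))
    test_sources = set(sources[:n_test])

    train = [a for s in sources if s not in test_sources for a in by_source[s]]
    test = [a for s in sources if s in test_sources for a in by_source[s]]
    return train, test


# Toy records, invented for illustration only.
articles = [
    {"source": "site-a.example", "text": "article 1", "label": 1},
    {"source": "site-a.example", "text": "article 2", "label": 1},
    {"source": "site-b.example", "text": "article 3", "label": 0},
    {"source": "site-b.example", "text": "article 4", "label": 0},
    {"source": "site-c.example", "text": "article 5", "label": 1},
    {"source": "site-c.example", "text": "article 6", "label": 0},
    {"source": "site-d.example", "text": "article 7", "label": 0},
    {"source": "site-d.example", "text": "article 8", "label": 1},
]

train, test = split_by_source(articles)
```

If a model scores well on a test set built this way, it cannot have done so by recognizing the source alone, because every test source is new to it.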

When they removed this source of bias from the data, they found that a very simple algorithm produced results comparable to those of a very sophisticated one. That is an interesting finding for several reasons. Firstly, simpler models consume fewer resources, making them more sustainable. Secondly, they require less data, allowing us to build smaller but higher-quality datasets. Thirdly, unlike more complex algorithms, they are often explainable and transparent: it is possible to see why the model made a certain decision in each case.

Finally, the team conducted experiments to investigate the results of SHAP, a popular method for explaining “black box” algorithms. These are complex algorithms such as neural networks, where it is not entirely clear why they make certain decisions. The team identified several problems with this method when it is applied to text examples.
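To illustrate what makes a simple model transparent, here is a minimal sketch of a word-level log-odds classifier (the core of a Naive Bayes model). The article does not name the simple algorithm the team used, so this choice, the function names and the toy sentences are assumptions. The point is that the score of a text is just the sum of per-word weights, so each word's contribution can be read off directly.

```python
import math
from collections import Counter


def train_log_odds(docs, labels, smoothing=1.0):
    """Learn one weight per word: log P(word | fake) - log P(word | real).

    `docs` is a list of token lists, `labels` uses 1 for fake and 0 for real.
    """
    counts = {0: Counter(), 1: Counter()}
    for tokens, label in zip(docs, labels):
        counts[label].update(tokens)

    vocab = set(counts[0]) | set(counts[1])
    # Laplace smoothing keeps unseen word/class pairs from giving log(0).
    totals = {c: sum(counts[c].values()) + smoothing * len(vocab) for c in (0, 1)}

    return {
        word: math.log((counts[1][word] + smoothing) / totals[1])
        - math.log((counts[0][word] + smoothing) / totals[0])
        for word in vocab
    }


def explain(tokens, weights):
    """Return each word's summed contribution; unknown words contribute 0."""
    contributions = Counter()
    for token in tokens:
        contributions[token] += weights.get(token, 0.0)
    return dict(contributions)


# Toy training data, invented for illustration only.
docs = [
    ["shocking", "miracle", "cure"],
    ["miracle", "secret", "revealed"],
    ["council", "approves", "budget"],
    ["court", "publishes", "ruling"],
]
labels = [1, 1, 0, 0]

weights = train_log_odds(docs, labels)
contributions = explain(["miracle", "budget", "unknown"], weights)
score = sum(contributions.values())  # > 0 leans fake, < 0 leans real
```

Because the contributions add up to the score, the explanation is exact by construction and needs no separate explanation method.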

In the figure above, one of these problems is illustrated. The parts marked in red count as evidence that the statement is fake news, while the parts in blue count as evidence that it is not. In the skating example, the explanation of why the algorithm labelled the text this way is inconsistent, as it assigns opposite weights to the same word. For example, “skating” is highlighted in both red and blue in the same text.
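This kind of inconsistency can be checked for automatically. The sketch below assumes that an explanation is available as one numeric attribution per token (positive for "fake", negative for "not fake") and flags words that receive both signs within one text. The function name, the tolerance and the numbers are invented for illustration and are not taken from the project.

```python
from collections import defaultdict


def sign_conflicts(tokens, attributions, tol=1e-6):
    """Return words that get both positive and negative attributions in one text."""
    signs = defaultdict(set)
    for token, value in zip(tokens, attributions):
        if abs(value) > tol:  # ignore attributions that are effectively zero
            signs[token.lower()].add(value > 0)
    return sorted(word for word, seen in signs.items() if len(seen) == 2)


# Hypothetical attributions for a short text; the values are made up.
tokens = ["skating", "is", "fun", "and", "skating", "is", "cold"]
attributions = [0.30, 0.02, -0.10, 0.01, -0.25, 0.03, 0.05]

conflicts = sign_conflicts(tokens, attributions)
print(conflicts)  # ['skating']
```

A flagged word is only a prompt for closer inspection of the explanation, not proof that the underlying model is wrong.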

In the second example, it can be observed that many filler words are highlighted, such as “and”, “it”, “is”, “then”, “on”.

When the research team used this method to explain a simpler model, many conjunctions and prepositions were highlighted as well. Because the simpler model was itself explainable, however, they could verify that those words were not nearly as important to the model’s predictions. This means that the SHAP method does not explain the models well enough.
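The comparison described above can be sketched as a simple check: take the words an explanation ranks highest and measure what share of them are filler words, then do the same for the transparent model's own word contributions. The filler-word list reuses the examples from the text; the function names and all numbers are hypothetical and only illustrate the idea.

```python
FILLER_WORDS = {"and", "it", "is", "then", "on"}


def top_k(scores, k):
    """Return the k words with the largest absolute score."""
    ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
    return [word for word, _ in ranked[:k]]


def filler_share(scores, k=3):
    """Fraction of the top-k words that are filler words."""
    top = top_k(scores, k)
    if not top:
        return 0.0
    return sum(word in FILLER_WORDS for word in top) / len(top)


# Made-up numbers: what the transparent model actually uses ...
model_contributions = {"miracle": 1.1, "cure": 0.7, "and": 0.02, "is": -0.01, "budget": -0.6}
# ... versus what a post-hoc explanation of the same prediction reports.
explanation = {"miracle": 0.4, "cure": 0.1, "and": 0.5, "is": -0.45, "budget": -0.05}

model_share = filler_share(model_contributions)
explanation_share = filler_share(explanation)
```

A large gap between the two shares indicates that the explanation emphasizes words the model itself barely relies on.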

In future research, they plan to further analyze why these problems occur and how they can be avoided.