Identifying Fake News in Brazilian Portuguese

Marcelo Fischer, Rejwanul Haque, Paul Stynes, Pramod Pathak

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

The spread of fake news and disinformation can have profound consequences, e.g. social conflict, distrust in the media, and political instability. Fake news identification is a well-established area of natural language processing (NLP). Given its recent success on English, fake news identification is currently used as a tool by a variety of agencies, including companies and large media houses. However, fake news identification remains a challenge for languages other than English and for low-resource languages. Bidirectional encoders pre-trained with masked language modeling, e.g. bidirectional encoder representations from Transformers (BERT) and multilingual BERT (mBERT), produce state-of-the-art results in numerous NLP tasks. This transfer learning strategy is very effective when labeled data is scarce, especially in low-resource scenarios. This paper investigates the application of BERT to fake news identification in Brazilian Portuguese. In addition to BERT, we tested a number of widely used machine learning (ML) algorithms, methods and strategies for this task. We found that fake news identification models built with advanced ML algorithms, including BERT, performed very well on this task; BERT was the best-performing model, achieving an F1 score of 98.4 on the hold-out test set.
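The abstract reports model quality as an F1 score, the harmonic mean of precision and recall. The sketch below shows that computation under stated assumptions: a binary setup with "fake" as the positive class, and a score reported on a 0-100 scale. The abstract does not say whether the reported score is binary, macro-averaged or weighted, so this is an illustration only. The function name `f1_score_binary` and the six labels are hypothetical and are not the paper's data or code.

```python
from typing import Sequence


def f1_score_binary(y_true: Sequence[str], y_pred: Sequence[str], positive: str = "fake") -> float:
    """Binary F1 with `positive` as the positive class (hypothetical helper).

    Returns 0.0 when there are no true positives, so precision + recall is never 0 in the division.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold labels and model predictions for six news items.
gold = ["fake", "fake", "true", "true", "fake", "true"]
pred = ["fake", "true", "true", "fake", "fake", "true"]
print(round(100 * f1_score_binary(gold, pred), 1))  # → 66.7
```

Here two of the three fake items are detected (recall 2/3) and two of the three "fake" predictions are correct (precision 2/3), so F1 = 2/3, printed as 66.7 on the same 0-100 scale as the reported 98.4. In practice, scikit-learn's `sklearn.metrics.f1_score` computes the same quantity, with its `average` parameter selecting binary, macro or weighted averaging.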

Original language: English
Title of host publication: Natural Language Processing and Information Systems - 27th International Conference on Applications of Natural Language to Information Systems, NLDB 2022, Proceedings
Editors: Paolo Rosso, Valerio Basile, Raquel Martínez, Elisabeth Métais, Farid Meziane
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 111-118
Number of pages: 8
ISBN (Print): 9783031084720
Publication status: Published - 2022
Externally published: Yes
Event: 27th International Conference on Applications of Natural Language to Information Systems, NLDB 2022 - Valencia, Spain
Duration: 15 Jun 2022 to 17 Jun 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13286 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 27th International Conference on Applications of Natural Language to Information Systems, NLDB 2022
Country/Territory: Spain
City: Valencia
Period: 15/06/22 to 17/06/22

Keywords

  • Deep learning
  • Fact checking
  • Fake news identification
