TY - GEN
T1 - Identifying Fake News in Brazilian Portuguese
AU - Fischer, Marcelo
AU - Haque, Rejwanul
AU - Stynes, Paul
AU - Pathak, Pramod
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - The spread of fake news and disinformation may have many profound consequences, e.g. social conflicts, distrust in media, and political instability. Fake news identification is a well-established area of natural language processing (NLP). Given its recent success in English, fake news identification is currently being used as a tool by a variety of agencies, including corporate companies and big media houses. However, fake news identification still poses a challenge for languages other than English and for low-resource languages. Bidirectional encoders using masked language models, e.g. bidirectional encoder representations from Transformers (BERT) and multilingual BERT (mBERT), produce state-of-the-art results in numerous NLP tasks. This transfer learning strategy is very effective when labeled data is not abundantly available, especially in low-resource scenarios. This paper investigates the application of BERT for fake news identification in Brazilian Portuguese. In addition to BERT, we also tested a number of widely used machine learning (ML) algorithms, methods and strategies for this task. We found that fake news identification models built using advanced ML algorithms, including BERT, performed very well on this task, and that BERT was the best-performing model, with an F1 score of 98.4 on the hold-out test set.
KW - Deep learning
KW - Fact checking
KW - Fake news identification
UR - http://www.scopus.com/inward/record.url?scp=85132989062&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-08473-7_10
DO - 10.1007/978-3-031-08473-7_10
M3 - Conference contribution
AN - SCOPUS:85132989062
SN - 9783031084720
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 111
EP - 118
BT - Natural Language Processing and Information Systems - 27th International Conference on Applications of Natural Language to Information Systems, NLDB 2022, Proceedings
A2 - Rosso, Paolo
A2 - Basile, Valerio
A2 - Martínez, Raquel
A2 - Métais, Elisabeth
A2 - Meziane, Farid
PB - Springer Science and Business Media Deutschland GmbH
T2 - 27th International Conference on Applications of Natural Language to Information Systems, NLDB 2022
Y2 - 15 June 2022 through 17 June 2022
ER -