Efficiency of LLMs in Identifying Abusive Language Online: A Comparative Study of LSTM, BERT, and GPT

Zaur Gouliev, Rajesh R. Jaiswal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

As social media continues to grow, the prevalence of abusive language on these platforms has emerged as a major safety concern, particularly for young people exposed to such harmful content, motivating our study. We aim to identify and classify instances of abusive language to create a more respectful and safer online environment. We utilise a range of models, including an LSTM-based architecture and LLMs such as BERT and GPT-3.5, to explore the efficacy of transfer learning in abusive language detection. Our methodology includes data preprocessing, model fine-tuning, and evaluation, with particular attention to addressing class imbalance in the datasets through techniques such as SMOTE. We use the Davidson et al. dataset and the ConvAbuse dataset, both well known in the field of abusive language detection (ALD), alongside standard text preprocessing and hyperparameter tuning to optimise model performance. Results indicate that while all models exhibit proficiency in detecting abusive language, the GPT model achieves the highest accuracy: 88% on the Davidson et al. dataset and 95% on the ConvAbuse dataset. Our findings highlight that transfer learning significantly enhances performance by leveraging the extensive language understanding of pre-trained models, improving detection accuracy with relatively little data and training time. This research demonstrates the potential of employing these technologies ethically and effectively to mitigate abusive language on social media platforms.
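The abstract describes a transfer-learning pipeline of preprocessing, fine-tuning a pre-trained model, and evaluation. The sketch below shows one way such a BERT fine-tuning step could be set up with the Hugging Face transformers and datasets libraries; the file name, column names, label mapping, and hyperparameters are illustrative assumptions rather than the authors' actual configuration, and the SMOTE-based imbalance handling mentioned in the abstract is not shown here.

```python
# Minimal sketch of a BERT fine-tuning setup for abusive language detection.
# Assumptions (not the authors' code): a CSV with "tweet" text and an integer
# "class" label (e.g. 0 = hate speech, 1 = offensive, 2 = neither, as in the
# Davidson et al. data), and illustrative hyperparameters.
import pandas as pd
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

df = pd.read_csv("davidson_labeled_data.csv")[["tweet", "class"]]
df = df.rename(columns={"tweet": "text", "class": "label"})
dataset = Dataset.from_pandas(df).train_test_split(test_size=0.2, seed=42)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Truncate/pad tweets to a fixed length so batches have a uniform shape.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)

args = TrainingArguments(
    output_dir="ald-bert",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
print(trainer.evaluate())
```

Because SMOTE synthesises new samples in a numeric feature space, it is typically applied to vectorised representations (for example, TF-IDF features for an LSTM or classical baseline) rather than to raw token IDs; how the authors integrate it with each model is detailed in the paper itself.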

Original language: English
Title of host publication: HCAI-ep 2024 - Proceedings of the 2024 Conference on Human Centered Artificial Intelligence - Education and Practice
Publisher: Association for Computing Machinery (ACM)
Pages: 1-7
Number of pages: 7
ISBN (Electronic): 9798400711596
DOIs
Publication status: Published - 2 Dec 2024
Event: 2nd Conference on Human Centered Artificial Intelligence - Education and Practice, HCAI-ep 2024 - Naples, Italy
Duration: 1 Dec 2024 - 2 Dec 2024

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2nd Conference on Human Centered Artificial Intelligence - Education and Practice, HCAI-ep 2024
Country/Territory: Italy
City: Naples
Period: 1/12/24 - 2/12/24

Keywords

  • Abusive Language Detection
  • Harmful Content
  • Hate Speech
  • Large Language Models
  • Transfer Learning
