Toward Inclusive Online Environments: Counterfactual-Inspired XAI for Detecting and Interpreting Hateful and Offensive Tweets

Wael Rashwan, Muhammad Atif Qureshi, Muhammad Deedahwar Mazhar Qureshi

Research output: Chapter in Book/Report/Conference proceeding · Conference contribution · peer-review

Abstract

The prevalence of hate speech and offensive language on social media platforms such as Twitter has significant consequences, ranging from psychological harm to the polarization of societies. Consequently, social media companies have implemented content moderation measures to curb harmful or discriminatory language; however, a lack of consistency and transparency hinders their ability to achieve the desired outcomes. This article evaluates several ML models, including an ensemble, an Explainable Boosting Machine (EBM), and a Linear Support Vector Classifier (SVC), on a public dataset of 24,792 tweets compiled by Davidson et al., which categorizes tweets into three classes: hate, offensive, and neither. The top-performing model achieves a weighted F1-score of 0.90. Furthermore, the article interprets the output of the best-performing model using LIME and SHAP, elucidating how specific words and phrases within a tweet contextually affect its classification; this analysis sheds light on the linguistic aspects of hate and offense. Additionally, we employ LIME in a suggestive counterfactual approach, proposing no-hate alternatives for a tweet to further explain the influence of word choices in context. Limitations of the study include potentially biased results due to dataset imbalance, which future research may address by exploring more balanced datasets or leveraging additional features. Ultimately, through these explanations, this work aims to promote digital literacy and foster an inclusive online environment that encourages informed and responsible use of digital technologies. (A GitHub repository containing code, data, and pre-trained models is available at: https://github.com/DeedahwarMazhar/XAI-Counterfactual-Hate-Speech.)
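
The workflow the abstract describes can be illustrated with a minimal sketch. The snippet below is not the authors' released code (see the GitHub repository linked above for that); it assumes a local copy of the Davidson et al. CSV with "tweet" and "class" columns (0 = hate, 1 = offensive, 2 = neither) and uses a TF-IDF + Linear SVC pipeline, calibrated to expose class probabilities, as a stand-in for the paper's full model suite. The final lines show a simplified counterfactual-style probe: drop the tokens LIME weights most strongly toward the predicted class and re-check the prediction.

```python
# Illustrative sketch only: assumes labeled_data.csv from the public Davidson
# et al. release, with "tweet" text and "class" labels (0=hate, 1=offensive,
# 2=neither). The paper's exact preprocessing and ensemble are not reproduced.
import pandas as pd
from lime.lime_text import LimeTextExplainer
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

df = pd.read_csv("labeled_data.csv")  # assumed local copy of the dataset
X_train, X_test, y_train, y_test = train_test_split(
    df["tweet"], df["class"], test_size=0.2, stratify=df["class"], random_state=42
)

# LinearSVC has no predict_proba, so calibrate it to obtain the class
# probabilities that LIME perturbs around.
pipeline = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2), min_df=2),
    CalibratedClassifierCV(LinearSVC(C=1.0)),
)
pipeline.fit(X_train, y_train)
print("weighted F1:", f1_score(y_test, pipeline.predict(X_test), average="weighted"))

# Local explanation for one tweet: which words push it toward its class?
class_names = ["hate", "offensive", "neither"]
explainer = LimeTextExplainer(class_names=class_names)
tweet = X_test.iloc[0]
pred = int(pipeline.predict([tweet])[0])
exp = explainer.explain_instance(
    tweet, pipeline.predict_proba, num_features=10, labels=[pred]
)
print("prediction:", class_names[pred])
print(exp.as_list(label=pred))  # (word, weight) pairs for the predicted class

# Counterfactual-style probe (simplified): remove the tokens LIME weights most
# strongly toward the predicted class and see whether the label flips.
top_words = {w.lower() for w, weight in exp.as_list(label=pred) if weight > 0}
edited = " ".join(t for t in tweet.split() if t.lower() not in top_words)
print("edited prediction:", class_names[int(pipeline.predict([edited])[0])])
```

As the abstract notes, the paper proposes no-hate alternatives rather than simple token deletion, so the probe above should be read only as an illustration of the mechanism by which LIME weights can suggest counterfactual edits.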

Original language: English (Ireland)
Title of host publication: World Conference on Explainable Artificial Intelligence, xAI 2023: Explainable Artificial Intelligence
Editors: Luca Longo
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 97-119
Number of pages: 23
ISBN (Print): 9783031440694
DOIs
Publication status: Published - Jul 2023
Event: 1st World Conference on eXplainable Artificial Intelligence, xAI 2023 - Lisbon, Portugal
Duration: 26 Jul 2023 - 28 Jul 2023

Publication series

Name: Communications in Computer and Information Science
Volume: 1903 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 1st World Conference on eXplainable Artificial Intelligence, xAI 2023
Country/Territory: Portugal
City: Lisbon
Period: 26/07/23 - 28/07/23

Keywords

  • Counterfactual
  • Digital Literacy
  • LIME
  • Machine Learning
  • SHAP
  • XAI
