Abstract
The prevalence of hate speech and offensive language on social media platforms such as Twitter has significant consequences, ranging from psychological harm to the polarization of societies. Consequently, social media companies have implemented content moderation measures to curb harmful or discriminatory language. However, a lack of consistency and transparency hinders their ability to achieve desired outcomes. This article evaluates various ML models, including an ensemble, Explainable Boosting Machine (EBM), and Linear Support Vector Classifier (SVC), on a public dataset of 24,792 tweets compiled by T. Davidson et al., labelled into three classes: hate, offensive, and neither. The top-performing model achieves a weighted F1-score of 0.90. Furthermore, this article interprets the output of the best-performing model using LIME and SHAP, elucidating how specific words and phrases within a tweet contextually influence its classification. This analysis helps to shed light on the linguistic aspects of hate and offense. Additionally, we employ LIME to present a suggestive counterfactual approach, proposing no-hate alternatives for a tweet to further explain the influence of word choices in context. Limitations of the study include the potential for biased results due to dataset imbalance, which future research may address by exploring more balanced datasets or leveraging additional features. Ultimately, through these explanations, this work aims to promote digital literacy and foster an inclusive online environment that encourages informed and responsible use of digital technologies. A GitHub repository containing code, data, and pre-trained models is available at https://github.com/DeedahwarMazhar/XAI-Counterfactual-Hate-Speech.
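The pipeline described above can be sketched in miniature. The snippet below is not the authors' exact code: it trains a TF-IDF + Linear SVC classifier on a hypothetical toy corpus (standing in for the Davidson tweet dataset), reports the weighted F1-score, and then performs a crude counterfactual probe. Where the paper uses LIME to suggest no-hate alternatives, this sketch simply uses the linear model's own per-class coefficients to find and drop the token pushing hardest toward the predicted class, then re-checks the prediction.

```python
# Minimal sketch, not the published pipeline: toy data, toy labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.svm import LinearSVC

# Hypothetical examples; labels: 0 = hate, 1 = offensive, 2 = neither.
texts = [
    "those people should disappear", "everyone like them deserves harm",
    "you are a stupid idiot", "shut up you clown",
    "what a lovely day outside", "nice weather for a walk",
]
labels = [0, 0, 1, 1, 2, 2]

vec = TfidfVectorizer()
X = vec.fit_transform(texts)
clf = LinearSVC(C=1.0).fit(X, labels)

# Weighted F1 on the training set (the paper reports 0.90 on held-out data).
print("weighted F1:", f1_score(labels, clf.predict(X), average="weighted"))

def counterfactual_probe(text):
    """Drop the in-vocabulary token with the largest coefficient for the
    predicted class and report whether the prediction changes."""
    pred = clf.predict(vec.transform([text]))[0]
    vocab = vec.vocabulary_                  # token -> column index
    coefs = clf.coef_[pred]                  # weights for the predicted class
    tokens = text.lower().split()
    scored = [(coefs[vocab[t]], t) for t in tokens if t in vocab]
    if not scored:
        return pred, pred, None
    _, top = max(scored)                     # most incriminating token
    edited = " ".join(t for t in tokens if t != top)
    new_pred = clf.predict(vec.transform([edited]))[0]
    return pred, new_pred, top
```

The probe mirrors the counterfactual intuition (which word, if changed, flips the label?) but only for a linear model; LIME generalizes the idea to any classifier by perturbing the input text and fitting a local surrogate.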
| Original language | English (Ireland) |
|---|---|
| Title of host publication | World Conference on Explainable Artificial Intelligence xAI 2023: Explainable Artificial Intelligence |
| Editors | Luca Longo |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 97-119 |
| Number of pages | 23 |
| ISBN (Print) | 9783031440694 |
| DOIs | |
| Publication status | Published - Jul 2023 |
| Event | 1st World Conference on eXplainable Artificial Intelligence, xAI 2023, Lisbon, Portugal, 26 Jul 2023 → 28 Jul 2023 |
Publication series
| Name | Communications in Computer and Information Science |
|---|---|
| Volume | 1903 CCIS |
| ISSN (Print) | 1865-0929 |
| ISSN (Electronic) | 1865-0937 |
Conference
| Conference | 1st World Conference on eXplainable Artificial Intelligence, xAI 2023 |
|---|---|
| Country/Territory | Portugal |
| City | Lisbon |
| Period | 26/07/23 → 28/07/23 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3: Good Health and Well-being
Keywords
- Counterfactual
- Digital Literacy
- LIME
- Machine Learning
- SHAP
- XAI
Title
Toward Inclusive Online Environments: Counterfactual-Inspired XAI for Detecting and Interpreting Hateful and Offensive Tweets