TY - GEN
T1 - Toward Inclusive Online Environments
T2 - 1st World Conference on eXplainable Artificial Intelligence, xAI 2023
AU - Rashwan, Wael
AU - Qureshi, Muhammad Atif
AU - Qureshi, Muhammad Deedahwar Mazhar
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023/7
Y1 - 2023/7
N2 - The prevalence of hate speech and offensive language on social media platforms such as Twitter has significant consequences, ranging from psychological harm to the polarization of societies. Consequently, social media companies have implemented content moderation measures to curb harmful or discriminatory language. However, a lack of consistency and transparency hinders their ability to achieve desired outcomes. This article evaluates various ML models, including an ensemble, Explainable Boosting Machine (EBM), and Linear Support Vector Classifier (SVC), on a public dataset of 24,792 tweets by T. Davidson, categorizing tweets into three classes: hate, offensive, and neither. The top-performing model achieves a weighted F1-Score of 0.90. Furthermore, this article interprets the output of the best-performing model using LIME and SHAP, elucidating how specific words and phrases within a tweet contextually impact its classification. This analysis helps to shed light on the linguistic aspects of hate and offense. Additionally, we employ LIME to present a suggestive counterfactual approach, proposing no-hate alternatives for a tweet to further explain the influence of word choices in context. Limitations of the study include the potential for biased results due to dataset imbalance, which future research may address by exploring more balanced datasets or leveraging additional features. Ultimately, through these explanations, this work aims to promote digital literacy and foster an inclusive online environment that encourages informed and responsible use of digital technologies (A GitHub repository containing code, data, and pre-trained models is available at: https://github.com/DeedahwarMazhar/XAI-Counterfactual-Hate-Speech).
KW - Counterfactual
KW - Digital Literacy
KW - LIME
KW - Machine Learning
KW - SHAP
KW - XAI
UR - http://www.scopus.com/inward/record.url?scp=85176011419&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-44070-0_5
DO - 10.1007/978-3-031-44070-0_5
M3 - Conference contribution
AN - SCOPUS:85176011419
SN - 9783031440694
T3 - Communications in Computer and Information Science
SP - 97
EP - 119
BT - World Conference on Explainable Artificial Intelligence xAI 2023: Explainable Artificial Intelligence
A2 - Longo, Luca
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 26 July 2023 through 28 July 2023
ER -