TY - GEN
T1 - Using Explainable AI (XAI) for Identification of Subjectivity in Hate Speech Annotations for Low-Resource Languages
AU - Sawant, Madhuri
AU - Younus, Arjumand
AU - Caton, Simon
AU - Qureshi, Muhammad Atif
N1 - Publisher Copyright: © 2024 Copyright held by the owner/author(s).
PY - 2024/9/10
Y1 - 2024/9/10
AB - The proliferation of hate speech on digital platforms has become a significant issue, and automated content moderation systems built on machine learning are a proposed solution. However, they face challenges in multilingual and low-resource settings due to the need for extensive labelled data. This paper introduces an explainable AI framework designed to identify annotation discrepancies in low-resource languages, focusing on Hindi, the third most-spoken language worldwide, which lacks comprehensive research in hate speech detection. By examining the labelling quality of the Hate speech and Offensive Content Identification in English and Indo-Aryan Languages (HASOC) challenge, we use unsupervised learning methods to extract topical variations and annotation behavior and apply these features in an explainable AI-based classification model, TabNet. We release a relabelled Hindi hate speech benchmark dataset with label-flipping information and related metadata to facilitate research in this area. The source code has also been released for reproducibility purposes. Please be advised that this work contains examples of toxic content.
KW - Hate Speech Classification
KW - Transformers
KW - Under-resourced languages
KW - XAI
KW - multilingual
UR - https://www.scopus.com/pages/publications/85207087570
U2 - 10.1145/3677117.3685006
DO - 10.1145/3677117.3685006
M3 - Conference contribution
T3 - OASIS 2024 - Proceedings of the 2024 Workshop on Open Challenges in Online Social Media, Held in conjunction with the 35th ACM Conference on Hypertext and Social Media, HT 2024
SP - 10
EP - 17
BT - OASIS 2024 - Proceedings of the 2024 Workshop on Open Challenges in Online Social Media, Held in conjunction with the 35th ACM Conference on Hypertext and Social Media, HT 2024
ER -