Hate speech classification for Sinhalese and Gujarati

Muhammad Deedahwar Mazhar Qureshi, Madhuri Sawant, M. Atif Qureshi, Wael Rashwan, Arjumand Younus, Simon Caton

Research output: Contribution to journal, Conference article, peer-review

Abstract

We, representing team "XAG-TUD," participated in HASOC 2023, focusing on Task 1, which comprises subtasks 1A and 1B. Task 1A is a coarse-grained binary classification task: distinguishing HOF (Hateful or Offensive) content from NOT content in Sinhalese, a low-resource language. Task 1B poses the same binary classification problem for Gujarati, another low-resource language. In this paper, we describe our solutions for both subtasks in detail. Notably, we observe that the LaBSE (Language-agnostic BERT Sentence Embedding) model consistently outperformed the XLM-R model on both subtasks, demonstrating its effectiveness for hate speech classification in these languages.

Original language: English
Pages (from-to): 501-515
Number of pages: 15
Journal: CEUR Workshop Proceedings
Volume: 3681
Publication status: Published - 2023
Event: 15th Forum for Information Retrieval Evaluation, FIRE 2023 - Goa, India
Duration: 15 Dec 2023 to 18 Dec 2023
