Abstract
We, Team "XAG-TUD," participated in HASOC 2023, focusing on Task 1, which comprises subtasks 1A and 1B. Task 1A is a coarse-grained binary classification task: discriminating between HOF (Hateful or Offensive) and NOT content in Sinhalese, a low-resource language. Task 1B poses the same classification problem for Gujarati, another low-resource language. In this paper, we describe our solutions for both subtasks in detail. Notably, we observe that the LaBSE (Language-agnostic BERT Sentence Embedding) model consistently outperformed the XLM-R model on both subtasks, demonstrating its effectiveness for hate speech classification in these languages.
| Original language | English |
|---|---|
| Pages (from-to) | 501-515 |
| Number of pages | 15 |
| Journal | CEUR Workshop Proceedings |
| Volume | 3681 |
| Publication status | Published - 2023 |
| Event | 15th Forum for Information Retrieval Evaluation, FIRE 2023 - Goa, India. Duration: 15 Dec 2023 → 18 Dec 2023 |
Fingerprint
Dive into the research topics of 'Hate speech classification for Sinhalese and Gujarati'. Together they form a unique fingerprint.