Identifying Emotions in Code Mixed Hindi-English Tweets

Sanket Sonu, Rejwanul Haque, Mohammed Hasanuzzaman, Paul Stynes, Pramod Pathak

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Emotion detection (ED) in tweets is a text classification problem of interest to Natural Language Processing (NLP) researchers. Code-mixing (CM) is the process of mixing linguistic units, such as words, from two different languages. CM languages are characteristically different from the languages whose linguistic units are mixed. Whilst NLP has been shown to be successful for low-resource languages, performing NLP tasks on CM languages remains challenging. ED in particular has rarely been investigated for CM languages such as Hindi-English, owing to the lack of the training data that today's data-driven classification algorithms require. This research proposes a gold-standard dataset for detecting emotions in CM Hindi-English tweets. This paper also investigates the usefulness of the gold-standard dataset by testing a number of state-of-the-art classification algorithms. We found that the ED classifier built using a support vector machine (SVM) achieved the highest accuracy (75.17%) on the hold-out test set. This research should benefit the NLP community in detecting emotions on social media platforms in multilingual societies.
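The abstract reports that an SVM-based classifier performed best but gives no implementation details here. As a rough illustration of what a linear SVM emotion classifier over code-mixed text involves, the following is a minimal sketch that trains one-vs-rest linear SVMs with Pegasos-style stochastic sub-gradient descent (without the optional projection step) on bag-of-words features, using only the Python standard library. It is not the authors' pipeline: the feature scheme, the hyperparameters (`lam`, `epochs`), the helper names, the emotion labels and the toy Hinglish sentences are all hypothetical.

```python
import random
from collections import Counter


def featurize(text):
    # Hypothetical features: sparse bag-of-words {lowercased whitespace token: count}.
    return Counter(text.lower().split())


def dot(w, x):
    # Dot product between a sparse weight dict and a sparse feature dict.
    return sum(w.get(tok, 0.0) * val for tok, val in x.items())


def train_binary_svm(xs, ys, lam=0.01, epochs=50, seed=0):
    """Pegasos-style SGD on the L2-regularised hinge loss (ys are +1/-1).

    No bias term is learned, to keep the sketch short.
    """
    rng = random.Random(seed)
    w = {}
    order = list(range(len(xs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)              # Pegasos step size
            margin = ys[i] * dot(w, xs[i])     # margin under the current weights
            for tok in w:                      # regularisation: shrink all weights
                w[tok] *= 1.0 - eta * lam
            if margin < 1.0:                   # hinge loss active: sub-gradient step
                for tok, val in xs[i].items():
                    w[tok] = w.get(tok, 0.0) + eta * ys[i] * val
    return w


def train_one_vs_rest(texts, labels, **kwargs):
    # One binary SVM per emotion label: that label vs. all the others.
    xs = [featurize(t) for t in texts]
    return {
        label: train_binary_svm(xs, [1 if y == label else -1 for y in labels], **kwargs)
        for label in sorted(set(labels))
    }


def predict(models, text):
    # Pick the label whose SVM gives the highest decision score.
    x = featurize(text)
    return max(models, key=lambda label: dot(models[label], x))


def accuracy(models, texts, labels):
    # Fraction of texts whose predicted label matches the gold label.
    correct = sum(predict(models, t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Hypothetical toy Hinglish examples and labels (not from the paper's dataset).
train_texts = ["aaj bahut khush hoon", "khush raho yaar",
               "dil toot gaya sad", "feeling sad hai",
               "itna gussa kyun", "gussa mat dilao bhai"]
train_labels = ["happy", "happy", "sad", "sad", "angry", "angry"]
test_texts = ["khush hoon", "sad hai", "gussa mat"]
test_labels = ["happy", "sad", "angry"]

models = train_one_vs_rest(train_texts, train_labels)
print(accuracy(models, test_texts, test_labels))  # 1.0 on this toy split
```

On this tiny, trivially separable toy split the held-out accuracy is 1.0, which says nothing about real code-mixed tweets. A practical system would more likely use a library implementation such as scikit-learn's `LinearSVC` with TF-IDF features, and would need to cope with spelling variation in romanized Hindi.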

Original language: English
Title of host publication: 6th Workshop on Indian Language Data
Subtitle of host publication: Resources and Evaluation, WILDRE 2022 - held in conjunction with the International Conference on Language Resources and Evaluation, LREC 2022 - Proceedings
Editors: Girish Nath Jha, Sobha Lalitha Devi, Kalika Bali, Atul Kr. Ojha
Publisher: European Language Resources Association (ELRA)
Pages: 35-41
Number of pages: 7
ISBN (Electronic): 9791095546870
Publication status: Published - 2022
Externally published: Yes
Event: 6th Workshop on Indian Language Data: Resources and Evaluation, WILDRE 2022 - Marseille, France
Duration: 20 Jun 2022 → …

Publication series

Name: 6th Workshop on Indian Language Data: Resources and Evaluation, WILDRE 2022 - held in conjunction with the International Conference on Language Resources and Evaluation, LREC 2022 - Proceedings

Conference

Conference: 6th Workshop on Indian Language Data: Resources and Evaluation, WILDRE 2022
Country/Territory: France
City: Marseille
Period: 20/06/22 → …

Keywords

  • BERT
  • Code-mixing
  • Emotion Detection
