Offensive Language Recognition in Social Media

  • Elena Shushkevich
  • John Cardiff
  • Paolo Rosso
  • Liliya Akhtyamova

Research output: Contribution to journal › Article › peer-review

Abstract

This article proposes an approach to multi-class classification for aggressive language recognition on Twitter. At the preprocessing stage, external data obtained from the links contained in the dataset is added to the existing dataset. This made it possible to expand the training data and thereby improve the quality of the classification. The model is an ensemble of classical machine learning models: Logistic Regression, Support Vector Machines, Naive Bayes, and a combination of Logistic Regression and Naive Bayes. In one of the experiments, the model achieved a macro F1-score of 0.61, which exceeds the best previously published (state-of-the-art) value by 1 percentage point. This indicates the potential value of the proposed approach for hate speech recognition in social media.
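The abstract does not state how the outputs of the four models are combined, nor does it spell out the evaluation metric beyond naming macro F1. As a minimal sketch, assuming hard majority voting over per-model label predictions (the function names `majority_vote` and `macro_f1` are hypothetical illustrations, not the authors' code), the ensemble step and the metric could look like this in plain Python:

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Combine per-model label predictions by hard (majority) voting.

    ASSUMPTION: the paper does not specify its combination rule; hard voting
    is used here only for illustration.
    predictions[m][i] is the label that model m assigns to example i.
    Ties go to the tied label predicted by the earliest model in the list.
    """
    n_examples = len(predictions[0])
    if any(len(p) != n_examples for p in predictions):
        raise ValueError("all models must predict the same number of examples")
    combined = []
    for i in range(n_examples):
        votes = [p[i] for p in predictions]
        counts = Counter(votes)
        best = max(counts.values())
        # Pick the first vote (in model order) that reaches the top count.
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = set(y_true) | set(y_pred)
    total = 0.0
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        # Per-class F1 written as 2TP / (2TP + FP + FN).
        denom = 2 * tp + fp + fn
        total += 2 * tp / denom if denom else 0.0
    return total / len(labels)
```

For example, with hypothetical labels `y_true = ["A", "B", "C", "A", "B", "C"]` and three hypothetical model outputs `lr = ["A", "B", "C", "A", "C", "C"]`, `svm = ["A", "B", "B", "A", "B", "C"]` and `nb = ["B", "B", "C", "C", "B", "A"]`, `majority_vote([lr, svm, nb])` recovers `y_true` exactly, so `macro_f1` returns 1.0 for the ensemble, while `nb` alone scores about 0.433. Macro averaging weights every class equally, which is why it is a common choice when offensive classes are much rarer than neutral ones.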

Original language: English
Pages (from-to): 523-532
Number of pages: 10
Journal: Computación y Sistemas
Volume: 24
Issue number: 2
Publication status: Published - 2020

Keywords

  • Ensemble of models
  • Hate speech
  • Logistic regression
  • Naive Bayes
  • Support vector machine

