A comparison of classical versus deep learning techniques for abusive content detection on social media sites

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

The automated detection of abusive content on social media websites faces several challenges, including imbalanced training sets, the identification of an appropriate feature representation, and the selection of optimal classifiers. Classifiers such as support vector machines (SVMs), combined with bag-of-words or n-gram feature representations, have dominated text classification for decades. With the recent emergence of deep learning and word embeddings, an increasing number of researchers have started to focus on deep neural networks. In this paper, our aim is to explore cutting-edge techniques in automated abusive content detection. We use two deep learning approaches: convolutional neural networks (CNNs) and recurrent neural networks (RNNs). We apply these to nine public datasets derived from various social media websites. Firstly, we show that word embeddings pre-trained on the same data source as the subsequent classification task improve the prediction accuracy of deep learning models. Secondly, we investigate the impact of different levels of training set imbalance on each classifier type. We find that although deep learning models can outperform the traditional SVM classifier when the training dataset is seriously imbalanced, the performance of the SVM classifier can be dramatically improved through oversampling, to the point where it surpasses the deep learning models. Our work can inform researchers in selecting appropriate text classification strategies for the detection of abusive content, including scenarios where the training datasets suffer from class imbalance.
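
As a rough illustration of two ideas mentioned in the abstract, the sketch below shows random oversampling of a minority class and a bag of word n-grams. The abstract does not say which oversampling method or n-gram settings the authors used, so random duplication, the function names random_oversample and ngram_counts, and the toy data are all assumptions made for illustration only.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples of smaller classes until every
    class has as many examples as the largest one (assumed method)."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra copies with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def ngram_counts(text, n_max=2):
    """Count word n-grams of length 1..n_max in a lowercased text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical toy data: label 1 (abusive) is the minority class.
texts = ["have a nice day", "see you soon", "thanks a lot",
         "great post", "you are an idiot"]
labels = [0, 0, 0, 0, 1]

bal_texts, bal_labels = random_oversample(texts, labels)
features = [ngram_counts(t) for t in bal_texts]
```

In practice, oversampling would normally be applied to the training split only, so that duplicated examples do not leak into the evaluation data; the resulting n-gram counts could then be fed to a classifier such as an SVM.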

Original language: English
Title of host publication: Social Informatics - 10th International Conference, SocInfo 2018, Proceedings
Editors: Olessia Koltsova, Dmitry I. Ignatov, Steffen Staab
Publisher: Springer Verlag
Pages: 117-133
Number of pages: 17
ISBN (Print): 9783030011284
DOIs
Publication status: Published - 2018
Event: 10th Conference on Social Informatics, SocInfo 2018 - Saint-Petersburg, Russian Federation
Duration: 25 Sep 2018 to 28 Sep 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11185 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 10th Conference on Social Informatics, SocInfo 2018
Country/Territory: Russian Federation
City: Saint-Petersburg
Period: 25/09/18 to 28/09/18

Keywords

  • Abuse detection
  • Deep learning
  • Text classification
