How Short is a Piece of String?: the Impact of Text Length and Text Augmentation on Short-text Classification Accuracy

Austin McCartney, Svetlana Hensman, Luca Longo

Research output: Contribution to conference › Paper › peer-review

Abstract

Recent increases in the use and availability of short messages have created opportunities to harvest vast amounts of information through machine-based classification. However, traditional classification methods have failed to yield accuracies comparable to classification accuracies on longer texts. Several approaches have previously been employed to extend traditional methods to overcome this problem, including the enhancement of the original texts through the construction of associations with external data supplementation sources. Existing literature does not precisely describe the impact of text length on classification performance. This work quantitatively examines the changes in accuracy of a small selection of classifiers using a variety of enhancement methods, as text length progressively decreases. Findings, based on ANOVA testing at a 95% confidence level, suggest that the performance of classifiers using simple enhancements decreases with decreasing text length, but that more sophisticated enhancements risk over-supplementing the text, with consequent concept drift and a decrease in classification performance as text length increases.
Original language: English
Publication status: Published - 2017
Event: 25th Irish Conference on Artificial Intelligence and Cognitive Science (AICS 2017) - Dublin, Ireland
Duration: 7 Dec 2017 to 8 Dec 2017

Conference

Conference: 25th Irish Conference on Artificial Intelligence and Cognitive Science (AICS 2017)
Country/Territory: Ireland
City: Dublin
Period: 7/12/17 to 8/12/17

Keywords

  • short messages
  • machine-based classification
  • text length
  • classification performance
  • enhancement methods
  • ANOVA testing
  • concept drift
