Abstract
Recent increases in the use and availability of short messages have created opportunities to harvest vast amounts of information through machine-based classification. However, traditional classification methods have failed to achieve accuracies on short texts comparable to those obtained on longer texts. Several approaches have previously been employed to extend traditional methods to overcome this problem, including enhancing the original texts by constructing associations with external data sources. The existing literature does not precisely describe the impact of text length on classification performance. This work quantitatively examines how the accuracy of a small selection of classifiers, using a variety of enhancement methods, changes as text length progressively decreases. Findings, based on ANOVA testing at a 95% confidence level, suggest that the performance of classifiers using simple enhancements decreases as text length decreases, whereas more sophisticated enhancements risk over-supplementing the text, with consequent concept drift and a decrease in classification performance as text length increases.
Original language | English |
---|---|
Publication status | Published - 2017 |
Event | 25th Irish Conference on Artificial Intelligence and Cognitive Science (AICS 2017), Dublin, Ireland. Duration: 7 Dec 2017 → 8 Dec 2017 |
Conference
Conference | 25th Irish Conference on Artificial Intelligence and Cognitive Science (AICS 2017) |
---|---|
Country/Territory | Ireland |
City | Dublin |
Period | 7 Dec 2017 → 8 Dec 2017 |
Keywords
- short messages
- machine-based classification
- text length
- classification performance
- enhancement methods
- ANOVA testing
- concept drift