TY - GEN
T1 - Exploring Online Novelty Detection Using First Story Detection Models
AU - Wang, Fei
AU - Ross, Robert J.
AU - Kelleher, John D.
N1 - Publisher Copyright: © 2018, Springer Nature Switzerland AG.
PY - 2018
Y1 - 2018
AB - Online novelty detection is an important technology in understanding and exploiting streaming data. One application of online novelty detection is First Story Detection (FSD), which attempts to find the very first story about a new topic, e.g. the first news report discussing the “Beast from the East” hitting Ireland. Although hundreds of FSD models have been developed, the vast majority aim only at improving detection performance on some specific dataset, and very few seek insight into novelty itself. We believe that online novelty detection, framed as an unsupervised learning problem, always requires a clear definition of novelty. Indeed, we argue that the definition of novelty is the key issue in designing a good detection model. Within the context of FSD, we first categorise online novelty detection models into three main categories, based on different definitions of novelty scores, and then compare the performance of these model categories in different feature spaces. Our experimental results show that the challenge of FSD varies across novelty scores (and corresponding model categories) and, furthermore, that the detection of novelty in the very popular Word2Vec feature space is more difficult than in a normal frequency-based feature space because of a loss of word specificity.
KW - Feature space
KW - First Story Detection (FSD)
KW - Novelty score
KW - Online novelty detection
KW - Unsupervised learning
KW - Word2Vec
UR - https://www.scopus.com/pages/publications/85057131406
U2 - 10.1007/978-3-030-03493-1_12
DO - 10.1007/978-3-030-03493-1_12
M3 - Conference contribution
AN - SCOPUS:85057131406
SN - 9783030034924
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 107
EP - 116
BT - Intelligent Data Engineering and Automated Learning – IDEAL 2018 - 19th International Conference, Proceedings
A2 - Yin, Hujun
A2 - Novais, Paulo
A2 - Camacho, David
A2 - Tallón-Ballesteros, Antonio J.
PB - Springer Verlag
T2 - 19th International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2018
Y2 - 21 November 2018 through 23 November 2018
ER -