Exploring Online Novelty Detection Using First Story Detection Models

Fei Wang, Robert J. Ross, John D. Kelleher

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Online novelty detection is an important technology for understanding and exploiting streaming data. One application of online novelty detection is First Story Detection (FSD), which attempts to find the very first story about a new topic, e.g. the first news report discussing the “Beast from the East” hitting Ireland. Although hundreds of FSD models have been developed, the vast majority of these aim only at improving detection performance on some specific dataset, and very few focus on insight into novelty itself. We believe that online novelty detection, framed as an unsupervised learning problem, always requires a clear definition of novelty. Indeed, we argue that the definition of novelty is the key issue in designing a good detection model. Within the context of FSD, we first categorise online novelty detection models into three main categories, based on different definitions of novelty scores, and then compare the performance of these model categories in different feature spaces. Our experimental results show that the challenge of FSD varies across novelty scores (and corresponding model categories) and, furthermore, that detecting novelty in the very popular Word2Vec feature space is more difficult than in a normal frequency-based feature space because of a loss of word specificity.
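As a rough illustration of what a novelty score can look like in a frequency-based feature space, the sketch below scores each incoming document by its cosine distance to the nearest previously seen document and flags it as a first story when that distance reaches a threshold. This is only one generic nearest-neighbour formulation, not necessarily any of the three categories defined in the paper; the function names, the whitespace tokenisation, and the threshold value of 0.7 are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """Cosine distance between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def detect_first_stories(stream, threshold=0.7):
    """Process documents one at a time, in arrival order.

    Returns a list of (novelty_score, is_first_story) pairs.
    The threshold is a hypothetical value chosen for illustration.
    """
    seen = []      # term-frequency vectors of all earlier documents
    results = []
    for text in stream:
        vec = Counter(text.lower().split())  # naive whitespace tokenisation
        # Novelty score: distance to the nearest earlier document.
        # The very first document has no neighbours, so it scores 1.0.
        score = min((cosine_distance(vec, old) for old in seen), default=1.0)
        results.append((score, score >= threshold))
        seen.append(vec)
    return results


if __name__ == "__main__":
    stream = [
        "storm hits ireland",
        "storm hits ireland again",
        "election results announced",
    ]
    for text, (score, is_new) in zip(stream, detect_first_stories(stream)):
        print(f"{score:.3f}  first_story={is_new}  {text}")
```

Replacing the term-frequency vectors with averaged word embeddings would change only how `vec` is built. The abstract's observation about a loss of word specificity suggests that, in such a space, distances between documents on different topics may be less clearly separated.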

Original language: English
Title of host publication: Intelligent Data Engineering and Automated Learning – IDEAL 2018 - 19th International Conference, Proceedings
Editors: Hujun Yin, Paulo Novais, David Camacho, Antonio J. Tallón-Ballesteros
Publisher: Springer Verlag
Pages: 107-116
Number of pages: 10
ISBN (Print): 9783030034924
Publication status: Published - 2018
Event: 19th International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2018 - Madrid, Spain
Duration: 21 Nov 2018 to 23 Nov 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11314 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 19th International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2018
Country/Territory: Spain
City: Madrid
Period: 21/11/18 to 23/11/18

Keywords

  • Feature space
  • First Story Detection (FSD)
  • Novelty score
  • Online novelty detection
  • Unsupervised learning
  • Word2Vec
