Abstract
Data generated from naturally occurring processes tends to be non-stationary: climate data, for example, exhibits seasonal and gradual change, while financial data can change abruptly. In machine learning, the degradation in classifier performance caused by such changes in the data is known as concept drift, and there are many approaches to detecting and handling it. Most approaches to detecting concept drift, however, assume that true class labels for test examples will be available at no cost shortly after classification, and they base detection on measures computed from these labels. The high cost of labelling in many domains is a strong motivation to reduce the number of labelled instances required to detect and handle concept drift. Triggered detection approaches that do not require labelled instances to detect concept drift show great promise for achieving this. In this paper we present Confidence Distribution Batch Detection, an approach that provides a signal correlated with changes in concept without using labelled data. Combined with a trigger and a rebuild policy, this signal can maintain classifier accuracy that, in most cases, matches the accuracy achieved by classification-error-based detection techniques while using only a limited amount of labelled data.
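The core idea described in the abstract, tracking how the distribution of classifier confidence scores shifts between a reference batch and incoming unlabelled batches, and rebuilding only when the shift crosses a trigger threshold, can be illustrated with a minimal sketch. This is not the paper's exact procedure: the use of Kullback-Leibler divergence over binned confidence scores, the fixed trigger threshold, and the `predict_proba`-style classifier interface are assumptions made here for illustration only.

```python
import numpy as np

def confidence_histogram(confidences, bins=10):
    """Bin classifier confidence scores into a normalised histogram."""
    hist, _ = np.histogram(confidences, bins=bins, range=(0.0, 1.0))
    # Small smoothing constant so empty bins do not break the divergence.
    hist = hist.astype(float) + 1e-6
    return hist / hist.sum()

def kl_divergence(p, q):
    """Kullback-Leibler divergence between two discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def drift_signal(reference_confidences, batch_confidences, bins=10):
    """Unlabelled drift signal: divergence between the confidence
    distribution of a reference batch and that of a new batch."""
    p = confidence_histogram(reference_confidences, bins)
    q = confidence_histogram(batch_confidences, bins)
    return kl_divergence(q, p)

def monitor(model, reference_X, stream_batches, threshold=0.5):
    """Hypothetical monitoring loop: `model` is any trained classifier
    exposing predict_proba, `reference_X` is the batch it was built on,
    and `stream_batches` yields unlabelled batches from the stream.
    Yields the batch index and signal value whenever the trigger fires."""
    ref_conf = model.predict_proba(reference_X).max(axis=1)
    for t, batch_X in enumerate(stream_batches):
        batch_conf = model.predict_proba(batch_X).max(axis=1)
        signal = drift_signal(ref_conf, batch_conf)
        if signal > threshold:  # trigger: request labels and rebuild
            yield t, signal
```

In a complete system, one plausible design choice is to set the trigger threshold from the signal's behaviour on drift-free batches and to request labels only for the batch that fired the trigger, which is what keeps the labelling cost low.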
Original language | English |
---|---|
Pages (from-to) | 13-25 |
Number of pages | 13 |
Journal | Evolving Systems |
Volume | 4 |
Issue number | 1 |
DOIs | |
Publication status | Published - Mar 2013 |
Keywords
- Classifier confidence
- Concept drift
- Explicit drift detection
- Labelling cost