Handling concept drift in a text data stream constrained by high labelling cost

Patrick Lindstrom, Sarah Jane Delany, Brian Mac Namee

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    45 Citations (Scopus)

    Abstract

    In many real-world classification problems the concept being modelled is not static but changes over time, a situation known as concept drift. Most techniques for handling concept drift rely on the true classifications of test instances being available shortly after classification, so that classifiers can be retrained to handle the drift. However, in applications where labelling instances with their true class has a high cost, this assumption is not reasonable. In this paper we present an approach for keeping a classifier up to date in a concept drift domain that is constrained by a high cost of labelling. We use an active-learning-style approach to select for labelling those examples that are most useful in handling changes in concept. We show how this approach can adequately handle concept drift in a text filtering scenario while requiring just 15% of the documents to be manually categorised and labelled.
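
The general idea of labelling only a small, carefully chosen fraction of a stream can be illustrated with a minimal sketch. The abstract does not state which selection criterion, classifier, or retraining policy the authors use, so everything below is an assumption made for illustration: uncertainty sampling (query the least confidently classified documents), a tiny multinomial naive Bayes model, and a fixed-size sliding window of labelled examples. The names `NaiveBayes`, `select_for_labelling`, `process_stream`, and `oracle` are hypothetical.

```python
import math
from collections import Counter, deque


class NaiveBayes:
    """Tiny multinomial naive Bayes over whitespace tokens (illustrative only)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc.split())
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        return self

    def predict_proba(self, doc):
        n_docs = sum(self.prior.values())
        smoothing = len(self.vocab) + 1  # Laplace smoothing; +1 covers unseen words
        log_scores = {}
        for c in self.classes:
            total = sum(self.counts[c].values())
            score = math.log(self.prior[c] / n_docs)
            for word in doc.split():
                score += math.log((self.counts[c][word] + 1) / (total + smoothing))
            log_scores[c] = score
        top = max(log_scores.values())
        norm = sum(math.exp(s - top) for s in log_scores.values())
        return {c: math.exp(s - top) / norm for c, s in log_scores.items()}


def select_for_labelling(model, batch, budget=0.15):
    """Return indices of the least confidently classified documents.

    Assumption: uncertainty sampling stands in for whatever criterion
    the paper actually uses.
    """
    k = max(1, math.ceil(budget * len(batch)))
    confidence = [max(model.predict_proba(doc).values()) for doc in batch]
    ranked = sorted(range(len(batch)), key=lambda i: confidence[i])
    return ranked[:k]


def process_stream(batches, oracle, seed_docs, seed_labels, budget=0.15, window=200):
    """Classify batches, query `oracle` for a fraction of each, then retrain.

    `oracle` is a hypothetical callable standing in for the human labeller.
    The sliding window (an assumption) lets old labelled examples age out.
    """
    memory = deque(zip(seed_docs, seed_labels), maxlen=window)
    model = NaiveBayes().fit(*zip(*memory))
    n_queried = 0
    for batch in batches:
        for i in select_for_labelling(model, batch, budget):
            memory.append((batch[i], oracle(batch[i])))
            n_queried += 1
        docs, labels = zip(*memory)
        model = NaiveBayes().fit(docs, labels)
    return model, n_queried
```

With `budget=0.15`, roughly 15% of each batch is sent to the labeller, mirroring the labelling fraction quoted in the abstract; the remaining documents are classified without ever receiving a true label.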

    Original language: English
    Title of host publication: Proceedings of the 23rd International Florida Artificial Intelligence Research Society Conference, FLAIRS-23
    Pages: 32-37
    Number of pages: 6
    Publication status: Published - 2010
    Event: 23rd International Florida Artificial Intelligence Research Society Conference, FLAIRS-23 - Daytona Beach, FL, United States
    Duration: 19 May 2010 to 21 May 2010

    Publication series

    Name: Proceedings of the 23rd International Florida Artificial Intelligence Research Society Conference, FLAIRS-23

    Conference

    Conference: 23rd International Florida Artificial Intelligence Research Society Conference, FLAIRS-23
    Country/Territory: United States
    City: Daytona Beach, FL
    Period: 19/05/10 to 21/05/10
