A comparison of ensemble and case-base maintenance techniques for handling concept drift in spam filtering

Sarah Jane Delany, Pádraig Cunningham, Alexey Tsymbal

Research output: Contribution to conference › Paper › peer-review

Abstract

The problem of concept drift has recently received considerable attention in machine learning research. One important practical problem where concept drift needs to be addressed is spam filtering. The literature on concept drift shows that among the most promising approaches are ensembles, and a variety of techniques for ensemble construction has been proposed. In this paper we compare the ensemble approach to an alternative lazy learning approach to concept drift whereby a single case-based classifier for spam filtering keeps itself up-to-date through a case-base maintenance protocol. The case-base maintenance approach offers a more straightforward strategy for handling concept drift than updating ensembles with new classifiers. We present an evaluation that shows that the case-base maintenance approach is at least as effective as a selection of ensemble techniques. The evaluation is complicated by the overriding importance of False Positives (FPs) in spam filtering. The ensemble approaches can have very good performance on FPs because it is possible to bias an ensemble more strongly away from FPs than it is to bias the single classifier. However, this comes at considerable cost to the overall accuracy.

Original language: English
Pages: 340-345
Number of pages: 6
Publication status: Published - 2006
Event: FLAIRS 2006 - 19th International Florida Artificial Intelligence Research Society Conference - Melbourne Beach, FL, United States
Duration: 11 May 2006 to 13 May 2006

Conference

Conference: FLAIRS 2006 - 19th International Florida Artificial Intelligence Research Society Conference
Country/Territory: United States
City: Melbourne Beach, FL
Period: 11/05/06 to 13/05/06
