An Analysis of Case-Base Editing in a Spam Filtering System

Sarah Jane Delany, Pádraig Cunningham

Research output: Chapter in Book/Report/Conference proceeding > Chapter > peer-review

Abstract

Because of the volume of spam email and its evolving nature, any deployed machine-learning-based spam filtering system will need procedures for case-base maintenance. Key to this will be procedures to edit the case-base to remove noise and eliminate redundancy. In this paper we present a two-stage process to do this. We present a new noise reduction algorithm called Blame-Based Noise Reduction that removes cases that are observed to cause misclassification. We also present an algorithm called Conservative Redundancy Reduction that is much less aggressive than the state-of-the-art alternatives and has significantly better generalisation performance in this domain. These new techniques are evaluated against the alternatives in the literature on four datasets of 1000 emails each (50% spam and 50% non-spam).
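The abstract gives only the idea behind Blame-Based Noise Reduction: remove cases that are observed to cause misclassification. The following is a minimal sketch of that idea, not the authors' algorithm. It assumes a leave-one-out k-NN classifier over numeric feature tuples with squared Euclidean distance. The names `blame_based_edit`, `coverage` and `liability`, the toy data, and the rule that skips a candidate once every case it was blamed for has been removed are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def nearest(query, pool, k):
    """Return the k (index, features, label) entries in pool closest to query."""
    return sorted(
        pool, key=lambda e: sum((a - b) ** 2 for a, b in zip(query, e[1]))
    )[:k]


def classify(idx, cases, kept, k):
    """Leave-one-out k-NN vote for case idx, using only the kept indices."""
    pool = [(j, cases[j][0], cases[j][1]) for j in sorted(kept) if j != idx]
    neighbours = nearest(cases[idx][0], pool, k)
    votes = Counter(label for _, _, label in neighbours)
    return votes.most_common(1)[0][0], neighbours


def blame_based_edit(cases, k=1):
    """Return the indices kept after removing cases blamed for misclassifications."""
    kept = set(range(len(cases)))
    coverage = defaultdict(set)   # cases this case helped classify correctly
    liability = defaultdict(set)  # cases this case helped misclassify
    for i, (_, label) in enumerate(cases):
        predicted, neighbours = classify(i, cases, kept, k)
        for j, _, n_label in neighbours:
            if predicted == label and n_label == label:
                coverage[j].add(i)
            elif predicted != label and n_label != label:
                liability[j].add(i)
    # Consider the most-blamed cases first.
    for c in sorted(liability, key=lambda j: len(liability[j]), reverse=True):
        if not liability[c] & kept:
            continue  # assumption: skip if everything it was blamed for is gone
        trial = kept - {c}
        covered = coverage[c] & trial
        # Remove c only if the cases it helped are still classified correctly.
        if all(classify(i, cases, trial, k)[0] == cases[i][1] for i in covered):
            kept = trial
    return kept


# Hypothetical 1-D toy data: index 4 is a "ham" label sitting inside the spam cluster.
cases = [
    ((0,), "spam"), ((10,), "spam"), ((21,), "spam"), ((33,), "spam"),
    ((14,), "ham"),
    ((100,), "ham"), ((111,), "ham"), ((123,), "ham"), ((136,), "ham"),
]
kept = blame_based_edit(cases, k=1)
print(sorted(kept))  # [0, 1, 2, 3, 5, 6, 7, 8]
```

In this toy run the mislabelled case at index 4 is the nearest neighbour of two spam cases and so is blamed for both misclassifications. It supports no correct classification, so it is removed, while the clean cases around it are kept.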

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Editors: Peter Funk, Pedro A. Gonzalez-Calero
Publisher: Springer Verlag
Pages: 128-141
Number of pages: 14
ISBN (Print): 3540228829, 9783540228820
Publication status: Published - 2004

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3155
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349
