Off to a good start: Using clustering to select the initial training set in active learning

Rong Hu, Brian Mac Namee, Sarah Jane Delany

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Active learning (AL) is used in textual classification to reduce the cost of labelling the documents needed for training. An important issue in AL is selecting a representative sample of documents to label as the initial training set that seeds the process, and clustering techniques have been used successfully for this. However, these clustering techniques are non-deterministic, which causes inconsistent behaviour in the AL process. In this paper we first illustrate the problems associated with using non-deterministic clustering for initial training set selection in AL. We then examine the performance of three deterministic clustering techniques for this task and show that they can achieve performance comparable to the non-deterministic approaches without variation in behaviour.
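The abstract does not name the three deterministic clustering techniques the paper evaluates, so the following is not any of them. It is a minimal sketch, under assumed choices, of the general idea of deterministic clustering-based seed selection: documents are assumed to already be numeric feature vectors (for example TF-IDF weights), k-means is seeded with furthest-first traversal instead of random initialisation, and the document nearest each cluster centroid is picked for labelling. All function names and the toy data are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def furthest_first_seeds(docs: List[Vector], k: int) -> List[int]:
    """Deterministically pick k seed indices: start from the document nearest
    the global mean, then repeatedly add the unchosen document furthest from
    all seeds chosen so far. Ties are broken by the lowest index."""
    centre = _mean(docs)
    first = min(range(len(docs)), key=lambda i: (_sq_dist(docs[i], centre), i))
    seeds = [first]
    # Squared distance from each document to its nearest chosen seed.
    nearest = [_sq_dist(d, docs[first]) for d in docs]
    while len(seeds) < k:
        candidates = [i for i in range(len(docs)) if i not in seeds]
        nxt = max(candidates, key=lambda i: (nearest[i], -i))
        seeds.append(nxt)
        nearest = [min(nearest[i], _sq_dist(docs[i], docs[nxt]))
                   for i in range(len(docs))]
    return seeds


def kmeans(docs: List[Vector], seeds: List[int], max_iter: int = 100):
    """Plain Lloyd iterations starting from the given seed documents."""
    centroids = [list(docs[i]) for i in seeds]
    assign = None
    for _ in range(max_iter):
        # Assign each document to its closest centroid (lowest index on ties).
        new_assign = [min(range(len(centroids)),
                          key=lambda c: (_sq_dist(d, centroids[c]), c))
                      for d in docs]
        if new_assign == assign:
            break  # converged: assignments did not change
        assign = new_assign
        for c in range(len(centroids)):
            members = [docs[i] for i, a in enumerate(assign) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = _mean(members)
    return assign, centroids


def select_initial_training_set(docs: List[Vector], k: int,
                                max_iter: int = 100) -> List[int]:
    """Return sorted indices of the documents to label first: the member
    closest to each cluster centroid (an empty cluster contributes none)."""
    if not 1 <= k <= len(docs):
        raise ValueError("k must be between 1 and the number of documents")
    assign, centroids = kmeans(docs, furthest_first_seeds(docs, k), max_iter)
    chosen = []
    for c, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            chosen.append(min(members,
                              key=lambda i: (_sq_dist(docs[i], centroid), i)))
    return sorted(chosen)


# Hypothetical 2-D vectors standing in for document feature vectors:
# three tight groups of three documents each.
docs = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
    [0.0, 5.0], [0.1, 5.0], [0.0, 5.1],
]
print(select_initial_training_set(docs, 3))  # → [0, 3, 6]
```

Because seeding, assignment and tie-breaking are all pure functions of the input, repeated runs return the same initial training set. With randomly initialised k-means, by contrast, different runs can select different documents to label, which is the kind of inconsistency the abstract describes.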

Original language: English
Title of host publication: Proceedings of the 23rd International Florida Artificial Intelligence Research Society Conference, FLAIRS-23
Pages: 26-31
Number of pages: 6
Publication status: Published, 2010
Event: 23rd International Florida Artificial Intelligence Research Society Conference, FLAIRS-23, Daytona Beach, FL, United States
Duration: 19 May 2010 to 21 May 2010

Publication series

Name: Proceedings of the 23rd International Florida Artificial Intelligence Research Society Conference, FLAIRS-23

Conference

Conference: 23rd International Florida Artificial Intelligence Research Society Conference, FLAIRS-23
Country/Territory: United States
City: Daytona Beach, FL
Period: 19/05/10 to 21/05/10

Keywords

  • Active learning
  • textual classification
  • labelling documents
  • initial training set
  • clustering techniques
  • non-deterministic clustering
  • deterministic clustering

