Activist: A new framework for dataset labelling

Jack O'Neill, Sarah Jane Delany, Brian MacNamee

Research output: Contribution to journal › Conference article › peer-review

Abstract

Acquiring labels for large datasets can be a costly and time-consuming process. This has motivated the development of the semi-supervised learning problem domain, which makes use of unlabelled data, in conjunction with a small amount of labelled data, to infer the correct labels of a partially labelled dataset. Active Learning is one of the most successful approaches to semi-supervised learning, and has been shown to reduce the cost and time taken to produce a fully labelled dataset. In this paper we present Activist, a free, online, state-of-the-art platform which leverages active learning techniques to improve the efficiency of dataset labelling. Using a simulated crowd-sourced label gathering scenario on a number of datasets, we show that the Activist software can speed up label acquisition and ultimately reduce its cost.
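The general idea of active learning mentioned in the abstract can be illustrated with a minimal sketch of pool-based uncertainty sampling. This is not the Activist implementation, which the abstract does not describe; the one-dimensional nearest-centroid model, the `active_label` function, the `oracle` callback (standing in for a human or crowd-sourced labeller) and the `budget` parameter are all assumptions made for illustration.

```python
def centroids(labelled):
    """Mean feature value per class from the (x, label) pairs seen so far."""
    sums = {}
    for x, y in labelled:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}


def margin(x, cents):
    """Gap between the two nearest centroid distances; a small gap means uncertain."""
    dists = sorted(abs(x - c) for c in cents.values())
    return dists[1] - dists[0]


def predict(x, cents):
    """Assign x to the class with the nearest centroid."""
    return min(cents, key=lambda y: abs(x - cents[y]))


def active_label(pool, oracle, seed_indices, budget):
    """Label a pool of 1-D points, asking the oracle only for uncertain items.

    Hypothetical helper: the seed indices must cover at least two classes
    so that a margin can be computed.
    """
    queried = {i: oracle(pool[i]) for i in seed_indices}
    assert len(set(queried.values())) >= 2, "seed must cover two classes"
    for _ in range(budget):
        cents = centroids([(pool[i], y) for i, y in queried.items()])
        unlabelled = [i for i in range(len(pool)) if i not in queried]
        if not unlabelled:
            break
        # Query the item the current model is least sure about.
        pick = min(unlabelled, key=lambda i: margin(pool[i], cents))
        queried[pick] = oracle(pool[pick])
    cents = centroids([(pool[i], y) for i, y in queried.items()])
    # Oracle labels where available, model predictions everywhere else.
    labels = [queried.get(i, predict(pool[i], cents)) for i in range(len(pool))]
    return labels, queried


if __name__ == "__main__":
    pool = [i / 20 for i in range(21)]
    labels, queried = active_label(
        pool, lambda x: 1 if x >= 0.5 else 0, seed_indices=[0, 20], budget=5
    )
    print(f"asked for {len(queried)} of {len(pool)} labels")
```

In this sketch the cost saving comes from asking the oracle for only `len(seed_indices) + budget` labels and letting the model fill in the rest; how accurate the filled-in labels are depends on the model and the data.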

Original language: English
Pages (from-to): 140-148
Number of pages: 9
Journal: CEUR Workshop Proceedings
Volume: 1751
Publication status: Published - 2016
Event: 24th Irish Conference on Artificial Intelligence and Cognitive Science, AICS 2016 - Dublin, Ireland
Duration: 20 Sep 2016 to 21 Sep 2016

Keywords

  • labels
  • datasets
  • semi-supervised learning
  • unlabelled data
  • labelled data
  • Active Learning
  • Activist
  • crowd-sourced label gathering
  • label acquisition
