Identity Term Sampling for Measuring Gender Bias in Training Data

Nasim Sobhani, Sarah Jane Delany

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

Predictions from machine learning models can reflect biases in the data on which they are trained. Gender bias has been identified in natural language processing systems such as those used for recruitment. Developing approaches to mitigate gender bias in training data typically requires isolating the effect of gender on the output. While it is possible to isolate and identify gender for some types of training data, e.g. CVs in recruitment, most textual corpora have no obvious gender label. This paper proposes a general approach to measuring bias in textual training data for NLP prediction systems by providing a gender label identified from the textual content of the training data. The approach is compared with the identity term template approach currently in use, also known as Gender Bias Evaluation Datasets (GBETs), which involves designing synthetic test datasets that isolate gender and are used to probe for gender bias in a dataset. We show that our Identity Term Sampling (ITS) approach is capable of identifying gender bias at least as well as identity term templates and can be used on training data that has no obvious gender label.
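The idea of deriving a gender label from the text itself can be illustrated with a minimal Python sketch. It is not the authors' implementation: the term lists, the rule of discarding texts that mention both genders, and the positive-rate gap used as a bias measure are all assumptions made here for illustration, and the function names are hypothetical.

```python
import re

# Hypothetical, deliberately small identity term lists (assumption: the
# paper's actual lists are not reproduced in this record).
MALE_TERMS = {"he", "him", "his", "man", "men", "father", "son", "brother"}
FEMALE_TERMS = {"she", "her", "hers", "woman", "women", "mother", "daughter", "sister"}


def gender_label(text):
    """Return 'male' or 'female' if only one group's terms occur, else None."""
    # Whole-word matching, so "weather" does not count as "her".
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    has_male = bool(tokens & MALE_TERMS)
    has_female = bool(tokens & FEMALE_TERMS)
    if has_male and not has_female:
        return "male"
    if has_female and not has_male:
        return "female"
    return None  # no identity term, or both groups mentioned (assumed rule)


def sample_by_identity_terms(texts):
    """Group the indices of texts by the gender label found in their content."""
    groups = {"male": [], "female": []}
    for index, text in enumerate(texts):
        label = gender_label(text)
        if label is not None:
            groups[label].append(index)
    return groups


def positive_rate_gap(groups, predictions):
    """Positive prediction rate for 'male' minus that for 'female'.

    Assumed bias measure for this sketch; returns None if a group is empty.
    """
    rates = {}
    for label, indices in groups.items():
        if not indices:
            return None
        rates[label] = sum(predictions[i] for i in indices) / len(indices)
    return rates["male"] - rates["female"]


texts = [
    "He is a great engineer",
    "She fixed the bug",
    "The weather is nice",
    "He said she left",
]
predictions = [1, 0, 1, 1]  # made-up binary model outputs, one per text
groups = sample_by_identity_terms(texts)
gap = positive_rate_gap(groups, predictions)
```

A gap near zero on the sampled subsets would suggest similar treatment of the two groups under this measure, while a large gap would flag the data or model for closer inspection. Because the labels come from the corpus itself, no synthetic template sentences are needed.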

Original language: English
Title of host publication: Artificial Intelligence and Cognitive Science - 30th Irish Conference, AICS 2022, Revised Selected Papers
Editors: Luca Longo, Ruairi O’Reilly
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 226-238
Number of pages: 13
ISBN (Print): 9783031264375
Publication status: Published - 2023
Event: 30th Irish Conference on Artificial Intelligence and Cognitive Science, AICS 2022 - Munster, Ireland
Duration: 8 Dec 2022 to 9 Dec 2022

Publication series

Name: Communications in Computer and Information Science
Volume: 1662 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 30th Irish Conference on Artificial Intelligence and Cognitive Science, AICS 2022
Country/Territory: Ireland
City: Munster
Period: 8/12/22 to 9/12/22

Keywords

  • Evaluation
  • Gender bias
  • Machine learning
