TY - GEN
T1 - Author Gender Identification Considering Gender Bias
AU - Jeyaraj, Manuela Nayantara
AU - Delany, Sarah Jane
N1 - Publisher Copyright: © 2023, The Author(s).
PY - 2023
Y1 - 2023
N2 - Writing style and choice of words used in textual content can vary between men and women both in terms of who the text is talking about and who is writing the text. The focus of this paper is on author gender prediction, identifying the gender of who is writing the text. We compare closed and open vocabulary approaches on different types of textual content including more traditional writing styles such as in books, and more recent writing styles used in user-generated content on digital platforms such as blogs and social media messaging. As supervised machine learning approaches can reflect human biases in the data they are trained on, we also consider the gender bias of the different approaches across the different types of dataset. We show that open vocabulary approaches perform better both in terms of prediction performance and with less gender bias.
KW - Author gender identification
KW - Gender bias
KW - Open-vocabulary approach
UR - https://www.scopus.com/pages/publications/85149919023
U2 - 10.1007/978-3-031-26438-2_17
DO - 10.1007/978-3-031-26438-2_17
M3 - Conference contribution
AN - SCOPUS:85149919023
SN - 9783031264375
T3 - Communications in Computer and Information Science
SP - 214
EP - 225
BT - Artificial Intelligence and Cognitive Science - 30th Irish Conference, AICS 2022, Revised Selected Papers
A2 - Longo, Luca
A2 - O’Reilly, Ruairi
PB - Springer Science and Business Media Deutschland GmbH
T2 - 30th Irish Conference on Artificial Intelligence and Cognitive Science, AICS 2022
Y2 - 8 December 2022 through 9 December 2022
ER -