Impact of Character n-grams Attention Scores for English and Russian News Articles Authorship Attribution

Liliya Makhmutova, Robert Ross, Giancarlo Salton

Research output: Contribution to conference › Poster › peer-review

Abstract

Language embeddings are often used as black-box, word-level tools that provide powerful language analysis across many tasks. Yet for tasks such as authorship attribution, access to feature-level information on character n-grams can provide insights that help with model refinement and development. In this paper we investigate and evaluate the importance of character n-grams within an embeddings context in authorship attribution through the use of attention scores. We perform this investigation for both English (Reuters-50-50) and Russian (Taiga) news authorship datasets. Our analysis shows that attention scores are higher for character n-grams that humans consider important for authorship identification. Beyond specific benefits in authorship attribution, this work provides insights into the importance of character n-grams as a unit within embeddings.

Original language: English
Pages: 939-941
Number of pages: 3
Publication status: Published - 27 Mar 2023
Event: 38th Annual ACM Symposium on Applied Computing, SAC 2023 - Tallinn, Estonia
Duration: 27 Mar 2023 to 31 Mar 2023

Conference

Conference: 38th Annual ACM Symposium on Applied Computing, SAC 2023
Country/Territory: Estonia
City: Tallinn
Period: 27/03/23 to 31/03/23

Keywords

  • attention score
  • authorship attribution task
  • character n-grams
