TY - GEN
T1 - GPT Assisted Annotation of Rhetorical and Linguistic Features for Interpretable Propaganda Technique Detection in News Text
AU - Hamilton, Kyle
AU - Longo, Luca
AU - Božić, Bojan
N1 - Publisher Copyright: © 2024 Copyright held by the owner/author(s).
PY - 2024/5/13
Y1 - 2024/5/13
AB - While the use of machine learning for the detection of propaganda techniques in text has garnered considerable attention, most approaches focus on “black-box” solutions with opaque inner workings. Interpretable approaches provide a solution; however, they depend on careful feature engineering and costly expert-annotated data. Additionally, language features specific to propagandistic text are generally the focus of rhetoricians or linguists, and there is no data set labeled with such features that is suitable for machine learning. This study codifies 22 rhetorical and linguistic features identified in the literature on the language of persuasion, for the purpose of annotating an existing data set labeled with propaganda techniques. To help human experts annotate natural language sentences with these features, RhetAnn, a web application, was designed specifically to minimize what would otherwise be considerable mental effort. Finally, a small set of annotated data was used to fine-tune GPT-3.5, a generative large language model (LLM), to annotate the remaining data while optimizing for financial cost and classification accuracy. This study demonstrates that combining a small number of human-annotated examples with GPT can be an effective strategy for scaling the annotation process at a fraction of the cost of traditional annotation that relies solely on human experts. The results are on par with those of the best-performing model at the time of writing, namely GPT-4, at one-tenth of the cost. Our contribution is a set of features, their properties, definitions, and examples in a machine-readable format, along with the code for RhetAnn and the GPT prompts and fine-tuning procedures, for advancing state-of-the-art interpretable propaganda technique detection.
KW - Annotation
KW - Large Language Models
KW - Natural Language Processing
KW - Propaganda Technique Detection
KW - Rhetorical Devices
UR - https://www.scopus.com/pages/publications/85194502404
U2 - 10.1145/3589335.3651909
DO - 10.1145/3589335.3651909
M3 - Conference contribution
AN - SCOPUS:85194502404
T3 - WWW 2024 Companion - Companion Proceedings of the ACM Web Conference
SP - 1431
EP - 1440
BT - WWW 2024 Companion - Companion Proceedings of the ACM Web Conference
PB - Association for Computing Machinery (ACM)
T2 - 33rd ACM Web Conference, WWW 2024
Y2 - 13 May 2024 through 17 May 2024
ER -