PRICER: Leveraging Few-Shot Learning with Fine-Tuned Large Language Models for Unstructured Economic Data

Matt Murtagh, P. J. Wall, Declan O’Sullivan

Research output: Contribution to journal › Conference article › peer-review

Abstract

Accurate collection of economic data is crucial for metrics such as the Consumer Price Index (CPI), which informs policies on inflation and living costs. Traditional manual data collection from retail sources is labor-intensive and faces issues of scalability, accuracy, and data diversity. Our study introduces an OWL/RDFS-based framework aligned with COICOP, and a transformer model, 'PRICER', to automate the extraction and structuring of online retail data into RDF. By iteratively fine-tuning PRICER, first on a broad DBpedia and Wikipedia knowledge base and then on specific online retail data, we achieve significant efficiency and accuracy improvements in data collection. Notably, PRICER shows marked gains in precision and recall after task-specific conditioning, validating our approach for converting unstructured text into structured knowledge. This advancement streamlines economic data aggregation and highlights PRICER's adaptability to broader standardised data processing applications. Future work will focus on scaling the domain-specific price dataset, refining the model's conditioning, and exploring its potential for other forms of technical data.
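As a rough illustration of the structuring step the abstract describes, the sketch below (Python, using the rdflib library) shows how fields extracted by a model from an unstructured online retail listing might be serialised as RDF and linked to a COICOP category. The namespaces, property names, and the example listing are hypothetical assumptions for illustration only, not the paper's actual ontology or data.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF, XSD

    # Hypothetical namespaces; the paper's actual OWL/RDFS vocabulary is not reproduced here.
    EX = Namespace("http://example.org/pricer/")
    COICOP = Namespace("http://example.org/coicop/")

    # Hypothetical structured output extracted by the fine-tuned model
    # from an unstructured online retail listing.
    extracted = {
        "product": "Whole milk 1L",
        "price": 1.09,
        "currency": "EUR",
        "coicop_code": "01.1.4",  # Milk, cheese and eggs
    }

    g = Graph()
    g.bind("ex", EX)
    g.bind("coicop", COICOP)

    item = URIRef(EX["item/0001"])
    g.add((item, RDF.type, EX.RetailItem))
    g.add((item, EX.label, Literal(extracted["product"])))
    g.add((item, EX.price, Literal(extracted["price"], datatype=XSD.decimal)))
    g.add((item, EX.currency, Literal(extracted["currency"])))
    g.add((item, EX.coicopCategory, COICOP[extracted["coicop_code"]]))

    print(g.serialize(format="turtle"))

Serialising each extracted listing as RDF in this way is one plausible route to the standardised, aggregable economic data the abstract refers to; the actual pipeline and schema are specified in the paper itself.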

Original language: English
Pages (from-to): 26-36
Number of pages: 11
Journal: CEUR Workshop Proceedings
Volume: 3697
Publication status: Published - 2024
Event: 2nd International Workshop on Semantic Technologies and Deep Learning Models for Scientific, Technical and Legal Data, SemTech4STLD 2024 - Hersonissos, Greece
Duration: 26 May 2024 → …

Keywords

  • Deep Learning
  • Economic Data
  • Knowledge Graphs
  • Large Language Models
