Unsupervised Keyphrase Extraction: Ranking Step and Single-Word Phrase Problem

  • Svetlana Popova
  • Vera Danilova
  • Mikhail Alexandrov
  • John Cardiff

Research output: Contribution to journal › Article › peer-review

Abstract

Keyphrases provide a compact representation of a document's content and can be used to enhance Web search results and improve natural language processing tasks. This paper extends the state of the art in unsupervised keyphrase extraction from scientific abstracts. We aim to demonstrate the difference between two types of datasets used in the keyphrase extraction domain: datasets where keyphrases for each text are manually assigned by readers, and datasets where keyphrases are assigned by the authors themselves. We highlight the problem of single-word phrases and illustrate its role for each dataset type. Additionally, we observe that well-known algorithms in the domain can be divided into two groups. Algorithms in the first group minimize the number of single-word phrases in the set of extracted keyphrases. In contrast, algorithms in the second group allow the extraction of a larger number of single-word keyphrases. This property of an algorithm (whether it extracts few or many single-word keyphrases) determines how it performs on each type of dataset. We explain the reasons for this.
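
The interaction between the share of single-word phrases and measured performance can be illustrated with a minimal sketch. This is not the authors' code: the function names `single_word_share` and `f1_exact`, the exact-match F1 scoring, and all phrase lists below are assumptions invented for illustration, and no claim is made here about which dataset type contains more single-word phrases.

```python
def single_word_share(phrases):
    """Fraction of phrases that consist of exactly one whitespace-separated token."""
    if not phrases:
        return 0.0
    return sum(1 for p in phrases if len(p.split()) == 1) / len(phrases)


def f1_exact(extracted, gold):
    """Exact-match F1 between extracted and gold phrases (case-insensitive)."""
    ext = {p.lower() for p in extracted}
    ref = {p.lower() for p in gold}
    hits = len(ext & ref)
    if hits == 0:
        return 0.0
    precision = hits / len(ext)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold annotations (toy data, not taken from any real dataset).
gold_few_single = ["keyphrase extraction", "graph ranking", "scientific abstracts"]
gold_many_single = ["keyphrase extraction", "ranking", "abstracts", "evaluation"]

# Hypothetical outputs of two extractors with opposite single-word behaviour.
output_few_single = ["keyphrase extraction", "graph ranking", "scientific abstracts"]
output_many_single = ["keyphrase extraction", "ranking", "abstracts", "graph"]

for gold_name, gold in [("few single-word gold", gold_few_single),
                        ("many single-word gold", gold_many_single)]:
    for out_name, out in [("few single-word output", output_few_single),
                          ("many single-word output", output_many_single)]:
        print(f"{gold_name} / {out_name}: F1 = {f1_exact(out, gold):.2f}")
```

In this toy setting each extractor scores well only against the gold set whose single-word share resembles its own output, which is the kind of dependence the abstract describes.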

Original language: English
Pages (from-to): 1377-1391
Number of pages: 15
Journal: Computación y Sistemas
Volume: 28
Issue number: 3
Publication status: Published - 2024

Keywords

  • keyphrase length
  • single-word phrase problem
  • Unsupervised keyphrase extraction
