Abstract
Keyphrases provide a compact representation of a document's content and can be efficiently used to enhance Web search results and improve natural language processing tasks. This paper extends the state of the art in unsupervised keyphrase extraction from scientific abstracts. We demonstrate the difference between two types of datasets used in keyphrase extraction: datasets in which keyphrases are manually assigned to each text by readers, and datasets in which keyphrases are assigned by the authors themselves. We highlight the problem of single-word phrases and illustrate its role for each dataset type. We also observe that well-known algorithms in the domain fall into two groups. Algorithms in the first group minimize the number of single-word phrases among the extracted keyphrases, whereas algorithms in the second group allow a larger number of single-word keyphrases to be extracted. This property of an algorithm, whether it extracts few or many single-word keyphrases, determines how it performs on each type of dataset. We explain the reasons for this.
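As a rough illustration of the property discussed in the abstract, the share of single-word keyphrases in an extracted set can be measured with a few lines of Python. This is a minimal sketch and not the paper's method: the function name `single_word_share` and both example keyphrase lists are hypothetical, and whitespace tokenization is assumed.

```python
def single_word_share(keyphrases):
    """Return the fraction of keyphrases that consist of exactly one word.

    Assumption: words are separated by whitespace; empty strings are ignored.
    """
    tokenized = [phrase.split() for phrase in keyphrases if phrase.strip()]
    if not tokenized:
        return 0.0
    single = sum(1 for words in tokenized if len(words) == 1)
    return single / len(tokenized)


# Hypothetical outputs of two extraction algorithms (made up for illustration).
few_single_words = [
    "keyphrase extraction",
    "scientific abstracts",
    "single-word phrase problem",
    "ranking",
]
many_single_words = [
    "keyphrases",
    "extraction",
    "unsupervised keyphrase extraction",
    "datasets",
]

print(single_word_share(few_single_words))
print(single_word_share(many_single_words))
```

Comparing this share between an algorithm's output and a dataset's gold keyphrases is one simple way to see whether the two are matched in keyphrase length.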
| Original language | English |
|---|---|
| Pages (from-to) | 1377-1391 |
| Number of pages | 15 |
| Journal | Computacion y Sistemas |
| Volume | 28 |
| Issue number | 3 |
| Publication status | Published - 2024 |
Keywords
- keyphrase length
- single-word phrase problem
- unsupervised keyphrase extraction