Stop-Word Lists in Keyphrase Extraction: Their Influence and Comparison

Research output: Contribution to journal › Article › peer-review

Abstract

Keyphrases provide a compact representation of a document's content and are useful in Web search systems, text data mining, and natural language processing applications. The keyphrase extraction domain has been developing for a long time, and achieving further improvements is becoming increasingly challenging. Algorithms compete for minimal gains, which makes it important to demonstrate ways to enhance the quality of both existing algorithms and those yet to be developed. This article aims to demonstrate and validate a simple way to enhance keyphrase extraction algorithms by using extended stop words. This improves keyphrase extraction algorithms by 4% or more on average. Nevertheless, no studies have been conducted that compare different stop-word lists and their impact on the domain. Our goal is to fill this gap. We compared the impact of both existing extended and standard stop-word lists on the performance of 10 unsupervised keyphrase extraction algorithms across 5 datasets (a total of 10 sub-datasets were used). We aimed to highlight that researching methods for constructing and using extended stop-word lists deserves attention and could become one of the sub-directions in the keyphrase extraction domain. When a suitable list is selected, extended stop words improve the performance of algorithms in a stable and statistically significant manner. Based on the obtained results, we can assume that knowing the type of text from which keyphrases need to be extracted allows us to select the most appropriate stop-word list.
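
To illustrate why the choice of stop-word list matters, here is a minimal sketch of stop-word-based candidate splitting, a step that many unsupervised extractors perform in some form. It is not the article's method or any of the 10 evaluated algorithms. The function name `candidate_phrases` and both word lists are hypothetical and chosen only for illustration; the lists are not taken from the article.

```python
import re

def candidate_phrases(text, stop_words):
    """Split text into candidate phrases at stop words and punctuation."""
    candidates = []
    # Punctuation always ends a candidate phrase.
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for token in fragment.split():
            if token in stop_words:
                # A stop word closes the phrase collected so far.
                if current:
                    candidates.append(" ".join(current))
                current = []
            else:
                current.append(token)
        if current:
            candidates.append(" ".join(current))
    return candidates

# Hypothetical toy lists (assumption): a small "standard" list and an
# "extended" list that adds generic scientific vocabulary.
STANDARD = {"the", "a", "of", "in", "and", "is", "for", "we", "this"}
EXTENDED = STANDARD | {"paper", "propose", "novel", "approach", "results", "show"}

text = "In this paper we propose a novel approach for keyphrase extraction."
standard_candidates = candidate_phrases(text, STANDARD)
extended_candidates = candidate_phrases(text, EXTENDED)
print(standard_candidates)
print(extended_candidates)
```

With the toy standard list, generic words such as "paper" and "novel approach" survive as candidates; with the toy extended list, only "keyphrase extraction" remains, so a downstream ranking step has fewer uninformative candidates to score.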

Original language: English
Pages (from-to): 1449-1459
Number of pages: 11
Journal: Computacion y Sistemas
Volume: 28
Issue number: 3
Publication status: Published - 2024

Keywords

  • Keyphrase extraction
  • NLP
  • stop words
