Extended list of stop words: Does it work for keyphrase extraction from short texts?

Svetlana Popova, Gabriella Skitalinskaya

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

5 Citations (Scopus)

Abstract

In this paper we study the problem of key phrase extraction from short texts written in Russian. The texts are messages posted on Internet car forums about buying or repairing cars. Our main assumption is that a list of stop words for key phrase extraction can be built effectively from a small, expert-marked collection. The results show that even a small number of texts marked by an expert can be enough to build an extended list of stop words, and that the extracted stop words improve the quality of the key phrase extraction algorithm. Previously, we used a similar approach for key phrase extraction from scientific abstracts in English. The results obtained here show that the proposed approach works not only for texts that are well structured and literate, such as abstracts, but also for short texts, such as forum messages, in which many words may be misspelled and the text itself is poorly structured. Moreover, the approach works well not only with English texts but also with texts in Russian.
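
The abstract does not spell out how the extended stop word list is built or how it is used, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that a word becomes a stop word if it occurs in the expert-marked texts but never inside an expert-marked key phrase, and that candidate phrases are the runs of words left between stop words. The function names, the tokenizer, and the English example text are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase and split into word tokens (\\w also matches Cyrillic letters)."""
    return re.findall(r"\w+", text.lower())


def build_extended_stop_words(marked_texts, base_stop_words=frozenset()):
    """Build a stop word set from (text, key_phrases) pairs marked by an expert.

    Assumption (not taken from the paper): a word is treated as a stop word
    if it appears in the marked texts but in none of the marked key phrases.
    """
    seen_words = set()
    key_phrase_words = set()
    for text, key_phrases in marked_texts:
        seen_words.update(tokenize(text))
        for phrase in key_phrases:
            key_phrase_words.update(tokenize(phrase))
    return set(base_stop_words) | (seen_words - key_phrase_words)


def extract_candidates(text, stop_words):
    """Split a text at stop words and return the remaining word runs."""
    candidates, current = [], []
    for token in tokenize(text):
        if token in stop_words:
            if current:
                candidates.append(" ".join(current))
                current = []
        else:
            current.append(token)
    if current:
        candidates.append(" ".join(current))
    return candidates


# Hypothetical expert-marked collection with a single forum-style message.
marked = [("I want to buy a used car with low mileage",
           ["used car", "low mileage"])]
stop_words = build_extended_stop_words(marked, base_stop_words={"the"})
candidates = extract_candidates("I want to buy a used car with low mileage",
                                stop_words)
```

In this sketch a word that never appeared in the marked collection is not a stop word, so on unseen messages it survives as part of a candidate phrase; a real system would need some further filtering or ranking step, which the abstract does not describe.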

Original language: English
Title of host publication: Proceedings of the 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 401-404
Number of pages: 4
ISBN (Electronic): 9781538616383
Publication status: Published - 6 Nov 2017
Externally published: Yes
Event: 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2017 - Lviv, Ukraine
Duration: 5 Sep 2017 - 8 Sep 2017

Publication series

Name: Proceedings of the 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2017
Volume: 1

Conference

Conference: 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2017
Country/Territory: Ukraine
City: Lviv
Period: 5/09/17 - 8/09/17

Keywords

  • information retrieval
  • keyphrase extraction
  • short texts
  • stop words
