Exploiting Wikipedia to identify domain-specific key terms/phrases from a short-text collection

M. Atif Qureshi, Colm O'Riordan, Gabriella Pasi

Research output: Contribution to journal, Conference article, peer-review

Abstract

Extracting what we call "domain-specific" key terms/phrases from a given document collection is a challenging task. By "domain-specific" key terms/phrases we mean words or expressions representative of the topical areas specific to the focus of a document collection. For example, when a collection is related to academic research (i.e., its focus is on topics dealing with academic research), the domain-specific key terms/phrases could be 'Information Retrieval', 'Marine Biology', 'Science', etc. In this contribution, a technique for identifying domain-specific key terms/phrases from a collection of documents is proposed. The proposed technique works on short textual descriptions and makes use of the titles of Wikipedia articles and of the Wikipedia category graph. We performed experiments over the document collection (HTML title text only) of eight post-graduate school Web sites in five different countries. The evaluations show promising results for the identification of domain-specific key terms/phrases.

Original language: English
Pages (from-to): 63-74
Number of pages: 12
Journal: CEUR Workshop Proceedings
Volume: 1127
Publication status: Published - 2014
Externally published: Yes
Event: 5th Italian Information Retrieval Workshop, IIR 2014 - Roma, Italy
Duration: 20 Jan 2014 to 21 Jan 2014
