Abstract
Extracting what we call "domain-specific" key terms/phrases from a given document collection is a challenging task. By "domain-specific" key terms/phrases we mean words or expressions representative of the topical areas on which a document collection focuses. For example, when a collection is related to academic research, the domain-specific key terms/phrases could be 'Information Retrieval', 'Marine Biology', 'Science', etc. In this contribution, a technique for identifying domain-specific key terms/phrases from a collection of documents is proposed. The proposed technique works on short textual descriptions, and it makes use of the titles of Wikipedia articles and of the Wikipedia category graph. We performed experiments over the document collection (HTML title text only) of eight post-graduate school Web sites from five different countries. The evaluations show promising results for the identification of domain-specific key terms/phrases.
Original language | English |
---|---|
Pages (from-to) | 63-74 |
Number of pages | 12 |
Journal | CEUR Workshop Proceedings |
Volume | 1127 |
Publication status | Published - 2014 |
Externally published | Yes |
Event | 5th Italian Information Retrieval Workshop, IIR 2014 - Roma, Italy; Duration: 20 Jan 2014 → 21 Jan 2014 |