Short-text domain specific key terms/phrases extraction using an n-gram model with Wikipedia

M. Atif Qureshi, Colm O'Riordan, Gabriella Pasi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Finding domain specific key terms/phrases from a given set of documents is a challenging task. A domain may be defined as an area of interest over a collection of documents which may not be explicitly defined but is implicitly observable in those documents. When considering a collection of documents related to academic research, examples of key terms/phrases may be "Information Retrieval", "Marine Biology", etc. In this paper, a technique for extracting important key terms/phrases in a considered topical domain is proposed using external evidence from the titles of Wikipedia articles and the Wikipedia category graph. We performed some experiments over a document collection drawn from the Web sites of different post-graduate schools. Our preliminary evaluations show promising results for the detection of domain specific key terms/phrases from the given set of domain focused Web pages.
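The abstract does not give the details of the technique, so the following is only a minimal sketch of the general idea of matching word n-grams from a short text against Wikipedia article titles. It is not the authors' implementation. The function names, the `TITLES` set (a hypothetical stand-in for a real Wikipedia title index), and the `max_n` limit are all assumptions made for illustration, and the category-graph evidence mentioned in the abstract is not modelled here.

```python
import re


def word_ngrams(tokens, max_n):
    """Yield every contiguous word n-gram, from n = max_n down to n = 1."""
    for n in range(max_n, 0, -1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def match_titles(text, titles, max_n=3):
    """Return the n-grams of `text` that exactly match a title (case-insensitive).

    Longer n-grams are listed first; each match is reported once.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    normalized = {title.lower() for title in titles}
    matches = []
    for gram in word_ngrams(tokens, max_n):
        if gram in normalized and gram not in matches:
            matches.append(gram)
    return matches


# Hypothetical stand-in for an index of Wikipedia article titles.
TITLES = {"Information Retrieval", "Marine Biology", "Biology"}

print(match_titles("School of Marine Biology and Information Retrieval", TITLES))
```

Checking longer n-grams first is one simple way to surface multi-word phrases such as "marine biology" ahead of their single-word parts; how the paper ranks or filters candidates is not stated in the abstract.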

Original language: English
Title of host publication: CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management
Pages: 2515-2518
Number of pages: 4
Publication status: Published - 2012
Externally published: Yes
Event: 21st ACM International Conference on Information and Knowledge Management, CIKM 2012 - Maui, HI, United States
Duration: 29 Oct 2012 to 2 Nov 2012

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 21st ACM International Conference on Information and Knowledge Management, CIKM 2012
Country/Territory: United States
City: Maui, HI
Period: 29/10/12 to 2/11/12

Keywords

  • community detection
  • n-gram model
  • open-domain knowledge
  • Wikipedia
