Improving the clustering of blogosphere with a self-term enriching technique

Fernando Perez-Tellez, David Pinto, John Cardiff, Paolo Rosso

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

    Abstract

    The analysis of blogs is emerging as an exciting new area in the text processing field which attempts to harness and exploit the vast quantity of information being published by individuals. However, their particular characteristics (shortness, vocabulary size and nature, etc.) make it difficult to achieve good results using automated clustering techniques. Moreover, the fact that many blogs may be considered to be narrow domain means that exploiting external linguistic resources can have limited value. In this paper, we present a methodology to improve the performance of clustering techniques on blogs, which does not rely on external resources. Our results show that this technique can produce significant improvements in the quality of clusters produced.

    Original language: English
    Title of host publication: Text, Speech and Dialogue - 12th International Conference, TSD 2009, Proceedings
    Pages: 40-47
    Number of pages: 8
    Publication status: Published - 2009
    Event: 12th International Conference on Text, Speech and Dialogue, TSD 2009 - Pilsen, Czech Republic
    Duration: 13 Sep 2009 to 17 Sep 2009

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 5729 LNAI
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 12th International Conference on Text, Speech and Dialogue, TSD 2009
    Country/Territory: Czech Republic
    City: Pilsen
    Period: 13/09/09 to 17/09/09
