Clustering weblogs on the basis of a topic detection method

Fernando Perez-Tellez, David Pinto, John Cardiff, Paolo Rosso

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

    Abstract

    In recent years we have seen a vast increase in the volume of information published on weblog sites, along with the creation of new web technologies where people discuss current events. The need for automatic tools to organize this massive amount of information is clear, but the particular characteristics of weblogs, such as shortness and overlapping vocabulary, make this task difficult. In this work, we present a novel methodology to cluster weblog posts according to the topics discussed therein. This methodology is based on a generative probabilistic model in conjunction with a Self-Term Expansion methodology. We present results that demonstrate a considerable improvement over the baseline.

    Original language: English
    Title of host publication: Advances in Pattern Recognition - Second Mexican Conference on Pattern Recognition, MCPR 2010, Proceedings
    Pages: 342-351
    Number of pages: 10
    Publication status: Published - 2010
    Event: Mexican Conference on Pattern Recognition 2010, MCPR 2010 - Puebla, Mexico
    Duration: 27 Sep 2010 to 29 Sep 2010

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 6256 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: Mexican Conference on Pattern Recognition 2010, MCPR 2010
    Country/Territory: Mexico
    City: Puebla
    Period: 27/09/10 to 29/09/10

    Keywords

    • Clustering
    • Topic Detection
    • Weblogs
