TY - GEN
T1 - Clustering weblogs on the basis of a topic detection method
AU - Perez-Tellez, Fernando
AU - Pinto, David
AU - Cardiff, John
AU - Rosso, Paolo
PY - 2010
Y1 - 2010
N2 - In recent years we have seen a vast increase in the volume of information published on weblog sites, as well as the creation of new web technologies where people discuss current events. The need for automatic tools to organize this massive amount of information is clear, but the particular characteristics of weblogs, such as shortness and overlapping vocabulary, make this task difficult. In this work, we present a novel methodology to cluster weblog posts according to the topics discussed therein. This methodology is based on a generative probabilistic model in conjunction with a Self-Term Expansion methodology. We present our results, which demonstrate a considerable improvement over the baseline.
AB - In recent years we have seen a vast increase in the volume of information published on weblog sites, as well as the creation of new web technologies where people discuss current events. The need for automatic tools to organize this massive amount of information is clear, but the particular characteristics of weblogs, such as shortness and overlapping vocabulary, make this task difficult. In this work, we present a novel methodology to cluster weblog posts according to the topics discussed therein. This methodology is based on a generative probabilistic model in conjunction with a Self-Term Expansion methodology. We present our results, which demonstrate a considerable improvement over the baseline.
KW - Clustering
KW - Topic Detection
KW - Weblogs
UR - http://www.scopus.com/inward/record.url?scp=78751555875&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-15992-3_36
DO - 10.1007/978-3-642-15992-3_36
M3 - Conference contribution
AN - SCOPUS:78751555875
SN - 3642159915
SN - 9783642159916
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 342
EP - 351
BT - Advances in Pattern Recognition - Second Mexican Conference on Pattern Recognition, MCPR 2010, Proceedings
T2 - Mexican Conference on Pattern Recognition 2010, MCPR 2010
Y2 - 27 September 2010 through 29 September 2010
ER -