TY - GEN
T1 - Improving the clustering of blogosphere with a self-term enriching technique
AU - Perez-Tellez, Fernando
AU - Pinto, David
AU - Cardiff, John
AU - Rosso, Paolo
PY - 2009
Y1 - 2009
AB - The analysis of blogs is emerging as an exciting new area in the text processing field which attempts to harness and exploit the vast quantity of information being published by individuals. However, their particular characteristics (shortness, vocabulary size and nature, etc.) make it difficult to achieve good results using automated clustering techniques. Moreover, the fact that many blogs may be considered to be narrow domain means that exploiting external linguistic resources can have limited value. In this paper, we present a methodology to improve the performance of clustering techniques on blogs, which does not rely on external resources. Our results show that this technique can produce significant improvements in the quality of clusters produced.
UR - http://www.scopus.com/inward/record.url?scp=70349852770&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-04208-9_9
DO - 10.1007/978-3-642-04208-9_9
M3 - Conference contribution
AN - SCOPUS:70349852770
SN - 3642042074
SN - 9783642042072
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 40
EP - 47
BT - Text, Speech and Dialogue - 12th International Conference, TSD 2009, Proceedings
T2 - 12th International Conference on Text, Speech and Dialogue, TSD 2009
Y2 - 13 September 2009 through 17 September 2009
ER -