On the difficulty of clustering company tweets

Fernando Perez-Tellez, David Pinto, John Cardiff, Paolo Rosso

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

    Abstract

    Twitter is a successful new Web 2.0 technology used by millions of people and companies to publish brief messages ("tweets") in order to share experiences and/or opinions about a product or service. Given the huge amount of information available on this kind of platform, there is a clear need for new systems that can mine these messages to derive information about the collective thinking of twitterers (e.g. for opinion or sentiment analysis). Tweet analysis is an important task because comments, opinions, suggestions, and complaints can inform marketing strategies or reveal information about a company's reputation. To this end, it is necessary to establish whether a tweet refers to a company or not, which is not a straightforward keyword search because a name can be used in multiple contexts. The aim of this work is to present and compare a number of clustering-based approaches that determine whether a given tweet refers to a particular company. We use an enriching methodology to improve the representation of tweets and, as a consequence, performance on the company tweet clustering task. The results obtained are promising and highlight the difficulty of this task.

    Original language: English
    Title of host publication: Proceedings of the 2nd International Workshop on Search and Mining User-Generated Contents, SMUC'10, Co-located with 19th International Conference on Information and Knowledge Management, CIKM'10
    Pages: 95-102
    Number of pages: 8
    Publication status: Published - 2010
    Event: 2nd International Workshop on Search and Mining User-Generated Contents, SMUC'10, Co-located with 19th International Conference on Information and Knowledge Management, CIKM'10 - Toronto, ON, Canada
    Duration: 26 Oct 2010 - 30 Oct 2010

    Publication series

    Name: International Conference on Information and Knowledge Management, Proceedings

    Conference

    Conference: 2nd International Workshop on Search and Mining User-Generated Contents, SMUC'10, Co-located with 19th International Conference on Information and Knowledge Management, CIKM'10
    Country/Territory: Canada
    City: Toronto, ON
    Period: 26/10/10 - 30/10/10

    Keywords

    • Clustering of tweets
    • Opinion analysis
