A methodology to cluster informal language register data

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    Abstract

    Analyzing and classifying web content is a task that has attracted increasing interest in recent years. However, user-generated content from Web 2.0 applications such as blogs, commentaries, and reviews poses additional challenges. Such content is typically short, with overlapping vocabulary, and the size and nature of its vocabulary make it difficult to achieve good results with automated clustering. The informal written register of Web 2.0 introduces further challenges, including incomplete sentences, misspellings, and spontaneous structures. These characteristics make it difficult to select or employ external resources to improve clustering performance. In this work we apply a methodology that does not rely on any external resources to cluster this data automatically. The approach improves the representation of informal language register data through a term enriching procedure, and uses a term selection technique to identify the most important and discriminative information. Our results show that this technique can produce significant improvements in the quality of the clusters produced.

    Original language: English
    Title of host publication: Proceedings of the 4th Indian International Conference on Artificial Intelligence, IICAI 2009
    Pages: 1391-1401
    Number of pages: 11
    Publication status: Published - 2009
    Event: 4th Indian International Conference on Artificial Intelligence, IICAI 2009 - Tumkur, India
    Duration: 16 Dec 2009 to 18 Dec 2009

    Publication series

    Name: Proceedings of the 4th Indian International Conference on Artificial Intelligence, IICAI 2009

    Conference

    Conference: 4th Indian International Conference on Artificial Intelligence, IICAI 2009
    Country/Territory: India
    City: Tumkur
    Period: 16/12/09 to 18/12/09

    Keywords

    • Blogs
    • Clustering
    • Web 2.0
