TY - JOUR
T1 - Emo, love and god
T2 - Making sense of urban dictionary, a crowd-sourced online dictionary
AU - Nguyen, Dong
AU - McGillivray, Barbara
AU - Yasseri, Taha
N1 - Publisher Copyright: © 2018 The Authors.
PY - 2018/5/2
Y1 - 2018/5/2
N2 - The Internet facilitates large-scale collaborative projects and the emergence of Web 2.0 platforms, where producers and consumers of content unify, has drastically changed the information market. On the one hand, the promise of the ‘wisdom of the crowd’ has inspired successful projects such as Wikipedia, which has become the primary source of crowd-based information in many languages. On the other hand, the decentralized and often unmonitored environment of such projects may make them susceptible to low-quality content. In this work, we focus on Urban Dictionary, a crowdsourced online dictionary. We combine computational methods with qualitative annotation and shed light on the overall features of Urban Dictionary in terms of growth, coverage and types of content. We measure a high presence of opinion-focused entries, as opposed to the meaning-focused entries that we expect from traditional dictionaries. Furthermore, Urban Dictionary covers many informal, unfamiliar words as well as proper nouns. Urban Dictionary also contains offensive content, but highly offensive content tends to receive lower scores through the dictionary’s voting system. The low threshold to include new material in Urban Dictionary enables quick recording of new words and new meanings, but the resulting heterogeneous content can pose challenges in using Urban Dictionary as a source to study language innovation.
AB - The Internet facilitates large-scale collaborative projects and the emergence of Web 2.0 platforms, where producers and consumers of content unify, has drastically changed the information market. On the one hand, the promise of the ‘wisdom of the crowd’ has inspired successful projects such as Wikipedia, which has become the primary source of crowd-based information in many languages. On the other hand, the decentralized and often unmonitored environment of such projects may make them susceptible to low-quality content. In this work, we focus on Urban Dictionary, a crowdsourced online dictionary. We combine computational methods with qualitative annotation and shed light on the overall features of Urban Dictionary in terms of growth, coverage and types of content. We measure a high presence of opinion-focused entries, as opposed to the meaning-focused entries that we expect from traditional dictionaries. Furthermore, Urban Dictionary covers many informal, unfamiliar words as well as proper nouns. Urban Dictionary also contains offensive content, but highly offensive content tends to receive lower scores through the dictionary’s voting system. The low threshold to include new material in Urban Dictionary enables quick recording of new words and new meanings, but the resulting heterogeneous content can pose challenges in using Urban Dictionary as a source to study language innovation.
KW - Computational sociolinguistics
KW - Human–computer interaction
KW - Linguistic innovation
KW - Natural language processing
UR - http://www.scopus.com/inward/record.url?scp=85046662038&partnerID=8YFLogxK
U2 - 10.1098/rsos.172320
DO - 10.1098/rsos.172320
M3 - Article
AN - SCOPUS:85046662038
SN - 2054-5703
VL - 5
JO - Royal Society Open Science
JF - Royal Society Open Science
IS - 5
M1 - 172320
ER -