TY - JOUR
T1 - Know an Emotion by the Company It Keeps
T2 - Word Embeddings from Reddit/Coronavirus
AU - García-Rudolph, Alejandro
AU - Sanchez-Pinsach, David
AU - Frey, Dietmar
AU - Opisso, Eloy
AU - Cisek, Katryna
AU - Kelleher, John D.
N1 - Publisher Copyright:
© 2023 by the authors.
PY - 2023/6
Y1 - 2023/6
N2 - Social media is a crucial communication tool (e.g., with 430 million monthly active users in online forums such as Reddit), being an objective of Natural Language Processing (NLP) techniques. One of them (word embeddings) is based on the quotation, “You shall know a word by the company it keeps,” highlighting the importance of context in NLP. Meanwhile, “Context is everything in Emotion Research.” Therefore, we aimed to train a model (W2V) for generating word associations (also known as embeddings) using a popular Coronavirus Reddit forum, validate them using public evidence and apply them to the discovery of context for specific emotions previously reported as related to psychological resilience. We used Pushshiftr, quanteda, broom, wordVectors, and superheat R packages. We collected all 374,421 posts submitted by 104,351 users to Reddit/Coronavirus forum between January 2020 and July 2021. W2V identified 64 terms representing the context for seven positive emotions (gratitude, compassion, love, relief, hope, calm, and admiration) and 52 terms for seven negative emotions (anger, loneliness, boredom, fear, anxiety, confusion, sadness) all from valid experienced situations. We clustered them visually, highlighting contextual similarity. Although trained on a “small” dataset, W2V can be used for context discovery to expand on concepts such as psychological resilience.
AB - Social media is a crucial communication tool (e.g., with 430 million monthly active users in online forums such as Reddit), being an objective of Natural Language Processing (NLP) techniques. One of them (word embeddings) is based on the quotation, “You shall know a word by the company it keeps,” highlighting the importance of context in NLP. Meanwhile, “Context is everything in Emotion Research.” Therefore, we aimed to train a model (W2V) for generating word associations (also known as embeddings) using a popular Coronavirus Reddit forum, validate them using public evidence and apply them to the discovery of context for specific emotions previously reported as related to psychological resilience. We used Pushshiftr, quanteda, broom, wordVectors, and superheat R packages. We collected all 374,421 posts submitted by 104,351 users to Reddit/Coronavirus forum between January 2020 and July 2021. W2V identified 64 terms representing the context for seven positive emotions (gratitude, compassion, love, relief, hope, calm, and admiration) and 52 terms for seven negative emotions (anger, loneliness, boredom, fear, anxiety, confusion, sadness) all from valid experienced situations. We clustered them visually, highlighting contextual similarity. Although trained on a “small” dataset, W2V can be used for context discovery to expand on concepts such as psychological resilience.
KW - COVID-19
KW - emotions
KW - natural language processing
KW - Reddit
KW - resilience
KW - social media
UR - http://www.scopus.com/inward/record.url?scp=85161588764&partnerID=8YFLogxK
U2 - 10.3390/app13116713
DO - 10.3390/app13116713
M3 - Article
AN - SCOPUS:85161588764
SN - 2076-3417
VL - 13
JO - Applied Sciences (Switzerland)
JF - Applied Sciences (Switzerland)
IS - 11
M1 - 6713
ER -