TY - GEN
T1 - Multi-Explainable TemporalNet
T2 - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
AU - Zafar, Anas
AU - Aftab, Danyal
AU - Qureshi, Rizwan
AU - Wang, Yaofeng
AU - Yan, Hong
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Multimodal depression detection from internet-based data, such as social media posts, is an important research problem that aims to predict human mental states in order to support the well-being of society. Recently, attention-based networks have gained significant popularity for depression detection. However, existing multimodal methods rely primarily on images and text and ignore temporal aspects such as the relative timing of different posts or tweets, which is a crucial factor in deriving depression-related behavior patterns. Moreover, they lack model interpretability, which limits understanding of how different features contribute to the model's final prediction. In this paper, we propose Multi-Explainable TemporalNet (METN), a Temporal Convolution Network (TCN)-based multimodal transformer network with relative timestamp embeddings. We leverage pretrained foundation models to obtain text and image embeddings, and use attention maps for model interpretability. We perform extensive experiments and ablation studies to validate the performance of METN on the user-level depression detection task. Our model achieves state-of-the-art results on several benchmarks, including an F1 score of 0.945 on the multimodal Twitter dataset and 0.913 on the multimodal Reddit dataset. We further demonstrate that our model improves the accuracy of identifying depression in individuals who post publicly on social media platforms while offering enhanced interpretability. Code and models are available on GitHub.
UR - https://www.scopus.com/pages/publications/85206479409
U2 - 10.1109/CVPRW63382.2024.00231
DO - 10.1109/CVPRW63382.2024.00231
M3 - Conference contribution
AN - SCOPUS:85206479409
T3 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
SP - 2258
EP - 2265
BT - Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
PB - IEEE Computer Society
Y2 - 16 June 2024 through 22 June 2024
ER -