
Multilabel Text Classification of Unbalanced Datasets: Two-Pass NNMF

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The natural distribution of textual data used in text classification is often imbalanced. Categories with fewer examples are under-represented, and classifiers for them, trained on datasets transformed to bag-of-words representations or basic topic-modeling transformations, often perform far below a satisfactory level. We tackle this problem using a two-pass non-negative matrix factorization algorithm. This approach finds topics for each category independently, which allows topics for under-represented categories to be defined better. The results are analyzed from multiple perspectives (H-loss, accuracy, F-measure, precision, and recall) and from the micro, macro, and example-based aspects, since each is appropriate in different situations. Experimental validation shows that the two-pass matrix factorization improves on the classification results achieved using bag-of-words representations.
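The abstract gives no algorithmic detail, so the following is only a minimal sketch of one plausible reading of "two-pass" factorization: a first pass learns topics from each category's documents separately, and a second pass holds the stacked topics fixed and solves for every document's topic weights. The function names (`nmf`, `two_pass_nmf`), the multiplicative-update solver, the `topics_per_label` parameter, and the toy data are all assumptions made for illustration; the paper's actual procedure may differ.

```python
import numpy as np

EPS = 1e-9  # keeps the multiplicative updates away from division by zero


def nmf(X, k=None, H=None, n_iter=200, seed=0):
    """Multiplicative-update NMF (Frobenius loss), X ~ W @ H.

    If H is supplied it is held fixed and only W is solved for.
    """
    rng = np.random.default_rng(seed)
    fixed_H = H is not None
    if not fixed_H:
        H = rng.random((k, X.shape[1])) + EPS
    W = rng.random((X.shape[0], H.shape[0])) + EPS
    for _ in range(n_iter):
        W *= (X @ H.T) / (W @ H @ H.T + EPS)
        if not fixed_H:
            H *= (W.T @ X) / (W.T @ W @ H + EPS)
    return W, H


def two_pass_nmf(X, Y, topics_per_label=2):
    """X: (docs, terms) non-negative matrix; Y: (docs, labels) 0/1 indicators."""
    # Pass 1 (assumed): learn topics from each category's documents only,
    # so a rare category gets its own topics regardless of its size.
    bases = []
    for j in range(Y.shape[1]):
        docs = X[Y[:, j] == 1]
        if len(docs):
            bases.append(nmf(docs, k=topics_per_label, seed=j)[1])
    H = np.vstack(bases)
    # Pass 2 (assumed): keep the stacked topics fixed and compute
    # topic weights for all documents; W is the new feature matrix.
    W, _ = nmf(X, H=H)
    return W, H


# Toy term-count matrix: 6 documents, 5 terms (made-up data).
X = np.array([[3, 1, 0, 0, 0],
              [2, 2, 0, 0, 0],
              [4, 0, 1, 0, 0],
              [0, 0, 3, 2, 0],
              [0, 1, 2, 3, 0],
              [0, 0, 0, 1, 4]], dtype=float)
# Three labels; the last one is rare (a single document).
Y = np.array([[1, 0, 0],
              [1, 0, 0],
              [1, 1, 0],
              [0, 1, 0],
              [0, 1, 0],
              [0, 1, 1]])

W, H = two_pass_nmf(X, Y)
```

In this sketch the rows of `W` would replace the bag-of-words vectors as classifier input. Because every category contributes the same number of topics, the rare label is represented in the feature space as fully as the frequent ones, which is the motivation the abstract describes.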

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 19th International Conference, CICLing 2018, Revised Selected Papers
Editors: Alexander Gelbukh
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 275-286
Number of pages: 12
ISBN (Print): 9783031238031
Publication status: Published - 2023
Event: 19th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2018 - Hanoi, Viet Nam
Duration: 18 Mar 2018 to 24 Mar 2018

Publication series

Name: Lecture Notes in Computer Science
Volume: 13397 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 19th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2018
Country/Territory: Viet Nam
City: Hanoi
Period: 18/03/18 to 24/03/18

Keywords

  • Matrix decomposition
  • Multi-label text classification
  • Topic modeling
