TY - GEN
T1 - Classifiers for Yelp-Reviews Based on GMDH-Algorithms
AU - Alexandrov, Mikhail
AU - Skitalinskaya, Gabriella
AU - Cardiff, John
AU - Koshulko, Olexiy
AU - Shushkevich, Elena
N1 - Publisher Copyright: © 2023, Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
AB - Yelp is one of the most popular international web resources about products and services; it provides users with useful information on local businesses and helps business owners make their businesses more attractive to users. The Yelp dataset consists of attributes describing the business, reviews in free-text form, and numeric star ratings out of 5. The utility of this dataset has prompted dozens of publications on rating classifiers, which use various sophisticated opinion-mining tools. In contrast, in this paper we propose simpler approaches, namely: (a) selection of descriptors based on term specificity, and (b) formation of classifiers with these descriptors based on inductive modeling. The latter is implemented with the well-known tool GMDH Shell, where GMDH stands for Group Method of Data Handling. This method allows us to build models with high noise immunity. We compare 96 prediction models with the identified descriptors by combining variants of: (i) preprocessing with data transformation and class balancing; (ii) classification algorithms; and (iii) postprocessing with ensembling. Instead of the typical 5-star classification, we consider combined classes reflecting a more practical view of purchasing goods or developing a business. The experiments cover the most popular business categories: restaurants and shopping. To evaluate the quality of the classifiers, we consider the results of predecessors and also introduce the so-called defensible accuracy. Under this comparison, the results presented in the paper prove to be promising.
KW - GMDH
KW - GMDH shell
KW - Opinion mining
KW - Text mining
KW - Yelp
UR - https://www.scopus.com/pages/publications/85149935623
U2 - 10.1007/978-3-031-23804-8_32
DO - 10.1007/978-3-031-23804-8_32
M3 - Conference contribution
AN - SCOPUS:85149935623
SN - 9783031238031
T3 - Lecture Notes in Computer Science
SP - 412
EP - 430
BT - Computational Linguistics and Intelligent Text Processing - 19th International Conference, CICLing 2018, Revised Selected Papers
A2 - Gelbukh, Alexander
PB - Springer Science and Business Media Deutschland GmbH
T2 - 19th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2018
Y2 - 18 March 2018 through 24 March 2018
ER -