TY - GEN
T1 - An Ontological Approach for Recommending a Feature Selection Algorithm
AU - Nayak, Aparna
AU - Božić, Bojan
AU - Longo, Luca
N1 - Publisher Copyright: © 2022, Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Feature selection plays an important role in machine learning and data mining problems. Removing irrelevant features increases model accuracy and reduces the computational cost. However, selecting important features is not a simple task because no single feature selection algorithm performs well on all the datasets of interest. This paper addresses the recommendation of a feature selection algorithm based on dataset characteristics and quality. The research uses three types of dataset characteristics along with data quality metrics. The main contribution of the work is the use of Semantic Web techniques to develop a novel system that can aid in robust feature selection algorithm recommendations. The system’s strength lies in assisting users of machine learning algorithms by providing more relevant feature selection algorithms for the dataset using an ontology called Feature Selection algorithm recommendation based on Data Characteristics and Quality (FSDCQ). Results are generated using six different feature selection algorithms and four types of classifiers on ten datasets from the UCI repository. Recommendations take the form of “Feature selection algorithm X is recommended for dataset i, as it performed better on dataset j, similar to dataset i in terms of class overlap 0.3, label noise 0.2, completeness 0.9, conciseness 0.8 units”. While the domain-specific ontology FSDCQ was created to aid in the task of algorithm recommendation for feature selection, it is easily applicable to other meta-learning scenarios.
KW - Feature selection algorithms
KW - Meta-features
KW - Ontology
UR - http://www.scopus.com/inward/record.url?scp=85135046964&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-09917-5_20
DO - 10.1007/978-3-031-09917-5_20
M3 - Conference contribution
AN - SCOPUS:85135046964
SN - 9783031099168
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 300
EP - 314
BT - Web Engineering - 22nd International Conference, ICWE 2022, Proceedings
A2 - Di Noia, Tommaso
A2 - Ko, In-Young
A2 - Schedl, Markus
A2 - Ardito, Carmelo
PB - Springer Science and Business Media Deutschland GmbH
T2 - 22nd International Conference on Web Engineering, ICWE 2022
Y2 - 5 July 2022 through 8 July 2022
ER -