LightGWAS: A novel machine learning procedure for genome-wide association study

Bruno Ambrozio, Luca Longo, Lucas Rizzo

Research output: Contribution to journal › Conference article › peer-review

Abstract

This paper proposes a novel machine learning procedure for genome-wide association studies (GWAS), named LightGWAS. It is based on the LightGBM framework and is designed as a single, resilient, autonomous and scalable solution that addresses common limitations of GWAS implementations found in the literature. These include reliance on extensive manual quality control steps and on specific GWAS methods for each type of dataset morphology and size. In this research, LightGWAS is compared against PLINK2, one of the current state-of-the-art GWAS implementations, which is based on the general linear model with support for Firth regularisation. The mean differences in standard classification metrics, obtained from quantitative empirical tests using k-fold cross-validation, indicate that LightGWAS outperforms PLINK2 on balanced, imbalanced, and highly imbalanced genomic datasets. Paired difference tests showed statistical significance for the results of the experiments with imbalanced datasets. This article contributes to the body of knowledge by presenting a potentially more efficient GWAS procedure based on nonparametric approaches. LightGWAS ensures adaptability with higher precision in the discovery of causal single-nucleotide polymorphisms, thanks to the leaf-wise tree growth algorithm offered by state-of-the-art gradient boosting decision trees. Control of false positives and statistical power are addressed automatically by the model's training process, which significantly reduces human dependency during study design.
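
The comparison described in the abstract (per-fold classification metrics from k-fold cross-validation, followed by a paired difference test) can be sketched roughly as below. This is a minimal illustration, not the authors' code. The abstract does not say which paired test was used, so the paired t statistic here is an assumption. The function name `paired_difference` and the per-fold scores are hypothetical placeholders, not results from the paper.

```python
import math
import statistics


def paired_difference(scores_a, scores_b):
    """Return (mean difference, paired t statistic) for per-fold scores.

    scores_a and scores_b hold one metric value per cross-validation fold,
    measured for two methods on the same folds.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both methods must be scored on the same folds")
    if len(scores_a) < 2:
        raise ValueError("at least two folds are required")

    # Per-fold differences: the pairing removes fold-to-fold variation.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")

    # t statistic with len(diffs) - 1 degrees of freedom.
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return mean_diff, t_stat


# Hypothetical per-fold scores for two methods (illustrative values only).
method_a = [0.80, 0.82, 0.79, 0.81, 0.83]
method_b = [0.75, 0.78, 0.76, 0.77, 0.79]

mean_diff, t_stat = paired_difference(method_a, method_b)
print(f"mean difference = {mean_diff:.3f}, t = {t_stat:.2f}")
```

The t statistic would then be compared against a t distribution with k - 1 degrees of freedom, where k is the number of folds. A nonparametric alternative such as the Wilcoxon signed-rank test is a common choice when the per-fold differences cannot be assumed to be normally distributed.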

Original language: English
Pages (from-to): 25-36
Number of pages: 12
Journal: CEUR Workshop Proceedings
Volume: 2771
Publication status: Published - 2020
Event: 28th Irish Conference on Artificial Intelligence and Cognitive Science, AICS 2020 - Dublin, Ireland
Duration: 7 Dec 2020 - 8 Dec 2020

Keywords

  • Genome-wide association study
  • LightGBM
  • LightGWAS
