Primal-dual for classification with rejection (PD-CR): a novel method for classification and feature selection—an application in metabolomics studies

Creators: Chardin, David; Humbert, Olivier; Bailleux, Caroline; Burel-Vandenbos, Fanny; Rigau, Valerie; Pourcher, Thierry; Barlaud, Michel

Others: Centre de Lutte contre le Cancer Antoine Lacassagne [Nice] (UNICANCER/CAL) ; UNICANCER-Université Côte d'Azur (UCA); UMR E4320 (TIRO-MATOs) ; Université Nice Sophia Antipolis (1965 - 2019) (UNS) ; COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Côte d'Azur (UCA); ANR-19-P3IA-0002,3IA@cote d'azur,3IA Côte d'Azur(2019)

Description

Abstract

Background: Supervised classification methods have been used for many years for feature selection in metabolomics and other omics studies. We developed a novel primal-dual based classification method (PD-CR) that can perform classification with rejection and feature selection on high-dimensional datasets. PD-CR projects data onto a low-dimensional space and performs classification by minimizing an appropriate quadratic cost. It simultaneously optimizes the selected features and the prediction accuracy with a new tailored, constrained primal-dual method. The primal-dual framework is general enough to encompass various robust losses and to allow for convergence analysis. Here, we compare PD-CR to three commonly used methods: partial least squares discriminant analysis (PLS-DA), random forests and support vector machines (SVM). We analyzed two metabolomics datasets: a urinary metabolomics dataset from lung cancer patients and healthy controls, and a metabolomics dataset obtained from frozen glial tumor samples with mutated or wild-type isocitrate dehydrogenase (IDH).

Results: PD-CR was more accurate than PLS-DA, random forests and SVM for classification on both metabolomics datasets. It also selected biologically relevant metabolites. PD-CR has the advantage of providing a confidence score for each prediction, which can be used to perform classification with rejection. This substantially reduces the false discovery rate.

Conclusion: PD-CR is an accurate method for classification of metabolomics datasets that can outperform PLS-DA, random forests and SVM while selecting biologically relevant features. Furthermore, the confidence score provided with PD-CR can be used to perform classification with rejection and reduce the false discovery rate.
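The idea of classification with rejection mentioned in the abstract can be illustrated with a minimal sketch. This is not the PD-CR algorithm itself: it assumes some classifier has already produced a per-class confidence score for each sample, and the function names, the toy scores and the 0.8 threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def classify_with_rejection(scores, threshold):
    """Predict a label per sample and flag low-confidence predictions.

    scores: (n_samples, n_classes) array of per-class confidence scores
            (assumed here to come from any classifier; hypothetical input).
    threshold: predictions whose top score is below this value are rejected.
    """
    scores = np.asarray(scores, dtype=float)
    labels = scores.argmax(axis=1)       # most confident class per sample
    confidence = scores.max(axis=1)      # score of that class
    accepted = confidence >= threshold   # False means "rejected"
    return labels, confidence, accepted


def accuracy_on_accepted(labels, accepted, y_true):
    """Accuracy computed only over the predictions that were not rejected."""
    y_true = np.asarray(y_true)
    if not accepted.any():
        return float("nan")
    return float((labels[accepted] == y_true[accepted]).mean())


# Toy data (made up): two confident samples and two borderline ones.
scores = np.array([
    [0.95, 0.05],
    [0.55, 0.45],
    [0.10, 0.90],
    [0.48, 0.52],
])
y_true = np.array([0, 1, 1, 0])

labels, confidence, accepted = classify_with_rejection(scores, threshold=0.8)
overall_accuracy = float((labels == y_true).mean())
accepted_accuracy = accuracy_on_accepted(labels, accepted, y_true)
```

In this toy example the two borderline samples are the ones misclassified, so rejecting them raises accuracy on the retained predictions at the cost of leaving half of the samples unclassified. The threshold controls that trade-off.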

Abstract

International audience

Additional details