Published May 27, 2022
| Version v1
Publication
Graph coloring for extracting discriminative genes in cancer data
Description
Background and objective: The major difficulty in analyzing input gene expression
data in microarray-based approaches to automated cancer diagnosis is the large
number of genes (high dimensionality), many of them irrelevant (noise), compared
to the very small number of samples. This research study tackles
the dimensionality reduction challenge in this area.
Methods: This research study introduces a dimension-reduction technique termed
graph coloring approach (GCA) for microarray data-based cancer classification based
on analyzing the absolute correlation between gene–gene pairs and partitioning genes
into several hubs using graph coloring. GCA starts with a gene-selection step in which
the top relevant genes are selected using biserial correlation. At each step, the next gene in
the ordered list of top relevant genes is selected as the hub gene (representative) and
its redundant genes are added to its group; the process is repeated recursively for the
remaining genes. A gene is considered redundant if its absolute correlation with the
hub gene is greater than a controlling threshold. A suitable range for the threshold
is estimated by computing a percentage graph for the absolute correlation between
gene–gene pairs. Each value in the estimated range for the threshold can efficiently
produce a new feature subset.
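The selection and grouping steps described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `point_biserial` and `gca_select`, the parameters `top_k` and `threshold`, and the use of a point-biserial (Pearson) correlation against 0/1 class labels are all hypothetical choices made for the example, and the threshold-range estimation from the percentage graph is not covered.

```python
import numpy as np


def point_biserial(X, y):
    """Correlation of each gene (column of X) with binary labels y.

    For a 0/1 label this is the Pearson correlation. Assumes no gene
    column is constant (otherwise the denominator would be zero).
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return (Xc * yc[:, None]).sum(axis=0) / denom


def gca_select(X, y, top_k, threshold):
    """Greedy hub grouping sketch (hypothetical helper).

    X: samples x genes matrix, y: 0/1 labels.
    Returns a dict mapping each hub gene index to the list of gene
    indices grouped with it as redundant.
    """
    # Step 1: keep the top_k genes most correlated with the class label.
    relevance = np.abs(point_biserial(X, y))
    ranked = [int(i) for i in np.argsort(-relevance)[:top_k]]

    # Absolute gene-gene correlation among the retained genes.
    corr = np.atleast_2d(np.abs(np.corrcoef(X[:, ranked], rowvar=False)))

    # Step 2: repeatedly take the best remaining gene as a hub and
    # attach every remaining gene whose |correlation| exceeds threshold.
    remaining = list(range(len(ranked)))  # positions within `ranked`
    groups = {}
    while remaining:
        hub = remaining[0]
        members = [j for j in remaining[1:] if corr[hub, j] > threshold]
        groups[ranked[hub]] = [ranked[j] for j in members]
        taken = set(members) | {hub}
        remaining = [j for j in remaining if j not in taken]
    return groups
```

Under these assumptions, the keys of the returned dictionary would serve as the reduced feature subset, and calling `gca_select` with different `threshold` values would yield different subsets.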
Results: GCA achieved significant improvement over several existing techniques in
terms of higher accuracy and a smaller number of features. Also, genes selected
by this method are relevant genes according to the information stored in scientific
repositories.
Conclusions: The proposed dimension-reduction technique can help biologists accurately predict cancer in several areas of the body.
Additional details
Identifiers
- URL
- https://idus.us.es/handle//11441/133784
- URN
- urn:oai:idus.us.es:11441/133784
Origin repository
- Origin repository
- USE