Integrated Variational Autoencoder Model for Genome Transcription Analysis of Gene Expression Profiles Towards Prediction and Classification of Cardiomyopathy Disease

Ms. T. Sangeetha
Dr. K. Manikandan
Dr. D. Victor Arokia Doss


Cardiomyopathy is an important cause of chronic heart failure: it makes it harder for the heart muscle to pump blood to the rest of the body, which leads to a high mortality rate. It is therefore essential to diagnose and predict the disease in order to protect the patient against heart failure. However, manual analysis of the disease is highly complex and leads to poor prognosis. To alleviate these challenges and predict the disease at an early stage, many risk assessment methods have been modeled using machine learning and deep learning paradigms based on genome-wide association studies. In particular, cardiomyopathy risk assessment through gene expression from microarray data provides excellent results. In this article, microarray gene expression data are preprocessed using missing value imputation through factor analysis and normalization through Z-score normalization. The preprocessed gene expression data then undergo dimensionality reduction through feature extraction and feature selection. In this model, linear discriminant analysis is employed as the feature extraction method to extract differentially expressed genes (transcripts of RNA molecules, both coding and non-coding for protein), which are represented as mutated chromosomes. These genes are passed to a feature selection technique, ant colony optimization, which extracts the targeted genes (the type of variant and its score at a specified location in the genome) with respect to the protein synthesis value (gene protein value) or molecular value of the gene. The optimal target genes containing the mutated chromosomes are selected. Finally, the target genes are passed to an unsupervised deep learning model, the Integrated Variational Autoencoder model for Genome Transcription Analysis.
It classifies the target genes representing miRNA, by comparison with a core set of target genes extracted from the mutated chromosomes of diseased patients with cardiomyopathy (taken as ground truth data), into classes of cardiomyopathy: ischemic cardiomyopathy, dilated cardiomyopathy and neurofibromatosis. An experimental analysis of the various classifiers employed to classify the core set of genes into cardiomyopathy classes is carried out by inferring the classifier results under cross-fold validation. The performance of the architectures on this dataset is evaluated using performance measures.
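
As an illustration of the Z-score normalization step named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that normalization is applied per gene (column) across samples (rows) and that the population standard deviation is used; the abstract specifies neither. The function name `zscore_normalize` and the toy expression matrix are hypothetical, and the factor-analysis imputation step is not shown.

```python
import math


def zscore_normalize(matrix):
    """Z-score each gene (column) across samples (rows).

    Assumption: population standard deviation (divide by n).
    Genes with zero variance are mapped to 0.0 to avoid division by zero.
    """
    n_samples = len(matrix)
    n_genes = len(matrix[0])
    normalized = [[0.0] * n_genes for _ in range(n_samples)]
    for g in range(n_genes):
        column = [row[g] for row in matrix]
        mean = sum(column) / n_samples
        variance = sum((x - mean) ** 2 for x in column) / n_samples
        std = math.sqrt(variance)
        for s in range(n_samples):
            normalized[s][g] = (column[s] - mean) / std if std > 0 else 0.0
    return normalized


# Hypothetical toy data: 3 samples (rows) x 3 genes (columns).
expression = [
    [2.0, 10.0, 5.0],
    [4.0, 20.0, 5.0],
    [6.0, 30.0, 5.0],
]
normalized = zscore_normalize(expression)
```

After this step each non-constant gene has zero mean and unit variance across samples, which puts genes with very different expression ranges on a common scale before feature extraction.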
