Research On Missing Data Imputation Methods On Gene Expression

Authors

  • Fadilah Badari
  • Zuraini Ali Shah
  • RD Rohmat Saedudin
  • Shahreen Kasim
  • Seah Choon Sen

Abstract

Microarray technologies allow the expression levels of thousands of genes to be monitored under a variety of conditions. Gene expression data are mostly accurate but still contain errors, as microarray data sets often have many missing values. The result of a microarray experiment is a large matrix of expression levels, with genes as rows and experimental conditions as columns, frequently with some values missing. The presence of missing values can affect the results of visualization analysis of gene expression data. This creates a need for machine learning methods that address the missing value problem by imputing values into the microarray data. Imputation methods replace missing values with estimates based on information derived from the data set. In this research, K-nearest Neighbour, Local Least Square, Bayesian Principal Component Analysis, mean, and median imputation methods are used for missing value imputation. The performance of each imputation method is analyzed using two different types of classifiers: support vector machine (SVM) and artificial neural network (ANN). The results show that K-nearest Neighbour imputation gives the highest accuracy with SVM, 0.9146, while Local Least Square imputation gives the better result with ANN, with an accuracy of 0.8445. SVM achieves better accuracy than ANN after imputation.
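
As an illustration of the simpler imputation methods named in the abstract, the sketch below fills missing entries in a small genes-by-conditions matrix using the row mean, the row median, and a basic K-nearest Neighbour scheme. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`impute_row_stat`, `impute_knn`), the use of `None` to mark a missing value, the unweighted averaging of neighbours, and the toy data are all hypothetical choices made for this example. Local Least Square and Bayesian Principal Component Analysis are not covered here.

```python
import math
from statistics import mean, median


def impute_row_stat(matrix, stat=mean):
    """Fill each missing entry (None) with a statistic (mean or median)
    of the observed values in the same gene (row)."""
    result = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        fill = stat(observed)  # assumes every row has at least one observed value
        result.append([fill if v is None else v for v in row])
    return result


def _distance(a, b):
    """Root-mean-square difference over the conditions observed in both genes."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def impute_knn(matrix, k=2):
    """Fill each missing entry with the unweighted mean of that condition's
    value in the k most similar genes that have it observed.
    Entries with no usable neighbour are left as None."""
    result = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            )
            neighbours = [v for d, v in candidates[:k] if d < math.inf]
            if neighbours:
                result[i][j] = mean(neighbours)
    return result


# Toy expression matrix (hypothetical data): rows are genes, columns are conditions.
expr = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],
    [5.0, 6.0, 7.0],
    [0.9, 2.4, 2.9],
]

print(impute_row_stat(expr, stat=mean))
print(impute_row_stat(expr, stat=median))
print(impute_knn(expr, k=2))
```

Mean and median imputation use only the gene's own observed values, whereas the K-nearest Neighbour variant borrows the value from genes with similar expression profiles. An evaluation like the one described above would then compare classifier accuracy on the differently imputed matrices.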

Section

Articles