Research On Missing Data Imputation Methods On Gene Expression

Fadilah Badari (1), Zuraini Ali Shah (2), RD Rohmat Saedudin (3), Shahreen Kasim (4), Seah Choon Sen (5)

Microarray technologies allow the expression levels of thousands of genes to be monitored under a variety of conditions. Gene expression data are mostly accurate, but the data sets still contain errors, because microarray data often have many missing values. The result of a microarray experiment is a large matrix of expression levels, with genes as rows and experimental conditions as columns, frequently with some values missing. The presence of missing values can affect the results of visualization and analysis of gene expression. This creates a need for machine learning methods that address the missing value problem by imputing values into the microarray data. Imputation methods replace missing values with estimates based on information from the data set itself. In this research, K-nearest Neighbour, Local Least Square, Bayesian Principal Component Analysis, mean, and median imputation methods are used for missing value imputation. The results of each imputation method are evaluated using two different types of classifiers: support vector machine (SVM) and artificial neural network (ANN) classification. The analysis shows that K-nearest Neighbour imputation gave the highest accuracy with SVM (0.9146), while Local Least Square imputation gave the best result with ANN (0.8445). Overall, SVM achieved better accuracy than ANN after imputation.
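
To make the idea of imputation concrete, the following is a minimal sketch of three of the simpler strategies named above (mean, median, and K-nearest Neighbour) on a toy matrix with genes as rows and conditions as columns. It is not the authors' implementation: the function names (`impute_rowwise`, `impute_knn`, `distance`) and the data are hypothetical, the neighbour average is unweighted for simplicity, and the code assumes every gene has at least one observed value.

```python
import math
from statistics import mean, median

# Toy expression matrix: rows are genes, columns are experimental conditions.
# None marks a missing value. The numbers are made up for illustration.
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, None, 4.1],
    [5.0, 6.0, 7.0, 8.0],
    [0.9, 1.9, 2.9, 3.9],
]

def impute_rowwise(matrix, stat=mean):
    """Fill each missing entry with a statistic (mean or median)
    of the observed values in the same gene row."""
    out = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        fill = stat(observed)  # assumes at least one observed value per gene
        out.append([fill if v is None else v for v in row])
    return out

def distance(a, b):
    """Root-mean-square difference over the columns observed in both rows."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf  # no shared columns: treat as infinitely far apart
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))

def impute_knn(matrix, k=2):
    """Fill each missing entry with the unweighted average of that column
    over the k most similar genes that have the column observed."""
    out = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, v in enumerate(row):
            if v is not None:
                continue
            # Candidate neighbours: other genes with column j observed.
            candidates = [other for m, other in enumerate(matrix)
                          if m != i and other[j] is not None]
            candidates.sort(key=lambda other: distance(row, other))
            out[i][j] = mean(other[j] for other in candidates[:k])
    return out

mean_filled = impute_rowwise(expr, mean)
median_filled = impute_rowwise(expr, median)
knn_filled = impute_knn(expr, k=2)
```

In this toy case the second gene closely tracks the first and fourth, so the neighbour-based estimate for its missing entry (about 2.95) follows the pattern of similar genes, whereas the row mean (about 2.43) ignores which condition is missing. This only illustrates the mechanism; it says nothing about the accuracies reported above.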