Experimental Result of Feature Selections with PCA on KNN, LDA and SVM Classification
Abstract
Cancer classification research increasingly relies on microarray technology, which measures the expression of thousands of genes simultaneously. Microarrays have been applied successfully to many problems, notably in medical science, where they have shown the ability to diagnose patients with specific diseases. The technology is therefore used to detect diseases such as cancer, which is usually treated as a binary classification problem. The major drawback for classification is that the gene expression data produced by microarrays are high-dimensional. To counter this problem, the important genes should be identified and the dimensionality of the microarray data reduced. In this research, six feature selection methods (Receiver Operating Characteristic curve, Wilcoxon rank sum test, t-statistic, Kruskal-Wallis test statistic, Fisher score, and Gini index) are each combined with Principal Component Analysis (a feature extraction method) to address the high dimensionality and produce a new, reduced dataset derived from the original datasets. The new dataset is then classified. Three classifiers (K-Nearest Neighbour, Linear Discriminant Analysis, and Support Vector Machine) are used in this research, and the performance of each classifier is calculated and compared. The experimental results show that two combinations achieve the highest correct rate, 96%, outperforming the other feature selection methods: the Wilcoxon rank sum test with Principal Component Analysis for the Linear Discriminant Analysis classifier, and the Receiver Operating Characteristic curve with Principal Component Analysis for the Support Vector Machine classifier.
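The three-stage pipeline described above (filter-based feature selection, then Principal Component Analysis, then classification) can be illustrated with a minimal sketch. It is not the implementation used in this research: it assumes the t-statistic as the filter and K-Nearest Neighbour as the classifier, runs on synthetic data rather than real microarray data, and the number of retained genes (20), the number of components (5), and k = 3 are arbitrary illustrative values. All function names are hypothetical.

```python
import numpy as np

def t_statistic_scores(X, y):
    """Absolute two-sample (Welch) t-statistic per gene for binary labels 0/1."""
    a, b = X[y == 0], X[y == 1]
    var_term = a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b)
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(var_term + 1e-12)

def fit_pca(X, n_components):
    """Return the training mean and the leading principal axes (via SVD)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

# Synthetic stand-in for a microarray dataset (assumption): 60 samples,
# 500 genes, of which the first 10 differ between the two classes.
rng = np.random.default_rng(0)
n_samples, n_genes, n_informative = 60, 500, 10
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_genes))
X[y == 1, :n_informative] += 2.0

# Hold out 20 samples for testing.
idx = rng.permutation(n_samples)
train, test = idx[:40], idx[40:]

# Stage 1: rank genes by t-statistic on the training set and keep the top 20.
scores = t_statistic_scores(X[train], y[train])
top = np.argsort(scores)[::-1][:20]

# Stage 2: project the selected genes onto 5 principal components.
mean, components = fit_pca(X[train][:, top], n_components=5)
Z_train = (X[train][:, top] - mean) @ components.T
Z_test = (X[test][:, top] - mean) @ components.T

# Stage 3: classify the reduced data and compute the correct rate.
pred = knn_predict(Z_train, y[train], Z_test, k=3)
accuracy = (pred == y[test]).mean()
print(f"correct rate on held-out samples: {accuracy:.2f}")
```

In this sketch the gene ranking and the principal axes are both fitted on the training samples only, so the held-out samples do not influence the selection step. Swapping in another filter, such as the Wilcoxon rank sum test, would only require replacing the scoring function.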