A Comparison of Stacked Denoising Autoencoder (SDAE) and Long Short-Term Memory (LSTM) for Cancer Classification
Abstract
Cancer is a serious disease that can arise from genetic modification. Its spread can be controlled if it is diagnosed early. The main challenge in cancer classification is the large volume of data in the dataset. These data lose their value if the dataset contains many unclean entries, such as null or redundant values. As a result, the output becomes less accurate and the complexity of the algorithm can increase. In this research, a multi-omics dataset will be used. To obtain the multi-omics dataset, the data will undergo a pre-processing phase to ensure that the resulting dataset is clean. The dataset will be integrated using Python, which combines the omics data from different files. The data will also undergo feature selection to select a subset of attributes, or features, from the complete set of data. The feature selection method that will be used is support vector machine recursive feature elimination (SVM-RFE). SVM-RFE ranks the features, or genes, by training a support vector machine classification model and selects the leading gene features with the RFE strategy, and it has been reported as the best feature selection method [1]. After the data have been cleaned, they will be used as input to a stacked denoising autoencoder (SDAE) and a recurrent neural network (RNN). Lastly, the performance of both methods will be compared to determine which is better at classifying cancer data using a multi-omics dataset.
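The integration, cleaning and SVM-RFE steps described above can be sketched as follows. This is a minimal illustration, not the implementation used in this research. It assumes pandas and scikit-learn, and it uses two small synthetic tables in place of real omics files. All table names, column names, sample IDs and the number of selected features are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic stand-ins for two omics tables indexed by sample ID
# (assumption: real data would be read from separate files instead).
rng = np.random.default_rng(0)
samples = [f"S{i}" for i in range(40)]
labels = pd.Series(np.repeat([0, 1], 20), index=samples, name="label")

expr = pd.DataFrame(rng.normal(size=(40, 30)), index=samples,
                    columns=[f"expr_{j}" for j in range(30)])
meth = pd.DataFrame(rng.normal(size=(40, 20)), index=samples,
                    columns=[f"meth_{j}" for j in range(20)])

# Make one feature per table depend on the class label.
expr["expr_0"] += 3.0 * labels
meth["meth_0"] += 3.0 * labels

# Inject unclean data: one null value and one redundant (duplicated) column.
expr.iloc[3, 5] = np.nan
meth["meth_dup"] = meth["meth_1"]

# Integration: combine the omics tables on their shared sample IDs.
merged = expr.join(meth, how="inner")

# Cleaning: drop samples with null values, then drop duplicated columns.
merged = merged.dropna(axis=0)
merged = merged.loc[:, ~merged.T.duplicated()]
y = labels.loc[merged.index]

# SVM-RFE: rank features with a linear SVM and recursively eliminate
# the lowest-ranked feature until 5 remain (5 is an arbitrary choice here).
selector = RFE(estimator=SVC(kernel="linear"), n_features_to_select=5, step=1)
selector.fit(merged.to_numpy(), y.to_numpy())

selected = list(merged.columns[selector.support_])
print(selected)
```

The reduced feature matrix, `merged[selected]`, is what would then be passed on to the classifiers. A linear kernel is used because RFE needs per-feature weights from the estimator in order to rank features.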