COMPARATIVE ANALYSIS OF CLASSIFICATION BASED ON CELLULAR LOCALIZATION DATA USING MACHINE LEARNING

Authors

  • Rohayanti Hassan
  • Muhammad Luqman Mohd Shafie
  • Alif Ridzuan Khairuddin

Abstract

Because of the COVID-19 pandemic, vaccine development has become a widely discussed topic, and a great deal of research has been conducted to create vaccines that are effective against viral infection. Protein subcellular localization is one method suited to vaccine development studies. However, current protein subcellular localization tools can handle only single-compartment prediction, whereas in reality a protein may reside in several compartments, so multi-compartment prediction is needed for accurate results. In previous work, we used DM3Loc, an existing tool, to generate subcellular localization data from FASTA sequences and estimate the concentration of the viral protein inside the cell. Based on those results, we concluded that the selected protein is highly likely to reside within the cell. DM3Loc is built on a Convolutional Neural Network (CNN) framework. But what if the tool were reverse-engineered using another machine learning model, such as a Decision Tree, Random Forest, or Support Vector Machine (SVM)? Would it still produce accurate predictions? The dataset used in this research was obtained from an online database and run through DM3Loc to produce the subcellular localization dataset. The findings suggest that these other machine learning methods may be viable alternatives to CNN for future subcellular localization work.
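The comparison the abstract proposes, with Decision Tree, Random Forest, and SVM taking the place of a CNN, can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' pipeline: the feature matrix is synthetic data from `make_classification` standing in for the DM3Loc-derived localization dataset, each sample is assumed to carry a single compartment label (a simplification of the multi-compartment setting), and the helper name `compare_classifiers`, the hyperparameters, and mean cross-validated accuracy as the metric are all hypothetical choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, folds=5, seed=0):
    """Hypothetical helper: mean k-fold accuracy for each candidate model.

    Assumes X holds per-sample localization features and y holds a single
    compartment label per sample (a simplification of multi-compartment data).
    """
    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        # RBF-kernel SVMs are sensitive to feature scale, so standardize first;
        # tree-based models do not need this step.
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    }
    return {
        name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
        for name, model in models.items()
    }


if __name__ == "__main__":
    # Synthetic stand-in data (assumption): 300 samples, 6 features, 3 classes.
    X, y = make_classification(
        n_samples=300, n_features=6, n_informative=4,
        n_redundant=0, n_classes=3, random_state=0,
    )
    for name, acc in compare_classifiers(X, y).items():
        print(f"{name}: {acc:.3f}")
```

Using the real dataset would mean replacing `X` and `y` with the DM3Loc outputs and their labels. For genuinely multi-compartment labels, a multi-label setup, such as one binary classifier per compartment, would be needed instead of the single-label classifiers shown here.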
