Comparison of Term Weighting Method To Classify Hypertension Documents

Nur Nabilah Atikah Muhammad Rasib (1), Rohayanti Hassan (2), Rd Rohmat Saedudin (3)
How to cite (IJASEIT) :

Text mining is widely used to reduce the burden of processing large volumes of text for organizations and individuals. Its techniques include feature extraction, term weighting, and feature selection. This research focuses on two term weighting methods: C-value and term frequency. A set of documents is first collected and pre-processed before the text is mined. The extracted terms are then weighted using each of the two methods, and the weighted terms are passed to a classification phase to analyze and evaluate the performance of C-value against term frequency. The biomedical field was chosen as the domain, with an emphasis on hypertension. Hypertension is a leading risk factor for the development of more serious diseases, including coronary artery disease and stroke. Many documents address biomedical topics, but not all of the information and knowledge they contain is related to the topic of interest, and much of it is in unstructured form. The relevant information therefore needs to be extracted using appropriate tools and techniques. The purpose of this research is to compare term weighting methods in text mining and to classify the factors of hypertension. The results of this study can determine which method is more suitable and effective for producing the document vectors used to construct an SVM classifier. In practice, this study is important because it can help biologists and computational biologists make sense of large amounts of text and access biomedical information and knowledge more quickly and easily.
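
To make the two weighting methods concrete, the following is a minimal Python sketch, not the implementation used in this research. It assumes relative term frequency within a single document and the commonly cited C-value formulation, in which a candidate term's frequency is discounted by the average frequency of the longer candidates that contain it and scaled by the base-2 logarithm of its length in words. The function names, the helper `is_nested`, and the example terms and counts are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def term_frequency(tokens):
    """Relative frequency of each token within one document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def is_nested(short, long):
    """True if `short` occurs as a contiguous word sequence inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_value(candidate_freqs):
    """C-value score for each candidate term.

    `candidate_freqs` maps a tuple of words to its corpus frequency.
    Assumed formulation: log2(length) * frequency for terms that are not
    nested, and log2(length) * (frequency - mean frequency of the longer
    candidates containing the term) otherwise. Single-word terms score 0
    under this formulation because log2(1) = 0.
    """
    scores = {}
    for term, freq in candidate_freqs.items():
        # Frequencies of longer candidates that contain this term.
        containing = [
            other_freq
            for other, other_freq in candidate_freqs.items()
            if len(other) > len(term) and is_nested(term, other)
        ]
        length_weight = math.log2(len(term))
        if containing:
            discount = sum(containing) / len(containing)
            scores[term] = length_weight * (freq - discount)
        else:
            scores[term] = length_weight * freq
    return scores


# Invented example data, for illustration only.
tokens = ["high", "blood", "pressure", "blood"]
tf = term_frequency(tokens)

candidates = {
    ("blood", "pressure"): 10,
    ("high", "blood", "pressure"): 4,
    ("systolic", "blood", "pressure"): 2,
}
scores = c_value(candidates)
```

Under either scheme, the resulting weights would form the entries of a document vector that is then passed to a classifier such as an SVM. The sketch stops at the weighting step because the abstract does not specify the classifier configuration.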