Naïve Bayes and K-Nearest in Grouping Biomedical Literature
Abstract
Textual information is easy for humans to understand because it is expressed in words and characters. Text mining, the process of extracting
non-trivial patterns or knowledge from text documents or textual databases, has emerged as a technology for retrieving this kind of information.
The purpose of this paper is to perform and compare keyword extraction using statistical and linguistic extraction tools on 120 text documents
related to hypertension and diabetes. For this comparison, RStudio and Fivefilters, which are statistical tools, and TerMine and Flexiterm,
which are linguistic tools, are used to extract keywords from the biomedical literature. Classification with a K-Nearest classifier is then
carried out to evaluate and compare the performance of the statistical and linguistic approaches as implemented by these tools. Experimental
results show the differences between the two groups of tools in keyword extraction.
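
The evaluation step described above can be illustrated with a minimal sketch. It assumes each document has already been reduced to a list of keywords by one of the extraction tools, and it classifies an unseen document by majority vote among its k most similar labeled documents. The toy keyword lists, the cosine similarity measure, the value k = 3, and the function names are illustrative assumptions, not the configuration or data used in this study.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two keyword-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, query_keywords, k=3):
    """Label a document by majority vote among its k most similar neighbors."""
    query = Counter(query_keywords)
    ranked = sorted(
        train,
        key=lambda item: cosine(Counter(item[0]), query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test documents whose predicted label matches the true one."""
    correct = sum(knn_predict(train, kw, k) == label for kw, label in test)
    return correct / len(test)


# Hypothetical keyword lists standing in for the output of an extraction tool.
TRAIN = [
    (["blood pressure", "hypertension", "systolic"], "hypertension"),
    (["hypertension", "antihypertensive", "blood pressure"], "hypertension"),
    (["diastolic", "blood pressure", "sodium"], "hypertension"),
    (["insulin", "glucose", "diabetes"], "diabetes"),
    (["diabetes", "glycemic control", "insulin"], "diabetes"),
    (["glucose", "hba1c", "diabetes"], "diabetes"),
]
TEST = [
    (["blood pressure", "systolic", "sodium"], "hypertension"),
    (["insulin", "hba1c", "glucose"], "diabetes"),
]

if __name__ == "__main__":
    print("accuracy:", accuracy(TRAIN, TEST, k=3))
```

Running the same evaluation once per extraction tool, with the training and test keyword lists produced by that tool, would give one accuracy figure per tool that can then be compared.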