Authors

Abstract

Textual information is expressed in words and characters, which makes it easy for humans to understand. Text mining, the process of
extracting non-trivial patterns or knowledge from text documents or textual databases, has emerged as a way to extract this kind of
information. This paper performs and compares keyword extraction using statistical and linguistic extraction tools on 120 text documents
related to hypertension and diabetes. For the comparison, RStudio and Fivefilters serve as the statistical-based tools, and TerMine and
Flexiterm as the linguistic-based tools, to demonstrate keyword extraction from the biomedical literature. A K-Nearest Neighbor classifier is
then used to evaluate and compare the performance of the statistical and linguistic approaches. The experimental results compare the two
groups of tools and show how they differ in keyword extraction.
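As a rough illustration of the pipeline summarized above (statistical keyword extraction followed by nearest-neighbor classification), the following minimal Python sketch may help. It is not the procedure used in this paper: TF-IDF scoring stands in for the statistical tools, Jaccard overlap of keyword sets is an assumed similarity measure, and the function names `tfidf_keywords` and `knn_predict`, as well as the four toy documents, are hypothetical and invented for the example.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each tokenized document."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    keywords = []
    for doc in docs:
        tf = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_k])
    return keywords


def knn_predict(train_keywords, train_labels, query_keywords, k=3):
    """Label a document by majority vote of its k most similar training documents."""

    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 0.0

    order = sorted(
        range(len(train_keywords)),
        key=lambda i: -jaccard(train_keywords[i], query_keywords),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy corpus (hypothetical data, not the 120 documents used in the paper).
docs = [
    "insulin glucose insulin patient".split(),
    "glucose insulin diet patient".split(),
    "blood pressure systolic patient".split(),
    "pressure blood salt patient".split(),
]
labels = ["diabetes", "diabetes", "hypertension", "hypertension"]

keywords = tfidf_keywords(docs, top_k=2)
prediction = knn_predict(keywords, labels, ["insulin", "glucose"], k=3)
```

In this sketch the term "patient" appears in every document, so its TF-IDF score is zero and it is never selected as a keyword. A linguistic extractor would instead select candidate terms by part-of-speech patterns before ranking them.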