Named Entity Recognition in Biomedical Documents using Recurrent Neural Network


  • Diong Lee Juen
  • Sharin Hazlin Huspi


Named entity recognition is an information extraction task that detects and classifies named entity mentions in free text into predefined categories. It supports more complex text mining tasks such as information retrieval, question answering and text summarization. However, existing research on this task has relied on publicly available datasets or on datasets annotated by medical experts. Manually annotated datasets are difficult to obtain from medical experts, especially for a new dataset, so applying available biomedical resources is important to overcome this problem. This paper therefore focuses on named entity recognition on a PubMed dataset, specifically for hypertension, that is annotated using biomedical resources. GENIA Tagger is used to tokenize the biomedical abstracts and MetaMapLite is used to perform semantic annotation. The terms or phrases are then annotated in the BIO format. Bidirectional LSTM-CRF, a Recurrent Neural Network model that has shown promising results for named entity recognition, is applied to perform named entity recognition on the research dataset. The experimental setting of 500 abstracts, a batch size of 32 and 25 epochs gave the best results for precision, recall and F1-score, which are 0.79, 0.77 and 0.78, respectively. The results show that the research dataset achieved almost the same precision, recall and F1-score as a previous study that used a dataset manually annotated by medical experts.
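To make the BIO annotation step and the reported scores more concrete, the following is a minimal Python sketch and not the pipeline used in the paper. It assumes entity mentions are already available as token-index spans; the example sentence, the span positions, and the labels `DISEASE` and `DRUG` are hypothetical and do not come from GENIA Tagger or MetaMapLite output. The helper names `to_bio` and `f1_score` are likewise illustrative. The second function only checks that the reported F1-score is consistent with the harmonic mean of the reported precision and recall.

```python
from typing import List, Tuple


def to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Assign BIO tags given (start, end, label) token spans, end exclusive.

    Assumption: spans do not overlap and lie within the token list.
    """
    tags = ["O"] * len(tokens)          # every token starts Outside any entity
    for start, end, label in spans:
        tags[start] = f"B-{label}"      # first token of the mention: Beginning
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"      # remaining tokens of the mention: Inside
    return tags


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence and spans, for illustration only.
tokens = ["Patients", "with", "essential", "hypertension",
          "received", "amlodipine", "."]
spans = [(2, 4, "DISEASE"), (5, 6, "DRUG")]

bio_tags = to_bio(tokens, spans)
for token, tag in zip(tokens, bio_tags):
    print(f"{token}\t{tag}")

# Consistency check on the figures reported in the abstract.
reported_f1 = round(f1_score(0.79, 0.77), 2)
print(reported_f1)
```

In this sketch a multi-token mention such as "essential hypertension" receives `B-DISEASE` followed by `I-DISEASE`, and every token outside a mention receives `O`, which is the token-level labelling a sequence model such as a Bidirectional LSTM-CRF is trained to predict.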