Extractive Generic-Based Summarization of Multiple Biomedical Documents Using Hybrid TF-IDF Algorithm and Cosine Similarity Method
How to cite (IJASEIT) :
Extractive generic-based multi-document summarization is the process of summarizing multiple documents on a similar topic by extracting the important information and generating a generic summary that preserves the overall content of the documents. The number of biomedical documents available on the Web is growing rapidly because of active research, which causes information overload and makes it time-consuming for biomedical experts to study long, multiple related documents. Multi-document summarization has therefore become important for reducing information overload and saving the time biomedical experts need to read and understand multiple related documents. However, multi-document summarization raises two issues: diverse and redundant information across the related documents, and a lack of cohesion in the summary. To address these issues, this study proposes multi-document summarization using a hybrid TF-IDF algorithm and the cosine similarity method. Compression ratios of 30%, 50%, 70% and 90% are applied to generate system summaries of 10, 40, 50 and 100 documents with the proposed method. The results are evaluated using the ROUGE measure and a t-test. The ROUGE-1 results show that summarization performance with and without redundancy, as determined by the ROUGE-1 F-measure, is affected by the compression ratio and the number of input documents. For a large number of input documents, the summary without redundancy obtains the best F-measure at a lower compression ratio, whereas the summary with redundancy obtains the best F-measure at a higher compression ratio. In addition, the t-test results show that the cohesion degree of the summary sentences for the proposed method is significantly higher than that of multi-document summarization using only the TF-IDF algorithm.
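The general idea of ranking sentences by TF-IDF and filtering redundant ones with cosine similarity can be illustrated with a minimal sketch. It is not the authors' hybrid TF-IDF algorithm, whose details the abstract does not give. It assumes a plain smoothed TF-IDF in which each sentence is treated as a document, a hypothetical redundancy threshold of 0.7, and a `keep_ratio` parameter defined here as the fraction of sentences retained, which may differ from the paper's definition of compression ratio. All function and parameter names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(sentences):
    """Build one sparse TF-IDF vector (a dict) per sentence.

    Assumption: each sentence counts as one 'document' for the IDF term,
    and the IDF is smoothed so that no weight is zero.
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    # Document frequency: number of sentences containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def summarize(sentences, keep_ratio=0.3, redundancy_threshold=0.7):
    """Select high-scoring, mutually dissimilar sentences.

    keep_ratio: fraction of input sentences to retain (an assumption).
    redundancy_threshold: a candidate is skipped when its cosine similarity
    to any already selected sentence reaches this value (hypothetical).
    """
    vectors = tfidf_vectors(sentences)
    # Sentence score: mean TF-IDF weight of its distinct terms.
    scores = [sum(v.values()) / max(len(v), 1) for v in vectors]
    budget = max(1, round(keep_ratio * len(sentences)))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranked:
        if len(chosen) == budget:
            break
        if all(cosine(vectors[i], vectors[j]) < redundancy_threshold for j in chosen):
            chosen.append(i)
    # Restore the original sentence order in the output.
    return [sentences[i] for i in sorted(chosen)]


sentences = [
    "Aspirin reduces the risk of heart attack.",
    "Aspirin reduces the risk of heart attack.",
    "Statins lower cholesterol levels in patients.",
    "Exercise improves cardiovascular health.",
]
summary = summarize(sentences, keep_ratio=0.5)
```

In this toy input the second sentence duplicates the first, so the similarity filter admits at most one of them regardless of how many sentences the budget allows.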