Feature-based POS tagging and sentence relevance for news multi-document summarization in Bahasa Indonesia

Moch Zawaruddin Abdullah, Chastine Fatichah

Abstract


Sentence extraction in news document summarization determines representative sentences primarily by employing the news feature known as news feature score (NeFS). NeFS can achieve meaningful sentences by analyzing the frequency and similarity of phrases while neglecting grammatical information and sentence relevance to the title. The presence of instructive content is indicated by grammatical information carried by part of speech (POS). POS tagging is the process of giving a meaningful tag to each term based on qualified data and even surrounding words. Sentence relevance to the title is intended to determine the sentence's level of connectivity to the title in terms of both word-based and meaning-based similarity, primarily for news documents in Bahasa Indonesia. In this study, we present an alternative sentence weighting method by incorporating news features, POS tagging, and sentence relevance to the title. Sentence extraction based on news features, POS tagging, and sentence relevance is introduced to extract the representative sentences. The experiment results on the 11 groups of Indonesian news documents are compared with the news features scores with the grammatical information approach method (NeFGIS). The proposed method achieved better results. The increasing f-score rate of ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-SU4 sequentially are 1.84%, 3.03%, 3.85%, 2.08%.

Keywords


Indonesian news; Multi-document summarization; News features; Pos tagging; Sentence extraction; Sentence relevance

Full Text:

PDF


DOI: https://doi.org/10.11591/eei.v11i1.3275

Refbacks

  • There are currently no refbacks.


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Bulletin of EEI Stats