Advanced content-based retrieval for digital correspondence documents with ontology classification

Rifiana Arief, Suryarini Widodo, Ary Bima Kurniawan, Hustinawaty Hustinawaty, Faisal Arkan

Abstract


The growth of digital correspondence documents of various types, with inconsistent naming rules and no adequate search system, complicates searching for specific content. When documents are unclassified, searches become inaccurate and slow. This research proposes an archiving method with automatic hierarchical classification and a content-based search method that displays ontology classification information as a solution to these content-based search problems. The method consists of three stages: preprocessing (creating an automatic hierarchical classification model that combines a convolutional neural network (CNN) with regular expressions (regex)), archiving (storing documents with automatic classification), and retrieval (content-based search that displays ontology relationships derived from the document classification). Archiving 100 documents with the automatic hierarchical classification was 79% accurate, reflecting 99% accuracy for the CNN and 80% for the regex stage. Moreover, content-based search over classified documents, with ontology relationships displayed, was 100% accurate. This research improved the quality of search results for digital correspondence documents, as indicated by higher specificity, accuracy, and speed compared with conventional methods based on file names, annotations, and unclassified content.
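The two-stage classification described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the category labels, the regex patterns, and the function names are all hypothetical, and the CNN stage is replaced by a placeholder that returns a fixed label. The last function shows that the reported 79% figure is consistent with multiplying the two stage accuracies, which assumes the stages err independently; the abstract does not state how the figure was derived.

```python
import re

# Hypothetical second-level rules; the abstract does not list the actual patterns.
SUBTYPE_PATTERNS = {
    "invitation": re.compile(r"\b(invitation|invite[sd]?)\b", re.IGNORECASE),
    "assignment": re.compile(r"\b(assignment|assigned to)\b", re.IGNORECASE),
}


def classify_top_level(text: str) -> str:
    """Placeholder for the CNN stage; always returns a hypothetical label."""
    return "letter"


def classify_subtype(text: str) -> str:
    """Regex stage: return the first subtype whose pattern matches the text."""
    for label, pattern in SUBTYPE_PATTERNS.items():
        if pattern.search(text):
            return label
    return "unclassified"


def classify(text: str) -> tuple[str, str]:
    """Hierarchical label: (top-level class, subtype)."""
    return classify_top_level(text), classify_subtype(text)


def combined_accuracy(stage_one: float, stage_two: float) -> float:
    """Accuracy of a two-stage pipeline, assuming independent stage errors."""
    return stage_one * stage_two


if __name__ == "__main__":
    print(classify("We invite you to the annual meeting"))
    print(round(combined_accuracy(0.99, 0.80), 2))
```

A document is counted as correctly archived only when both levels are right, which is why the weaker regex stage dominates the overall figure in this reading.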

Keywords


Classification; Content; Document; Ontology; Retrieval


DOI: https://doi.org/10.11591/eei.v11i3.3376


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
