Vietnamese character recognition based on CNN model with reduced character classes

Thi Ha Phan, Duc Chung Tran, Mohd Fadzil Hassan

Abstract


This article will detail the steps to build and train the convolutional neural network (CNN) model for Vietnamese character recognition in educational books. Based on this model, a mobile application for extracting text content from images in Vietnamese textbooks was built using OpenCV and Canny edge detection algorithm. There are 178 characters classes in Vietnamese with accents. However, within the scope of Vietnamese character recognition in textbooks, some classes of characters only differ in terms of actual sizes, such as “c” and “C”, “o” and “O”. Therefore, the authors built the classification model for 138 Vietnamese character classes after filtering out similar character classes to increase the model's effectiveness.

Keywords


Character; CNN; Deep learning; Recognition

Full Text:

PDF


DOI: https://doi.org/10.11591/eei.v10i2.2810

Refbacks

  • There are currently no refbacks.


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Bulletin of EEI Stats

Bulletin of Electrical Engineering and Informatics (BEEI)
ISSN: 2089-3191, e-ISSN: 2302-9285
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).