Generating synthetic strike-through handwritten text using generative adversarial networks

Dheemanth Urs Raje, Chethan Hasigala Krishanappa

Abstract


The evaluation of handwritten text documents is a significant research field in text analysis. The effectiveness of deep learning classifiers tends to decline when they face variations in text style and the presence of strike-out text components. These strike-outs can occur at the character, word, paragraph, or page level. When such documents undergo optical character recognition (OCR), they often yield inaccurate results. Data scarcity and class imbalance are major obstacles in this area, and synthetic datasets can mitigate them to an extent. Traditional data augmentation techniques such as rotation, position shifting, zooming, and shearing leave the imbalanced class distribution unchanged. To tackle this challenge, this paper demonstrates the creation of a synthetic dataset of strike-through text using a modified generative adversarial network (GAN) that has an auxiliary network for text recognition. The IAM and RIMES datasets are used as the base for the GAN to generate synthetic images of handwritten text. A line segmentation technique is also implemented to create strike-outs on random words selected from the synthetic image set, thereby increasing the number of strike-out samples. The simulated dataset is expected to improve the quality of handwritten text recognition models.
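The strike-out step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how words are located or how strokes are drawn, so everything below is an assumption: word boundaries are found with a simple vertical projection profile, the stroke is a straight horizontal bar through the vertical middle of the word, and the function names `segment_words` and `strike_random_words` and all parameters are hypothetical. The input is assumed to be a grayscale NumPy array of a single text line with dark ink on a light background.

```python
import numpy as np


def segment_words(line_img, ink_threshold=128, min_gap=3):
    """Return (start, end) column spans of words, end exclusive.

    Assumption: a run of at least `min_gap` ink-free columns separates words.
    """
    ink_cols = (line_img < ink_threshold).any(axis=0)
    spans, start, gap = [], None, 0
    for x, has_ink in enumerate(ink_cols):
        if has_ink:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The last ink column was x - gap, so the exclusive end is x - gap + 1.
                spans.append((start, x - gap + 1))
                start, gap = None, 0
    if start is not None:
        # Word runs to (or near) the right edge; drop any short trailing gap.
        spans.append((start, len(ink_cols) - gap))
    return spans


def strike_random_words(line_img, n_words=1, thickness=2,
                        ink_threshold=128, min_gap=3, rng=None):
    """Draw a horizontal stroke through `n_words` randomly chosen words.

    Returns the modified copy and the list of struck column spans.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = line_img.copy()
    spans = segment_words(line_img, ink_threshold, min_gap)
    if not spans:
        return out, []
    k = min(n_words, len(spans))
    chosen = rng.choice(len(spans), size=k, replace=False)
    struck = []
    for i in sorted(chosen):
        x0, x1 = spans[i]
        # Rows that contain ink inside this word give its vertical extent.
        ink_rows = np.where((line_img[:, x0:x1] < ink_threshold).any(axis=1))[0]
        mid = (ink_rows[0] + ink_rows[-1]) // 2
        y0 = max(mid - thickness // 2, 0)
        out[y0:y0 + thickness, x0:x1] = 0  # straight black stroke
        struck.append((x0, x1))
    return out, struck
```

Calling `strike_random_words(line_img, n_words=2)` on a line image returns a struck copy together with the affected column spans, which could serve as labels for the strike-out class. Real strike-outs are rarely straight bars, so wavy, slanted, or multi-stroke variants would be a natural extension of this sketch.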

Keywords


Document processing; Generative adversarial networks; Handwritten text recognition; Strike through text; Synthetic dataset


DOI: https://doi.org/10.11591/eei.v15i3.10358


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Bulletin of Electrical Engineering and Informatics (BEEI)
ISSN: 2089-3191, e-ISSN: 2302-9285
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).