Generating synthetic strike-through handwritten text using generative adversarial networks
Dheemanth Urs Raje, Chethan Hasigala Krishanappa
Abstract
The evaluation of handwritten text documents is a significant research field in text analysis. The effectiveness of deep learning classifiers tends to decline when they face variations in writing style and the presence of struck-out text. Strike-outs can occur at the character, word, paragraph, or page level, and when such documents undergo optical character recognition (OCR), the results are often inaccurate. Data unavailability and class imbalance pose a serious risk in this area, so using synthetic datasets for research can alleviate the issue to an extent. Traditional data augmentation techniques such as rotation, position shifting, zooming, and shearing left the imbalanced class distribution unchanged. To tackle this challenge, this paper demonstrates the creation of a synthetic dataset of strikethrough text using a modified generative adversarial network (GAN) that includes an auxiliary network for text recognition. The IAM and RIMES datasets are used as the base for the GAN to generate synthetic images of handwritten text. A line segmentation technique is also implemented to create strike-outs on random words selected from the synthetic image set, thereby increasing the number of struck-out samples. The authors expect the simulated dataset to significantly improve the quality of handwritten text recognition models.
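The abstract does not describe how the line segmentation step places a strike on a word. The sketch below is therefore an assumption, not the authors' technique. It takes a grayscale word crop with dark ink on a light background (as in IAM/RIMES-style images), uses horizontal and vertical projection profiles to find the word's ink band, and draws one straight, slightly sloped stroke through the middle third of that band. A second helper strikes a random fraction of a word-image set to raise the count of struck-out samples. The names `add_strikethrough` and `strike_random_words` and their parameters are hypothetical.

```python
import numpy as np

def add_strikethrough(word_img, rng=None, thickness=3, max_slope=0.1):
    """Draw one synthetic strike-through stroke across a word image.

    Assumption: word_img is a 2-D uint8 array with dark ink (< 128)
    on a light background. Returns a new array; the input is untouched.
    """
    rng = np.random.default_rng() if rng is None else rng
    img = word_img.copy()
    h, _ = img.shape
    ink = img < 128                           # binarize: True where there is ink
    rows = np.flatnonzero(ink.any(axis=1))    # horizontal projection profile
    cols = np.flatnonzero(ink.any(axis=0))    # vertical projection profile
    if rows.size == 0:
        return img                            # blank crop: nothing to strike
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    band = (bottom - top) / 3
    # Stroke centre is sampled from the middle third of the ink band.
    center = rng.uniform(top + band, bottom - band)
    slope = rng.uniform(-max_slope, max_slope)
    xs = np.arange(left, right + 1)
    ys = np.rint(center + slope * (xs - (left + right) / 2)).astype(int)
    half = thickness // 2
    for x, y in zip(xs, ys):
        y0, y1 = max(y - half, 0), min(y + half + 1, h)
        img[y0:y1, x] = 0                     # paint the stroke in black ink
    return img

def strike_random_words(images, fraction=0.3, seed=0):
    """Strike a random fraction of word images; return images and flags."""
    rng = np.random.default_rng(seed)
    n = len(images)
    k = int(round(fraction * n))
    chosen = set(rng.choice(n, size=k, replace=False).tolist())
    out = [add_strikethrough(im, rng) if i in chosen else im.copy()
           for i, im in enumerate(images)]
    flags = [i in chosen for i in range(n)]   # True = struck-out sample
    return out, flags
```

Limiting the stroke to the detected ink band keeps it over the handwriting rather than the crop's padding. Real strike-outs also include wavy lines, double lines, and scribbles, which this single-stroke sketch does not model.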
Keywords
Document processing; Generative adversarial networks; Handwritten text recognition; Strike through text; Synthetic dataset
DOI: https://doi.org/10.11591/eei.v15i3.10358
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Bulletin of Electrical Engineering and Informatics (BEEI), ISSN: 2089-3191, e-ISSN: 2302-9285. This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).