The first FOSD-tacotron-2-based text-to-speech application for Vietnamese
Duc Chung Tran
Abstract
Recently, with the development and deployment of voicebots that help reduce staffing at call centers, text-to-speech (TTS) systems supporting English and Chinese have attracted the attention of researchers and corporations worldwide. However, there are very few published works on TTS developed for Vietnamese. Thus, this paper presents in detail the first Tacotron-2-based TTS application developed for Vietnamese, which uses the publicly available FPT Open Speech Dataset (FOSD) containing approximately 30 hours of labeled audio files together with their transcripts. The dataset was made available by FPT Corporation under an open-access license. A new text cleaner was developed to support Vietnamese, replacing the English cleaner provided by default in the Mozilla TTS source code. After 225,000 training steps, the generated speech has a mean opinion score (MOS) well above the average value of 2.50, centering around 3.00 for both clearness and naturalness in a crowdsourced survey.
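The abstract does not describe the internals of the Vietnamese cleaner. As a rough illustration only, a text cleaner for Vietnamese might normalize Unicode so that tone and vowel marks are encoded consistently, lowercase the text, and collapse whitespace before the text is mapped to symbols. The function name `vietnamese_cleaners` and the three steps below are assumptions made for this sketch. They are not taken from the paper or from the Mozilla TTS source code.

```python
import re
import unicodedata

# Matches any run of whitespace (spaces, tabs, newlines).
_WHITESPACE_RE = re.compile(r"\s+")


def vietnamese_cleaners(text: str) -> str:
    """Hypothetical cleaner sketch; not the implementation from the paper."""
    # Compose base letters and combining marks into single code points (NFC),
    # so that the same syllable always has the same encoding.
    text = unicodedata.normalize("NFC", text)
    # Lowercase; str.lower() handles Vietnamese letters such as "Đ".
    text = text.lower()
    # Collapse whitespace runs to one space and trim both ends.
    text = _WHITESPACE_RE.sub(" ", text).strip()
    return text
```

A real cleaner would likely also expand numbers, dates, and abbreviations into Vietnamese words, which this sketch leaves out.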
Keywords
Application; Bot; Tacotron-2; Text-to-speech; Vietnamese
DOI:
https://doi.org/10.11591/eei.v10i2.2539
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Bulletin of Electrical Engineering and Informatics (BEEI), ISSN: 2089-3191, e-ISSN: 2302-9285. This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).