The use of generative adversarial network as a domain adaptation method for cross-corpus speech emotion recognition
Muhammad Farhan Fadhil, Amalia Zahra
Abstract
Research on speech emotion recognition (SER) is growing rapidly. However, SER still faces the cross-corpus problem: performance degrades when a single SER model is tested on data from a different domain. This study examines the impact of using a generative adversarial network (GAN) model to adapt speech data across domains, and performs emotion classification on the speech features using a one-dimensional (1D) convolutional neural network (CNN) model. The results show that the GAN-based domain adaptation approach improved emotion classification accuracy on speech data from two different domains, the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and the EMO-DB speech corpus, by 10.88% to 28.77%. The highest average improvement across three different class-balancing methods reached 18.433%.
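The abstract does not specify the GAN architecture or training objective. To illustrate the adversarial mechanism behind GAN-based domain adaptation, here is a minimal sketch of the standard (non-saturating) GAN losses applied at the feature level. It assumes that each utterance is summarised as a fixed-length feature vector, that a generator G maps source-domain vectors toward the target domain, and that a discriminator D scores how "target-like" a vector is. The linear generator, logistic discriminator, function names, dimensions, and toy data are all hypothetical illustrations, not the authors' model, and no training loop is shown.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def generator(x, W, b):
    # Hypothetical linear generator G(x) = W x + b that maps a source-domain
    # feature vector toward the target-domain feature distribution.
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def discriminator(x, w, c):
    # Hypothetical logistic discriminator D(x) = sigmoid(w . x + c), read as
    # the probability that x is a genuine target-domain feature vector.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + c)


def gan_losses(source_batch, target_batch, W, b, w, c, eps=1e-12):
    """Return (discriminator_loss, generator_loss) averaged over one batch.

    D is pushed to output 1 on real target features and 0 on adapted source
    features; G (non-saturating form) is pushed to make D output 1 on
    adapted source features. eps guards against log(0).
    """
    real_scores = [discriminator(x, w, c) for x in target_batch]
    fake_scores = [discriminator(generator(x, W, b), w, c) for x in source_batch]
    d_loss = (-sum(math.log(p + eps) for p in real_scores) / len(real_scores)
              - sum(math.log(1.0 - p + eps) for p in fake_scores) / len(fake_scores))
    g_loss = -sum(math.log(p + eps) for p in fake_scores) / len(fake_scores)
    return d_loss, g_loss


if __name__ == "__main__":
    random.seed(0)
    dim = 4  # hypothetical feature dimension
    source = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(8)]
    target = [[random.gauss(1.0, 1.0) for _ in range(dim)] for _ in range(8)]
    W = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]  # identity G
    b = [0.0] * dim
    w, c = [0.0] * dim, 0.0  # untrained D outputs 0.5 everywhere
    d_loss, g_loss = gan_losses(source, target, W, b, w, c)
    print(round(d_loss, 4), round(g_loss, 4))  # 1.3863 0.6931
```

With an uninformative discriminator, both losses sit at their "chance" values (2 ln 2 and ln 2). In training, D and G would be updated alternately, D to lower `d_loss` and G to lower `g_loss`, so that adapted source features become hard to distinguish from target features before they are passed to the emotion classifier.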
Keywords
Cross-corpus SER; Domain adaptation; Generative adversarial networks; SER performance degradation; Speech emotion recognition
DOI: https://doi.org/10.11591/eei.v14i1.8339
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Bulletin of Electrical Engineering and Informatics (BEEI), ISSN: 2089-3191, e-ISSN: 2302-9285. This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).