Multimodal speech emotion recognition optimization using genetic algorithm
Stefanus Michael, Amalia Zahra
Abstract
Speech emotion recognition (SER) is a technology that detects emotions in speech. SER systems have been built with various models, such as convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and multilayer perceptrons. However, model selection alone is sometimes not enough, and optimization methods are needed to further improve SER performance. This paper compares manual hyperparameter tuning using grid search (GS) with hyperparameter tuning using a genetic algorithm (GA) on an LSTM model to demonstrate the performance gain of a multimodal SER model after optimization. Hyperparameter tuning using GA (HTGA) improves accuracy, precision, recall, and F1 score by 2.83%, 0.02, 0.05, and 0.04, respectively. Thus, HTGA obtains better results than the baseline GS hyperparameter tuning method.
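The two tuning strategies compared in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the search space, the `toy_score` function (a stand-in for training the LSTM and measuring validation accuracy), and all GA settings (population size, number of generations, mutation rate) are assumptions chosen only to make the example self-contained.

```python
import itertools
import random

# Hypothetical search space; the abstract does not list the tuned hyperparameters.
SEARCH_SPACE = {
    "units": [32, 64, 128, 256],
    "dropout": [0.1, 0.2, 0.3, 0.4, 0.5],
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
}


def toy_score(config):
    """Stand-in for validation accuracy; a real run would train the LSTM here."""
    return (
        1.0
        - abs(config["units"] - 128) / 256
        - abs(config["dropout"] - 0.3)
        - abs(config["learning_rate"] - 1e-3) * 100
    )


def grid_search(space, score_fn):
    """Evaluate every combination in the grid and return the best one."""
    keys = list(space)
    candidates = (
        dict(zip(keys, values)) for values in itertools.product(*space.values())
    )
    return max(candidates, key=score_fn)


def genetic_search(space, score_fn, pop_size=8, generations=10,
                   mutation_rate=0.2, seed=0):
    """Simple GA: truncation selection, uniform crossover, random-reset mutation."""
    rng = random.Random(seed)
    keys = list(space)
    population = [{k: rng.choice(space[k]) for k in keys} for _ in range(pop_size)]

    for _ in range(generations):
        # Selection: the better half survives unchanged and acts as parents.
        population.sort(key=score_fn, reverse=True)
        parents = population[: pop_size // 2]

        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene is copied from one of the two parents.
            child = {k: rng.choice((a[k], b[k])) for k in keys}
            # Mutation: occasionally replace a gene with a random allowed value.
            for k in keys:
                if rng.random() < mutation_rate:
                    child[k] = rng.choice(space[k])
            children.append(child)
        population = parents + children

    return max(population, key=score_fn)


best_gs = grid_search(SEARCH_SPACE, toy_score)
best_ga = genetic_search(SEARCH_SPACE, toy_score)
print("grid search:", best_gs)
print("genetic algorithm:", best_ga)
```

In this toy setting the grid is small enough to enumerate exhaustively, so grid search always finds the best grid point. A GA becomes attractive when each evaluation means training a model and the search space is too large to enumerate, because it evaluates only `pop_size` candidates per generation.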
Keywords
A lite bidirectional encoder representation from transformers; Genetic algorithm; Interactive emotional dyadic motion capture; Long short-term memory; Speech emotion recognition
DOI:
https://doi.org/10.11591/eei.v13i5.7409
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Bulletin of Electrical Engineering and Informatics (BEEI), ISSN: 2089-3191, e-ISSN: 2302-9285. This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).