Arabic vowels characterization and classification using the normalized energy in frequency bands

ABSTRACT


INTRODUCTION
Speech sounds are broadly classified as voiced or unvoiced. Vowels are spoken sounds produced by quasi-periodic air pulses that are acoustically filtered as they flow through the vocal tract. Vowels and consonants differ primarily in that vowels resonate in the throat [1], [2]. In addition, vowels are produced without constriction of the vocal tract, unlike other types of sounds, which are all produced by narrowing the airflow through the vocal tract. Vowels are articulated by raising part of the tongue body, and the location of a vowel refers to the part of the tongue that is highest during its production [3].
Vowel identification is crucial in continuous speech recognition. Hence, considerable effort has been devoted to analyzing and characterizing vowels. Although frequency-domain analysis is essential for studying the acoustic properties of speech signals, formants remain the classical way of classifying vowels; in particular, the first two formants are the most significant acoustic parameters that determine vowels. Some researchers have exploited formant frequencies to develop algorithms for identifying and classifying vowels in continuous speech [1]-[4]. Others have developed recognition systems based on the frequencies of the first and second formants (F1 and F2) [5]-[9] and reported that the formants of long vowels are peripheral to those of short ones. Natour et al. [10] found that the front vowel /i/ has a high F2, while the back vowel /u/ has a very low F2; the F2 values of the central vowel /a/ lie between these two extremes. Earlier studies have also identified several additional cues that distinguish vowels. Alotaibi and Hussain [5] showed that duration is very important for distinguishing between short and long vowels. Other researchers have shown a relationship between speech rate and vowel duration: as the speech rate increases, vowel duration decreases [11]. More recently, researchers have indicated that vowel duration is a highly important acoustic cue for vowel identification and speech intelligibility [12], [13]. Moreover, Khattab and Al-Tamimi [14] found no significant difference between males and females in their durational results. Other investigators have developed a method based on the wavelet transform and spectral analysis for segmenting Arabic speech into consonants and vowels without linguistic information [15]. It has also been reported that long and short vowels differ significantly in both quantity and quality [16].

Bulletin of Electr Eng & Inf ISSN: 2302-9285  Arabic vowels characterization and classification using the normalized energy in … (Mohamed Farchi) 269
The literature shows that existing methods for classifying the vowels /a/, /u/ and /i/ are effective. However, distinguishing short from long vowels remains difficult. Researchers report that long vowels have a longer production time than short vowels [17], [18] and that long-vowel formants are peripheral to those of short ones [19]. However, the variability of the speech signal makes it very difficult to determine the production time and the formant ranges of a short or long vowel. In our previous study [20], we found that both the formant frequencies and the normalized band energies can differentiate between short Arabic vowels. We also found that the spectral moments of long vowels (center of gravity, CoG, and standard deviation, STD) indicate that their production occurs in two stages: a transient phase at the start of vowel formation and a steady-state phase as the duration lengthens. Furthermore, short and long vowels differ in that the equilibrium position is held longer in a long vowel, so the rate of change of the formants or of the normalized energy (percentage of energy) in the frequency bands can be a good cue for distinguishing long vowels from short ones. According to [21] and [22], spectral moments allow a more thorough definition of the vowel category. Korkko [23] used spectral moments to examine how young children produce the consonant /s/ in symmetrical vowel contexts (such as /isi/, /usu/ and /ysy/). His research shows that vowel co-articulation has a considerable impact on the spectral properties of /s/.
The main goal of this research is to contribute to the experimental literature on Arabic vowels. The conclusions reported here are based on an acoustic analysis of Arabic vowels and on the design of an algorithm for classifying long and short vowels. Specifically, the objective is to detect Arabic vowels using the normalized energy in frequency bands.
The structure of this paper is as follows. The next section describes the procedures, the equipment, and the experiments conducted. The results are presented and discussed in the following section. The final section summarizes the results and presents the conclusions.

METHOD
2.1. General processing
A set of experiments was conducted to describe the behavior of Arabic vowels. This section outlines the methodology of these experiments, together with information on how the data were gathered and the tools employed. A corpus of Arabic short and long vowels (/a/, /a:/, /u/, /u:/, /i/, and /i:/) was created. Twenty Moroccan speakers aged between 20 and 40 were asked to repeat isolated CV syllables containing both short and long vowels. Isolated syllables rather than words were selected to reduce the influence of other phonemes on the vowels under study; they also allow the vowel to be lengthened freely. Because its production involves little strain on the vocal tract, the consonant /ʔ/ (/ء/) was used throughout the corpus (Table 1). The speech was sampled at 22,050 Hz and divided into 11.6 ms segments with a 9.6 ms overlap. Each segment was Hamming-windowed and zero-padded, and a 512-point fast Fourier transform (FFT) was computed.
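The framing step can be sketched in Python with NumPy as follows. This is an illustrative sketch, not the authors' MATLAB code. It assumes that 11.6 ms corresponds to a 256-sample frame at 22,050 Hz (256 / 22,050 ≈ 11.6 ms) and that the 9.6 ms overlap rounds to 212 samples, which gives a hop of 44 samples (≈ 2 ms). The names `frame_spectra`, `FRAME_LEN` and `HOP` are hypothetical.

```python
import numpy as np

FS = 22_050                               # sampling rate (Hz), from the text
FRAME_LEN = 256                           # assumed: 256 samples ≈ 11.6 ms at 22,050 Hz
HOP = FRAME_LEN - round(0.0096 * FS)      # 9.6 ms overlap (212 samples) -> hop of 44 samples
NFFT = 512                                # FFT length after zero-padding


def frame_spectra(x):
    """Magnitude spectra of Hamming-windowed, zero-padded frames.

    Returns an array of shape (n_frames, NFFT // 2 + 1); row n is |FFT| of frame n.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < FRAME_LEN:                # pad very short signals to one full frame
        x = np.pad(x, (0, FRAME_LEN - len(x)))
    n_frames = 1 + (len(x) - FRAME_LEN) // HOP
    window = np.hamming(FRAME_LEN)
    spectra = np.empty((n_frames, NFFT // 2 + 1))
    for n in range(n_frames):
        segment = x[n * HOP : n * HOP + FRAME_LEN] * window
        # rfft with n=NFFT zero-pads the 256-sample segment to 512 points
        spectra[n] = np.abs(np.fft.rfft(segment, n=NFFT))
    return spectra


# Example: 0.5 s of a 300 Hz tone
t = np.arange(int(0.5 * FS)) / FS
S = frame_spectra(np.sin(2 * np.pi * 300 * t))
print(S.shape)  # (245, 257)
```

Zero-padding to 512 points only interpolates the spectrum. The frequency resolution is still set by the 256-sample window, and the bin spacing becomes 22,050 / 512 ≈ 43 Hz.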

Computation of energy band
The magnitude spectrum of each frequency bin was smoothed along the time (frame) index n using a 20-point moving average. Six frequency bands (band 1: 0-400 Hz; band 2: 400-800 Hz; band 3: 800-1200 Hz; band 4: 1200-2000 Hz; band 5: 2000-3500 Hz; and band 6: 3500-5000 Hz) were then selected from the smoothed spectrum X(n,k). The energy in each band was calculated as (1):

$$E_b(n) = \sum_{k=k_{b,l}}^{k_{b,u}} |X(n,k)|^2 \qquad (1)$$

where the band index b ranges from 1 to 6, and the frequency index k runs between the DFT indices $k_{b,l}$ and $k_{b,u}$ corresponding to the lower and upper boundaries of band b. The normalized band energy for each frame was then determined by (2):

$$E_{bn}(n) = \frac{E_b(n)}{E_T(n)} \times 100 \qquad (2)$$

where $E_{bn}(n)$ denotes the normalized energy of band b in frame n (expressed as a percentage), $E_T(n)$ is the overall energy of frame n, and $E_b(n)$ is the energy of band b in frame n.
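Equations (1) and (2) can be sketched in Python as shown below, starting from a magnitude spectrogram such as the one produced by an FFT front end. Several details are assumptions rather than statements of the paper. Band edges are mapped to half-open DFT index ranges using `ceil(f * NFFT / FS)`. $E_T(n)$ is taken as the energy over all bins up to the Nyquist frequency. The 20-point moving average uses `np.convolve(..., mode="same")`. The function names are hypothetical.

```python
import numpy as np

FS, NFFT = 22_050, 512
# Band edges (Hz) for B1..B6, as listed in the text
BANDS_HZ = [(0, 400), (400, 800), (800, 1200),
            (1200, 2000), (2000, 3500), (3500, 5000)]


def smooth_over_time(mag, width=20):
    """Moving average of each frequency bin along the frame index n (axis 0)."""
    width = min(width, mag.shape[0])      # guard for vowels shorter than 20 frames
    kernel = np.ones(width) / width
    return np.apply_along_axis(lambda col: np.convolve(col, kernel, mode="same"), 0, mag)


def band_bins(lo_hz, hi_hz):
    """Half-open DFT index range [k_lo, k_hi) covering lo_hz <= f < hi_hz."""
    k_lo = int(np.ceil(lo_hz * NFFT / FS))
    k_hi = int(np.ceil(hi_hz * NFFT / FS))
    return k_lo, k_hi


def normalized_band_energy(mag):
    """E_bn(n) in percent for the six bands; mag has shape (frames, NFFT // 2 + 1)."""
    power = smooth_over_time(np.asarray(mag, dtype=float)) ** 2   # |X(n,k)|^2
    e_total = power.sum(axis=1)                                     # E_T(n), all bins (assumption)
    e_bands = np.stack([power[:, k0:k1].sum(axis=1)
                        for k0, k1 in (band_bins(lo, hi) for lo, hi in BANDS_HZ)],
                       axis=1)                                      # E_b(n), shape (frames, 6)
    return 100.0 * e_bands / np.maximum(e_total[:, None], 1e-12)
```

At the edges of the vowel, the moving average covers fewer than 20 frames. Every bin of a given frame is scaled by the same factor, however, so the ratio in (2) is unaffected.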
The vowel is composed of three segments: the onset (first segment), the nucleus (central segment), and the coda (closing segment). The normalized energy in the vowel nucleus was calculated, for each band b, as (3):

$$E_{nucleus} = \frac{3}{d_v} \sum_{n=d_v/3}^{2d_v/3} E_{bn}(n) \qquad (3)$$

where $E_{nucleus}$ is the normalized band energy in the vowel nucleus and $d_v$ is the overall vowel duration (in frames).
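The nucleus averaging can be sketched as follows. The sketch splits the frames into three equal parts and averages each band over the middle third, which is the division described for the algorithm below. Integer rounding of the boundaries ($\lfloor d_v/3 \rfloor$ and $\lfloor 2d_v/3 \rfloor$) is an assumption. `split_thirds` and `nucleus_energy` are hypothetical names.

```python
import numpy as np


def split_thirds(e_bn):
    """Split frames (axis 0) into onset, nucleus and coda of roughly equal length."""
    d_v = e_bn.shape[0]                   # vowel duration in frames
    a, b = d_v // 3, (2 * d_v) // 3       # integer boundaries (rounding is an assumption)
    return e_bn[:a], e_bn[a:b], e_bn[b:]


def nucleus_energy(e_bn):
    """Mean normalized energy of each band over the nucleus (middle third)."""
    _, nucleus, _ = split_thirds(np.asarray(e_bn, dtype=float))
    return nucleus.mean(axis=0)
```

With 9 frames, for example, the nucleus consists of frames 3, 4 and 5. When $d_v$ is not a multiple of 3, the coda receives the extra frames.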

Band energy
The first purpose of this part is to study the energy distribution of the vowels /a/, /i/ and /u/ in the predefined frequency bands (B1, B2, B3, B4, B5, and B6) as a function of production duration. Tables 2-4 summarize the results. All vowels have significant energy in the first band because they are voiced sounds. Additionally, the percentage distribution of energy across the bands is unaffected by the production duration.
We can also see that for /a/, more than 70% of the total energy is concentrated in bands B1 and B2 in roughly equal parts (≈35% each). For /u/, on the other hand, more than 70% of the energy lies in band B1 and about 20% in band B2. For /i/, more than 70% of the energy is concentrated in band B1, against about 20% in band B5. The evolution of the band energy distribution of /a/, /i/ and /u/ over time (see Figures 1-3) reveals two phases:
− A transient phase, which corresponds to the beginning of vowel production and is characterized by large changes in the normalized band energies.
− A steady phase, reached as the vowel production time increases, in which the normalized energy in bands B1, B2, B3 and B5 stabilizes.
These results are consistent with those of [20].

Algorithm
Based on the energy distribution of the vowels /a/, /i/ and /u/ in the predefined frequency bands as a function of production duration, we developed an algorithm for classifying vowels. The algorithm consists of two parts: the first recognizes the vowel as /a, a:/, /i, i:/ or /u, u:/, and the second decides whether the vowel is short or long.
Vowel recognition: the energy distribution in the six frequency bands [20] shows that the normalized energy in band B1 distinguishes the vowel /a, a:/, while the normalized energies in bands B2 and B5 distinguish /u, u:/ from /i, i:/. Figure 4 shows the algorithm that performs this classification.
Short/long decision: to separate long vowels from short ones, we calculated the average rate of change of the normalized energy in bands B1, B2, B3 and B5 within the vowel nucleus. If this average is less than 3 dB, the vowel is classified as long; otherwise, it is classified as short. To locate the nucleus, the vowel is divided into three parts of equal length, each one third of the total vowel length. The first third is the beginning of the vowel (onset), the second third is the nucleus, and the last third is the end of the vowel (offset). The algorithm that distinguishes short from long vowels is given in Figure 5.
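The two decision stages can be sketched as below. This is a rough sketch under stated assumptions, not a transcription of Figures 4 and 5. The B1 threshold of 50% is hypothetical: the text gives no value, and 50% was chosen only because /a/ carries about 35% in B1 while /u/ and /i/ carry more than 70%. "Average rate of change" is interpreted as the mean absolute frame-to-frame difference, in dB, of the normalized energies in B1, B2, B3 and B5 over the nucleus. Using the nucleus mean as input to the recognition stage is also an assumption. All function names are hypothetical.

```python
import numpy as np

B1_THRESHOLD = 50.0        # percent; HYPOTHETICAL, not given in the text
LONG_MAX_RATE_DB = 3.0     # dB; threshold stated in the text
RATE_BANDS = [0, 1, 2, 4]  # column indices of B1, B2, B3, B5


def recognize_vowel(e_nucleus):
    """Map the six nucleus band energies (percent) to 'a', 'u' or 'i'."""
    b1, b2, b5 = e_nucleus[0], e_nucleus[1], e_nucleus[4]
    if b1 < B1_THRESHOLD:            # /a/ splits its energy between B1 and B2 (~35% each)
        return "a"
    return "u" if b2 > b5 else "i"   # /u/: ~20% in B2; /i/: ~20% in B5


def is_long(nucleus_frames):
    """True if the mean absolute frame-to-frame change (dB) in B1, B2, B3, B5 is < 3 dB."""
    frames = np.asarray(nucleus_frames, dtype=float)
    if frames.shape[0] < 2:
        raise ValueError("need at least two nucleus frames")
    db = 10.0 * np.log10(np.maximum(frames[:, RATE_BANDS], 1e-12))
    rate = np.abs(np.diff(db, axis=0)).mean()
    return bool(rate < LONG_MAX_RATE_DB)


def classify(nucleus_frames):
    """Return e.g. 'a' or 'a:' for one vowel, given its nucleus frames (frames, 6)."""
    frames = np.asarray(nucleus_frames, dtype=float)
    vowel = recognize_vowel(frames.mean(axis=0))
    return vowel + (":" if is_long(frames) else "")
```

A nucleus whose band percentages stay constant from frame to frame has a rate of 0 dB and is labeled long. A nucleus alternating between 70% and 35% in B1 changes by about 3 dB per frame in that band and tends toward the short label.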

Algorithm performance evaluation
Our algorithm was implemented in MATLAB and tested on the data from our corpus to assess its performance. A total of 1200 short and long vowels were used in this experiment, of which 1167 were correctly classified, giving an overall accuracy of 97.25%. Table 5 gives the detailed results. A relatively high number of errors occurred when identifying the vowels /u/ and /i/.
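As a quick check on the reported figure, the overall accuracy is the ratio of correctly classified vowels to vowels tested. This leaves 1200 − 1167 = 33 misclassifications:

$$\text{accuracy} = \frac{1167}{1200} = 0.9725 = 97.25\,\%$$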
To distinguish long from short vowels, we conducted classification tests on our corpus of 300 short and 900 long vowels (400 recordings per vowel: 100 short and 300 long). The classification results are summarized in Tables 6-8. The algorithm correctly classifies 1136 vowels, a recognition rate of 94%. The overall recognition rate of the six short and long vowels is 92%. These results are very competitive with those reported in the literature [17], [19], [24], [25].

CONCLUSION
The main contribution of this paper is an algorithm that recognizes each Arabic vowel (/a/, /a:/, /u/, /u:/, /i/ and /i:/). The energy distribution of each sound over time and frequency depends on the articulators involved and on the place and manner of production. On this basis, we conducted an acoustic study of the energy percentage in the bands (band 1: 0-400 Hz; band 2: 400-800 Hz; band 3: 800-1200 Hz; band 4: 1200-2000 Hz; band 5: 2000-3500 Hz; and band 6: 3500-5000 Hz). The results demonstrate that each vowel can be classified using the normalized energy in the frequency bands. The algorithms proposed in this work use these cues to recognize each vowel. Performance tests of these algorithms on our Arabic corpus show an overall recognition rate of 92% for the six short and long vowels.
Several directions could be explored in future work. These algorithms could be ported to hardware platforms and their robustness tested in noisy environments. Future research could also address the characterization of other phonemes.

Table 1. Arabic corpus of long and short vowels

Table 2. Distribution of the energy percentage in the bands according to the production time of the vowel /a/

Table 3. Distribution of the energy percentage in the bands according to the production time of the vowel /i/

Table 4. Distribution of the energy percentage in the bands according to the production time of the vowel /u/

Table 5. Confusion matrix of vowels

Table 7. Confusion matrix of short and long vowels of /u/

Table 8. Confusion matrix of short and long vowels of /i/