GlottDNN-based spectral tilt analysis of tense voice emotional styles for the expressive 3D numerical synthesis of vowel [a]

Marc Freixes, Marc Arnela, Francesc Alías, Joan Claudi Socoró

Publisher

ISCA

Conference

10th ISCA Speech Synthesis Workshop (SSW10)

Year

2019

Month

September

First page

132

Last page

136

City

Vienna

Country

Austria

Publication Type

Proceedings

Research Group

HER

Research Line

Speech Processing

External URL

https://www.researchgate.net/publication/335948452_GlottDNN-based_spectral_tilt_…

Three-dimensional (3D) acoustic models allow for accurate modelling of acoustic wave propagation in realistic 3D vocal tracts. However, the voice generated by these approaches is still limited in terms of expressiveness, which could be improved through proper modifications of the glottal source excitation. This work aims at adding some expressiveness to a 3D numerical synthesis approach based on the Finite Element Method (FEM) that uses as input an LF (Liljencrants-Fant) model controlled by the glottal shape parameter Rd. To that effect, a parallel Spanish speech corpus containing neutral and tense voice emotional styles is analysed with the GlottDNN vocoder, obtaining the F0 and spectral tilt parameters associated with the glottal excitation. The variations of these two parameters are computed for the happy and aggressive styles with reference to neutral speech, differentiating between stressed and unstressed vowels [a]. From this analysis, F0 and Rd values are then derived and used in the LF-FEM based synthesis of vowels [a] to resemble the aforementioned expressive styles. Results show that it is necessary to increase F0 and decrease Rd with respect to neutral speech, with larger deviations for the happy than for the aggressive style, especially for the stressed vowels.
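To illustrate what controlling the LF model through Rd means in practice, the following minimal sketch maps a single Rd value to the LF shape parameters Ra, Rk and Rg. It assumes the regression formulas commonly attributed to Fant's 1995 revision of the LF model and their usual validity range of roughly 0.3 to 2.7; the abstract does not state which mapping the authors used, and the function names here are hypothetical.

```python
def lf_shape_from_rd(rd: float) -> dict:
    """Map Rd to LF shape parameters (Ra, Rk, Rg).

    Assumption: Fant's (1995) regressions, not confirmed by the abstract.
    """
    # Assumed validity range of the regressions.
    if not 0.3 <= rd <= 2.7:
        raise ValueError("Rd outside the assumed range [0.3, 2.7]")
    ra = (-1.0 + 4.8 * rd) / 100.0   # return-phase parameter
    rk = (22.4 + 11.8 * rd) / 100.0  # glottal pulse asymmetry
    # Rg follows from solving the defining relation of Rd for Rg.
    rg = rk / (4.0 * (0.11 * rd / (0.5 + 1.2 * rk) - ra))
    return {"Ra": ra, "Rk": rk, "Rg": rg}


def rd_from_lf_shape(ra: float, rk: float, rg: float) -> float:
    """Inverse relation: recover Rd from (Ra, Rk, Rg)."""
    return (1.0 / 0.11) * (0.5 + 1.2 * rk) * (rk / (4.0 * rg) + ra)


if __name__ == "__main__":
    # Illustrative Rd values only, not results from the paper.
    for rd in (1.0, 0.6):
        print(rd, lf_shape_from_rd(rd))
```

Under these assumed formulas, a lower Rd yields a smaller Ra, that is, a shorter return phase and a tenser voice quality, which is consistent with the direction of change reported for the happy and aggressive styles.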

Authors