FEMVoQ
Research Lines
Numerical methods have deeply impacted research in biomechanics and biomedical engineering. From the intricate behavior of the human heart to the respiratory system and the biomechanics of the skeleton, computers have allowed us to perform realistic three-dimensional (3D) simulations to better understand the behavior of the human body. The voice organ is no exception. However, its underlying physics is very complex, and current high-performance computers are not capable of reproducing the human voice as a whole, i.e., from phonation to the emitted sound. Consequently, numerical efforts have been placed either on phonation (i.e., simulating the self-oscillations of the vocal cords) or on vocal tract (VT) acoustics (i.e., simulating the propagation of acoustic waves in time-evolving VT geometries). The FEMVoQ project focuses entirely on 3D VT acoustics and on the generation of audible sound following a physics-based approach. The underlying acoustic equations will be solved using different finite element method (FEM) strategies.
The goal of the FEMVoQ project is twofold. The first objective is to significantly increase the number and types of 3D-generated spoken utterances reported to date in the literature. Like a baby who starts by pronouncing brief utterances such as /aga/, /eko/, /apu/ and, later, /asa/, which involve velar and bilabial stop consonants as well as fricatives, we aim to produce those sounds and surpass the current state of the art in 3D VT acoustics, which mostly focuses on the easier generation of vowels and diphthongs. Most importantly, we plan to develop 3D numerical strategies and approximations to generate such new utterances without resorting to supercomputer facilities.
The second objective of FEMVoQ is also very ambitious. Not only do we want to advance the generation of new utterances, but we also wish to endow them with voice qualities (VoQ). Voice qualities arise from variations in the phonation type and in the VT shape. There are many examples, from the well-known Lombard effect that occurs when one speaks in a noisy environment, to the singing formant that allows for better voice projection, to speaking in sad or aggressive styles, to mention a few. VoQ will be introduced in 3D VT acoustics as follows. First, we will develop a tuning optimization strategy to modify the shape of 3D VTs and move their formants to target values. Second, we will implement phonation models to be imposed as boundary conditions in the 3D VT geometries. The parameters of these models will be tuned depending on the VoQ to be reproduced. This will be only the beginning, though. Our real purpose here is to establish a first link between 3D VT acoustics and speech analysis techniques for synthesis purposes. To that end, we will analyze and label two different speech corpora containing vowels and short utterances produced with different vocal efforts. By means of inverse-filtering techniques, we will decompose the speech signal into the contributions of the glottal source and of the VT, and determine parameter settings to be included in 3D numerical simulations of voice. This will allow us to compare the simulations against recorded data for different VoQs and validate them with objective and perceptual tests.
The outputs of FEMVoQ have potential applications in a wide variety of areas, ranging from health (phoniatrics, phonosurgery and speech therapy) to the video game industry.
Project funded by Ministerio de Ciencia, Innovación y Universidades of the Spanish Government - Agencia Estatal de Investigación by grant nº PID2020-120441GB-I00 (PID2020-120441GB-I00 / AEI / 10.13039/501100011033).