- Speech synthesis: We work in the framework of text-to-speech conversion, aiming to obtain highly realistic synthetic speech. Currently, we are working on improving its naturalness and expressivity, as well as on speech personalization according to different voice stereotypes. Our research thus focuses on speech modeling, synthesis and conversion techniques for Catalan, Spanish and English.
- Speech recognition: We work on speech recognition in controlled communication environments, e.g. with restricted vocabularies.
- Audiovisual synchronization: A direct application of speech synthesis systems is their inclusion in virtual characters (or talking heads). In this framework, we investigate and develop synchronization methods for speech-driven expressive animation of the avatar's facial elements (e.g. mouth, eyes and eyebrows) and for lip co-articulation. To this end, we are developing an interface between the text-to-speech synthesis system and the virtual-character animation module.
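As a rough illustration of the kind of interface described above, the sketch below converts a timed phoneme sequence (as a TTS engine might emit) into viseme keyframes for an animation module, merging adjacent identical mouth shapes to approximate co-articulation. The phoneme inventory, viseme classes and function names are hypothetical, not the group's actual system.

```python
# Hypothetical many-to-one phoneme-to-viseme grouping (illustrative only).
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "a": "open", "e": "mid", "i": "spread", "o": "round", "u": "round",
}

def viseme_track(phonemes):
    """Convert (phoneme, start_sec, end_sec) triples into viseme keyframes,
    merging consecutive identical visemes into one longer keyframe."""
    track = []
    for ph, start, end in phonemes:
        vis = PHONEME_TO_VISEME.get(ph, "neutral")
        if track and track[-1][0] == vis:
            # Same mouth shape continues: extend the previous keyframe.
            track[-1] = (vis, track[-1][1], end)
        else:
            track.append((vis, start, end))
    return track

# Example: "p" and "b" share the bilabial shape, so they merge.
print(viseme_track([("p", 0.0, 0.1), ("b", 0.1, 0.2), ("a", 0.2, 0.4)]))
# → [('bilabial', 0.0, 0.2), ('open', 0.2, 0.4)]
```

A real pipeline would also smooth the transitions between keyframes and handle eyes and eyebrows from prosodic cues, but the phoneme-timing-to-keyframe mapping is the core of the speech/animation interface.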
- Speech corpora: We work on the design, recording and labeling of speech corpora. In this context, the availability of automatic speech signal segmentation and parameterization tools –fitted with visual interfaces that allow the created marks to be checked– is of paramount importance. Currently, we have several labeled speech corpora in different languages (Catalan, Spanish and English) and a corpus in Spanish with different speaking styles (happy, neutral, sensual, aggressive and sad).
- Emotion Recognition: We work on affect recognition from speech, from visual features extracted from facial images of the user, and from the text of the message itself. Currently, our main research efforts are oriented towards the derivation of novel parameterization techniques and the design of efficient emotion recognizers.
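One common way to combine the speech, face and text modalities mentioned above is score-level (late) fusion: each modality produces a posterior distribution over emotion classes, and a weighted average selects the final label. The sketch below is a minimal illustration under assumed class names and weights, not the group's actual recognizer.

```python
# Illustrative emotion classes matching the speaking styles mentioned in
# the corpora section; the weights below are arbitrary assumptions.
EMOTIONS = ["happy", "neutral", "sad", "aggressive"]

def late_fusion(scores_by_modality, weights):
    """Weighted average of per-modality posteriors over EMOTIONS.
    Returns the winning emotion label and the fused distribution."""
    fused = [0.0] * len(EMOTIONS)
    for modality, scores in scores_by_modality.items():
        w = weights[modality]
        for i, s in enumerate(scores):
            fused[i] += w * s
    total = sum(weights.values())
    fused = [s / total for s in fused]  # renormalize
    best = EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]
    return best, fused

label, dist = late_fusion(
    {"speech": [0.6, 0.2, 0.1, 0.1], "face": [0.2, 0.5, 0.2, 0.1]},
    {"speech": 0.5, "face": 0.5},
)
print(label)  # → happy
```

The alternative, early fusion, concatenates the per-modality feature vectors before a single classifier; late fusion has the practical advantage of tolerating a missing modality by simply dropping its term.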
- Multimodal Clustering: The growing ubiquity of multimedia calls for the development of efficient clustering systems capable of organizing multimodal data repositories in a fully unsupervised manner. Our work is focused on the construction of robust multimedia clustering systems based on self-refining consensus clustering, an approach that aims to obtain high-quality data partitions through the consolidation of multiple clusterings, which makes it possible to take advantage of both early and late fusion of modalities.
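To give a flavor of consensus clustering, the toy sketch below consolidates several base partitions (e.g. obtained from different modalities or fusion strategies) via evidence accumulation: pairs of objects that co-occur in the same cluster in most partitions are merged through a co-association matrix. This is a simple baseline illustrating the consolidation idea, not the group's self-refining algorithm.

```python
# Evidence-accumulation consensus over several base partitions.
# Each partition is a list of cluster labels, one per object.

def consensus(partitions, n, threshold=0.5):
    """Merge objects whose co-association (fraction of partitions that
    cluster them together) meets `threshold`; returns consensus labels."""
    co = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0 / len(partitions)

    # Single-link merging via union-find over high-evidence pairs.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for i in range(n):
        for j in range(i + 1, n):
            if co[i][j] >= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]

# Three noisy partitions of 4 objects; the consensus recovers {0,1} vs {2,3}.
print(consensus([[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]], n=4))
```

Because the base partitions can come from early-fused features, late-fused scores, or single modalities alike, the consolidation step is agnostic to how each input clustering was produced, which is what makes the approach attractive for multimodal data.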