Double Degree in International Computer Engineering and Management of Business and Technology

Audio and Speech Processing

Description
Among the many applications of digital signal processing are those that support oral (voice) and auditory interaction with machines. These include applications that process what is said (speech recognition) and what is heard (noise reduction or cancellation, ambient sound recognition, sound event detection, music recognition, etc.), as well as others that automatically generate spoken messages (speech synthesis) or sound signals of various kinds (sound synthesis, singing voice synthesis, sound effects, etc.). This range of applications is revolutionizing human-machine interaction, driven by the revolution in digital systems and the growing computing capacity of mobile devices. Within this universe of applications, the subject Audio and Speech Processing covers the foundations of automatic recognition of sound events and environmental sound, and also studies techniques for generating and transforming synthetic speech.
Subject Type
Elective
Semester
First
Credits
4.00

Titular Professors

Previous Knowledge

Time- and frequency-domain characterization of analog signals and systems. Sampling theorem. Discrete-time Fourier transform. Z-transform. FIR and IIR filters.
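
As an informal illustration of these prerequisites (a minimal sketch, not course material), the following Python/SciPy code designs and applies a low-pass FIR filter to a sampled signal; the sampling rate, cutoff frequency and test signal are arbitrary assumptions.

    # Minimal FIR filtering sketch with assumed parameters (16 kHz sampling,
    # 1 kHz cutoff); illustrates sampling and FIR filtering, not course code.
    import numpy as np
    from scipy.signal import firwin, lfilter

    fs = 16000                                   # assumed sampling rate (Hz)
    t = np.arange(0, 1.0, 1.0 / fs)              # one second of samples
    x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 6000 * t)

    h = firwin(numtaps=101, cutoff=1000, fs=fs)  # 101-tap low-pass FIR filter
    y = lfilter(h, 1.0, x)                       # y[n] = sum_k h[k] * x[n - k]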

Objectives

The subject Audio and Speech Processing targets the following learning outcomes:

- Know acoustic signal parametrization techniques, as well as machine learning techniques for the classification of sound events and environmental sound.

- Master the main characteristics and parameters used in the analysis and synthesis of human speech.

More specifically, students of Audio and Speech Processing must acquire the following knowledge and skills:

1. Acquire knowledge of support tools for developing applications in the field of digital signal processing (MATLAB).

2. Acquire the foundations of digital signal processing needed to later assimilate the concepts related to speech processing and to the analysis and recognition of environmental audio and sound events.

3. Acquire basic knowledge of speech production and perception that enables them to understand voice signal analysis techniques and the models used in the different applications of speech technologies.

4. Acquire basic knowledge of voice signal analysis and its applications.

5. Identify, formulate and solve digital speech processing problems in a multidisciplinary environment, individually or as a member of a team.

6. Understand and apply acoustic signal parametrization methods for subsequent processing.

7. Acquire basic knowledge of machine learning techniques applied to the detection and recognition of sound events and environmental sound (see the sketch after this list).
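
The following sketch illustrates points 6 and 7 above under assumed tooling (Python with librosa and scikit-learn, rather than the MATLAB environment used in the course): MFCC parametrization of short audio clips followed by a simple sound-event classifier. The file names and labels are hypothetical placeholders, not course material.

    # Hedged sketch: MFCC parametrization + a simple sound-event classifier.
    # Assumes Python with numpy, librosa and scikit-learn installed; file names
    # and labels below are hypothetical placeholders.
    import numpy as np
    import librosa
    from sklearn.svm import SVC

    def clip_features(path):
        """Load an audio clip and summarize it with the mean of its MFCCs."""
        y, sr = librosa.load(path, sr=16000, mono=True)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # 13 coefficients per frame
        return mfcc.mean(axis=1)                            # one 13-dim vector per clip

    # Hypothetical labelled recordings of environmental sounds.
    train_paths = ["dog1.wav", "siren1.wav", "rain1.wav"]
    train_labels = ["dog", "siren", "rain"]

    X = np.stack([clip_features(p) for p in train_paths])
    clf = SVC(kernel="rbf").fit(X, train_labels)            # train the classifier

    # Classify a new, unseen clip.
    print(clf.predict(clip_features("unknown_clip.wav").reshape(1, -1)))

In practice one would use per-frame features, far more training data and a validation split; each clip is compressed to a single mean MFCC vector here only to keep the example short.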

Contents

AUDIO CONTENTS
1. Introduction to sound recognition
2. Parameterization of the audio signal
3. Machine learning techniques
4. Sound recognition practice

SPEECH CONTENTS
1. Human speech systems
2. Speech analysis (see the sketch after this list)
3. Automatic speech recognition
4. Practice on speech analysis
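
As a generic illustration of the "Speech analysis" topic above (a sketch in Python rather than the MATLAB environment used in the course), the code below frames a speech signal and computes two classic short-time parameters, energy and zero-crossing rate; the frame and hop lengths assume a 16 kHz sampling rate.

    # Short-time speech analysis sketch: framing + short-time energy and
    # zero-crossing rate. Frame/hop sizes assume 16 kHz audio (25 ms / 10 ms).
    import numpy as np

    def short_time_analysis(x, frame_len=400, hop=160):
        """Return per-frame short-time energy and zero-crossing rate of x."""
        n_frames = 1 + max(0, (len(x) - frame_len) // hop)
        energy = np.empty(n_frames)
        zcr = np.empty(n_frames)
        for i in range(n_frames):
            frame = x[i * hop : i * hop + frame_len]
            energy[i] = np.sum(frame ** 2)                         # short-time energy
            zcr[i] = np.mean(np.abs(np.diff(np.sign(frame))) > 0)  # fraction of sign changes
        return energy, zcr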

Methodology

The teaching methodology used in Audio and Speech Processing combines theoretical lectures with practical activities that allow students to deepen and exemplify the theoretical contents through practical application cases within audio and speech signal processing technologies.

Each block of the subject begins with theory sessions, followed by practical work sessions in groups of three people. Part of the practical work is done in the classroom during teaching hours, in a space designed to facilitate teamwork using laptops with an Internet connection, where the teacher guides students towards the objectives set out in the assignment. This teamwork must be complemented with dedication outside teaching hours, both as a group to meet the challenges and individually to assimilate the theoretical concepts.

Evaluation

The evaluative instruments used in the subject of Audio and Speech Processing are:

- Individual exams: for each block (audio and speech), the student takes an exam on the theoretical contents of the module.
- Individual practical tests: for each practical activity of each module, the student takes an individual test that reflects their degree of mastery of the practical exercise carried out.
- Practical exercise deliverables: each group of students must submit a deliverable for each module, including code as well as reports that describe and discuss the results obtained.

Evaluation Criteria

The final grade of the subject is calculated as the average of the grades of the two modules, and a final grade of 5 or more is required to pass the subject:
NF = 0.5 · N_speech + 0.5 · N_audio

In addition, the grade of each module must be greater than or equal to 3.5; otherwise, the final grade is the lower of the two module grades.

Each module is evaluated as the average of the theory grade and the practice grade:
N_x = 0.5 · N_theory + 0.5 · N_practice

The theory grade is obtained from the individual theory exams. The practice grade is obtained from a weighting of the individual practical tests (60%), the deliverables (30%) and an attitude and participation grade (10%).
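
As an illustration only, the grading rules described above can be summarized in a short Python sketch:

    # Sketch of the grading rules stated above (illustration, not official code).
    def practice_grade(practical_test, deliverables, attitude):
        return 0.6 * practical_test + 0.3 * deliverables + 0.1 * attitude

    def module_grade(theory, practice):
        return 0.5 * theory + 0.5 * practice

    def final_grade(n_speech, n_audio):
        if min(n_speech, n_audio) < 3.5:
            return min(n_speech, n_audio)        # a module below 3.5 caps the final grade
        return 0.5 * n_speech + 0.5 * n_audio    # otherwise, average of the two modules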

Basic Bibliography

Joan Claudi Socoró, Ignasi Iriondo, "Apuntes de Procesamiento digital de audio y habla", Enginyeria La Salle, 2017.

Additional Material

THOMAS F. QUATIERI, "Discrete-Time Speech Signal Processing: Principles and Practice", Prentice-Hall, 2002.
MARK KAHRS, KARLHEINZ BRANDENBURG, "Applications of Digital Signal Processing to Audio and Acoustics", Kluwer Academic Publishers, 1998.

RICHARD O. DUDA, PETER E. HART, DAVID G. STORK, "Pattern Classification", John Wiley & Sons, 2012.

TODD K. MOON, WYNN C. STIRLING, "Mathematical Methods and Algorithms for Signal Processing", Prentice-Hall, 2000.

FRANCESC ALÍAS, JOAN CLAUDI SOCORÓ, XAVIER SEVILLANO, "A Review of Physical and Perceptual Feature Extraction Techniques for Speech, Music and Environmental Sounds", Applied Sciences (Special Issue on Audio Signal Processing), 6(5):143, doi:10.3390/app6050143 (MDPI, Open Access), May 2016.

J. R. DELLER, J. G. PROAKIS, J. H. L. HANSEN, "Discrete-Time Processing of Speech Signals", Macmillan Publishing Company, 1993.

L. RABINER, B. H. JUANG, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

X. HUANG, A. ACERO, H. W. HON, "Spoken Language Processing: A Guide to Theory, Algorithm and System Development", Prentice Hall, 2001.

J. HUOPANIEMI, "Virtual Acoustics and 3-D Sound in Multimedia Signal Processing", doctoral thesis, available at http://www.huopaniemi.net/jyri/pubs.html