Time and frequency characterization of analog signals and systems. Sampling Theorem. Discrete Time Fourier Transform. Z Transform. FIR and IIR filters.
The subject Speech and Audio Processing pursues the following learning outcomes:
- Know techniques for the parameterization of the acoustic signal, as well as machine learning techniques for the classification of sound events and environmental sounds.
- Master the characteristics and main parameters for the analysis and synthesis of human speech.
More specifically, students of Speech and Audio Processing must achieve the following knowledge and skills:
1. Acquire knowledge in the use of support tools for the development of applications in the field of digital signal processing (MATLAB).
2. Acquire the foundations of digital signal processing that will later allow them to assimilate the concepts related to speech processing and to the analysis and recognition of environmental audio and sound events.
3. Acquire basic knowledge about the production and perception of speech that enables them to understand the techniques of speech signal analysis and the models used in the different applications related to Speech Technologies.
4. Acquire basic knowledge about the analysis of the speech signal and its applications.
5. Identify, formulate and solve digital speech processing problems in a multidisciplinary environment, individually or as a member of a team.
6. Understand and apply parameterization methods for the acoustic signal for its subsequent processing.
7. Acquire basic knowledge about machine learning techniques applied to the detection and recognition of sound events and environmental sounds.
AUDIO CONTENTS
1. Introduction to sound recognition
2. Parameterization of the audio signal
3. Machine learning techniques
4. Sound recognition practice
SPEECH CONTENTS
1. Human speech systems
2. Speech analysis
3. Automatic speech recognition
4. Practice on speech analysis
The teaching methodology used in the subject Audio and Speech Processing combines theoretical lectures with practical activities that allow students to deepen their understanding of the theoretical contents and to apply them to practical cases within audio and speech signal processing technologies.
Each block of the subject begins with theory sessions, followed by practical work sessions in groups of three. Part of the practical work is carried out in the classroom during teaching hours, in a space designed to facilitate teamwork using laptops with an Internet connection, where the teacher guides the students towards the objectives set out in the assignment. This teamwork must be complemented by dedication outside teaching hours, both as a group to meet the assignment's challenges and individually to assimilate the theoretical concepts.
The assessment instruments used in the subject Audio and Speech Processing are:
- Individual exams: for each block (audio and speech), the student must take an exam on the theoretical contents of the module.
- Individual practical tests: for each practical activity of each module, the student takes an individual test that reflects their degree of mastery of the practical exercise carried out.
- Practical exercise deliverables: each group of students must submit a deliverable for each module, including the code as well as reports that describe and discuss the results obtained.
The final grade of the subject is calculated as the average of the grades of the two modules, and a final grade of 5 or more is required to pass the subject:
NF = 0.5 · N_speech + 0.5 · N_audio
In addition, the grade of each module must be greater than or equal to 3.5; otherwise, the final grade is the lower of the two module grades.
Each module is evaluated as the average of the theory grade and the practice grade:
N_x = 0.5 · N_theory + 0.5 · N_practice
The theory grade is obtained from the individual theory exams. The practice grade is obtained from a weighting that takes into account the individual practical test (60%), the deliverables (30%) and a grade for attitude and participation (10%).
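As an illustration of how these weightings combine, the following is a minimal sketch in Python (the course tool itself is MATLAB); the function names and the example grades are purely illustrative and assume the 0-10 grading scale implied above.

def module_grade(theory, practical_test, deliverables, participation):
    # N_x = 50% theory + 50% practice, where practice weighs the individual
    # practical test (60%), the deliverables (30%) and participation (10%).
    practice = 0.6 * practical_test + 0.3 * deliverables + 0.1 * participation
    return 0.5 * theory + 0.5 * practice

def final_grade(n_speech, n_audio):
    # NF = 50% N_speech + 50% N_audio, but if either module grade is below
    # 3.5, the final grade is the lower of the two module grades.
    if min(n_speech, n_audio) < 3.5:
        return min(n_speech, n_audio)
    return 0.5 * n_speech + 0.5 * n_audio

# Hypothetical student: strong in the speech module, weaker in the audio module.
n_speech = module_grade(theory=6.0, practical_test=7.0, deliverables=8.0, participation=9.0)
n_audio = module_grade(theory=4.0, practical_test=6.0, deliverables=5.0, participation=7.0)
nf = final_grade(n_speech, n_audio)
print(f"N_speech = {n_speech:.2f}, N_audio = {n_audio:.2f}, NF = {nf:.2f}, passed = {nf >= 5.0}")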
Joan Claudi Socoró, Ignasi Iriondo. Apuntes de Procesamiento digital de audio y habla. Enginyeria La Salle, 2017.
Thomas F. Quatieri. Discrete-Time Speech Signal Processing: Principles and Practice. Prentice-Hall, 2002.
Mark Kahrs, Karlheinz Brandenburg (eds.). Applications of Digital Signal Processing to Audio and Acoustics. Kluwer Academic Publishers, 1998.
Richard O. Duda, Peter E. Hart, David G. Stork. Pattern Classification. John Wiley & Sons, 2012.
Todd K. Moon, Wynn C. Stirling. Mathematical Methods and Algorithms for Signal Processing. Prentice-Hall, 2000.
Francesc Alías, Joan Claudi Socoró, Xavier Sevillano. "A Review of Physical and Perceptual Feature Extraction Techniques for Speech, Music and Environmental Sounds". Applied Sciences (Special Issue on Audio Signal Processing), 6(5):143, May 2016. doi:10.3390/app6050143 (MDPI, Open Access).
J. R. Deller, J. G. Proakis, J. H. L. Hansen. Discrete-Time Processing of Speech Signals. Macmillan Publishing Company, 1993.
L. Rabiner, B.-H. Juang. Fundamentals of Speech Recognition. Prentice Hall, 1993.
X. Huang, A. Acero, H.-W. Hon. Spoken Language Processing: A Guide to Theory, Algorithm and System Development. Prentice Hall, 2001.
J. Huopaniemi. Virtual Acoustics and 3-D Sound in Multimedia Signal Processing. Doctoral thesis, available at http://www.huopaniemi.net/jyri/pubs.html