Mireia Diez Sánchez: Speaker diarization: automatically finding "who speaks when" in an audio conversation


Listen to another talk from the AICzechia series, given on October 17, 2022
(the seminar is in English only).



Speaker diarization is the task of automatically determining the speaker turns in a recording of a conversation or, as commonly stated, finding "who spoke when". Although seemingly easy for humans, it remains a very challenging task in automatic speech processing. A diarization system must not only handle voice activity detection (VAD) and the complex speaker recognition stage, but also cope with an unknown number of speakers in each utterance, segment speech into speaker turns (find the boundaries between speakers), and treat overlapped speech (cross-talk). In this talk, we will go through the evolution of diarization systems. We will first describe the early approaches, which treated diarization as a cascade of subtasks (e.g. VAD, finding homogeneous speaker regions, and clustering). We will then focus on the state-of-the-art neural network-based methods, such as end-to-end diarization and target-speaker VAD systems, and on the main current challenges.

Watch the seminar


Dr. Mireia Diez Sánchez is a researcher at the Speech@FIT group at Brno University of Technology. Mireia received her Electronic Engineering degree in 2009, and her Ph.D. in 2015, both from the University of the Basque Country, Spain.

Her thesis focused on the study of features for language and speaker recognition. In 2016 she obtained an individual Marie Curie fellowship for the SpeakerDICE project, which deals with diarization tasks. She has attended several international workshops dedicated to speaker recognition and diarization: Bosaris (Brno, 2012), ASRWIS (South Africa, 2016), and SCALE (Baltimore, 2017). More recently, she coordinated the BUT team for the DIHARD challenges. Her main research interests are speaker diarization, speaker and language recognition, and Bayesian inference.


More info on AICzechia Seminars
