€ 18,00


Speech representations

Adventures in pre-training and fine-tuning transformers for speech technology tasks

Nik Vaessen • Book • paperback

  • Summary
    Free download at https://doi.org/10.54195/9789465151090

    This thesis explores self-supervised speech representation learning, focusing specifically on the wav2vec 2.0 framework.

    The research demonstrates that wav2vec 2.0, originally designed for automatic speech recognition, can be successfully fine-tuned for speaker recognition, even with limited labeled data (a minimal sketch of such a setup follows the product details below). However, attempts to create a unified multi-task model for both speech and speaker recognition revealed performance trade-offs, as the two tasks require largely orthogonal information.

    A comprehensive analysis of pre-training batch sizes shows that downstream performance depends primarily on the total amount of data observed during self-supervised pre-training. The thesis also addresses data quality requirements for self-supervised learning, finding that clean, prepared speech is essential; vocal music in particular must be avoided, as it causes training to diverge.

    Finally, the research presents the creation of a 55,000-hour Dutch speech dataset from television broadcasts, demonstrating that monolingual pre-training can outperform multilingual pre-training for Dutch speech recognition.
  • Product information
    Binding: Paperback
    Distribution form: Book (printed)
    Format: 170 mm x 240 mm
    Number of pages: 142
    Publisher: Radboud University Press
    ISBN: 9789465151090
    Publication date: 09-2025
  • Table of contents
    not available
  • Reviews (0 of 0 reviews)
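
As a loose illustration of the fine-tuning approach mentioned in the summary, the sketch below stacks a speaker-classification head on a pre-trained wav2vec 2.0 encoder. This is a minimal sketch assuming PyTorch and the HuggingFace transformers library, not code from the thesis; the checkpoint name, mean-pooling strategy, and SpeakerClassifier class are illustrative assumptions.

    # Minimal sketch (illustrative, not the thesis code): a speaker-
    # classification head on a pre-trained wav2vec 2.0 encoder.
    import torch
    import torch.nn as nn
    from transformers import Wav2Vec2Model

    class SpeakerClassifier(nn.Module):
        def __init__(self, num_speakers: int):
            super().__init__()
            # Publicly available pre-trained encoder (assumed checkpoint).
            self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
            hidden = self.encoder.config.hidden_size  # 768 for the base model
            self.head = nn.Linear(hidden, num_speakers)

        def forward(self, waveform: torch.Tensor) -> torch.Tensor:
            # waveform: (batch, samples) of 16 kHz mono audio
            frames = self.encoder(waveform).last_hidden_state  # (batch, time, hidden)
            utterance = frames.mean(dim=1)  # mean-pool frames into one embedding
            return self.head(utterance)     # speaker logits: (batch, num_speakers)

    model = SpeakerClassifier(num_speakers=100)
    logits = model(torch.randn(2, 16000))  # two dummy one-second utterances

Training such a head with a cross-entropy loss over speaker identities, optionally freezing the encoder when labeled data is scarce, is one common way to adapt a self-supervised encoder to speaker recognition.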

