arxiv:2409.06372

SpeechTaxi: On Multilingual Semantic Speech Classification

Published on Sep 10, 2024

Abstract

Recent advancements in multilingual speech encoding as well as transcription raise the question of the most effective approach to semantic speech classification. Concretely, can (1) end-to-end (E2E) classifiers obtained by fine-tuning state-of-the-art multilingual speech encoders (MSEs) match or surpass the performance of (2) cascading (CA), where speech is first transcribed into text and classification is delegated to a text-based classifier? To answer this, we first construct SpeechTaxi, an 80-hour multilingual dataset for semantic speech classification of Bible verses, covering 28 diverse languages. We then leverage SpeechTaxi to conduct a wide range of experiments comparing E2E and CA in monolingual semantic speech classification as well as in cross-lingual transfer. We find that E2E based on MSEs outperforms CA in monolingual setups, i.e., when trained on in-language data. However, MSEs seem to have poor cross-lingual transfer abilities, with E2E substantially lagging behind CA both in (1) zero-shot transfer to languages unseen in training and (2) multilingual training, i.e., joint training on multiple languages. Finally, we devise a novel CA approach based on transcription to Romanized text as a language-agnostic intermediate representation and show that it represents a robust solution for languages without native ASR support. Our SpeechTaxi dataset is publicly available at: https://huggingface.co/datasets/LennartKeller/SpeechTaxi/.
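The structural difference between the two approaches compared in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the `Audio` type, the toy stand-in components, and the labels "promise" and "commandment" are all hypothetical and do not come from the paper or the dataset.

```python
from typing import Callable, List, Sequence

Audio = Sequence[float]  # hypothetical stand-in for a raw waveform


def make_cascade(
    transcribe: Callable[[Audio], str],
    classify_text: Callable[[str], str],
) -> Callable[[Audio], str]:
    """Cascading (CA): speech -> transcript -> label."""
    def predict(audio: Audio) -> str:
        return classify_text(transcribe(audio))
    return predict


def make_e2e(
    encode: Callable[[Audio], List[float]],
    classify_embedding: Callable[[List[float]], str],
) -> Callable[[Audio], str]:
    """End-to-end (E2E): speech -> encoder representation -> label, with no text step."""
    def predict(audio: Audio) -> str:
        return classify_embedding(encode(audio))
    return predict


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_transcribe(audio: Audio) -> str:
    return "blessed are the meek" if sum(audio) > 0 else "thou shalt not steal"


def toy_text_classifier(text: str) -> str:
    return "promise" if "blessed" in text else "commandment"


def toy_encode(audio: Audio) -> List[float]:
    return [sum(audio) / len(audio)]


def toy_head(embedding: List[float]) -> str:
    return "promise" if embedding[0] > 0 else "commandment"


cascade = make_cascade(toy_transcribe, toy_text_classifier)
e2e = make_e2e(toy_encode, toy_head)
```

In the cascade, the intermediate transcript is the only thing the classifier sees, which is why the abstract's Romanized-text variant can swap the transcription step without touching the classifier interface.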
