Blog Áudio & Voz LLMs & Texto

Data Scale, Not Latency, Shapes Cross-Lingual Encoder Transfer in Streaming ASR

arXiv:2606.24169v1 Announce Type: new Abstract: Adapting a streaming speech recognition model to a new language requires choosing between two plausible warm starts: a multilingual (ML) encoder or an English-only (EN) encoder. The common intuition is that the multilingual encoder should help most at low data, but it is unclear how long that advantage persists, whether tight streaming latency amplifies it, and whether it survives deployment quantization. We answer these questions with a controlled...

arXiv cs.AI ·Nenad Banfic · 24 de janeiro de 2026

Ver no Hugging Face

// relacionados

Data Scale, Not Latency, Shapes Cross-Lingual Encoder Transfer in Streaming ASR

Leia também

bosonai/higgs-tts-v3-4b

Gradium Launches stt-translate and s2s-translate, Real-Time Speech Translation Models Beating gpt-realtime-translate on Accuracy and Latency

Layer-wise Probing of wav2vec 2.0 and Whisper for Consonant Cluster Reduction in African American English

Introducing the FFASR Leaderboard: Benchmarking ASR in the Real World