Data Scale, Not Latency, Shapes Cross-Lingual Encoder Transfer in Streaming ASR
arXiv:2606.24169v1 Announce Type: new Abstract: Adapting a streaming speech recognition model to a new language requires choosing between two plausible warm starts: a multilingual (ML) encoder or an English-only (EN) encoder. The common intuition is that the multilingual encoder should help most at low data, but it is unclear how long that advantage persists, whether tight streaming latency amplifies it, and whether it survives deployment quantization. We answer these questions with a controlled...
arXiv cs.AI
·Nenad Banfic
·
// relacionados
Leia também
Modelo
bosonai/higgs-tts-v3-4b
Blog
Gradium Launches stt-translate and s2s-translate, Real-Time Speech Translation Models Beating gpt-realtime-translate on Accuracy and Latency
Blog
Layer-wise Probing of wav2vec 2.0 and Whisper for Consonant Cluster Reduction in African American English
Blog