Rethinking Speech-LLM Integration for ASR: Effective Joint Speech-Text Training by Interleaving

arXiv:2607.01733v1. Abstract: Speech-LLM integration has shown promising results by leveraging extensive textual pretraining, yet its specific benefits for automatic speech recognition (ASR) remain unclear. We observe that as supervised ASR training data increases, the contribution of LLM priors becomes less evident, and simple speech-text joint training under-utilizes textual knowledge. We therefore propose Joint Speech-Text Interleaved Pretraining (JSTIP), an ASR-oriented pre...

arXiv cs.CL · Ruchao Fan, Yiming Wang, Rui Zhao, Liliang Ren, Keqi Deng, Xiaoyang Chen, Ali Zare, Bo Ren, Yuxuan Hu, Junkun Chen, Yan Huang, Yelong Shen, Jinyu Li · January 3, 2026

Read also

Canary-Qwen: NVIDIA's formula that rewrote the top of open speech transcription

Office Document Understanding Benchmark

DRL-CLBA: A Clean Label Backdoor Attack for Speech Classification via DDPG Reinforcement Learning

From Monolingual to Multilingual: Evaluating Mamba for ASR in South African Languages