Rethinking Speech-LLM Integration for ASR: Effective Joint Speech-Text Training by Interleaving
arXiv:2607.01733v1 (announce type: new)
Abstract: Speech-LLM integration has shown promising results by leveraging extensive textual pretraining, yet its specific benefits for automatic speech recognition (ASR) remain unclear. We observe that as supervised ASR training data increases, the contribution of LLM priors becomes less evident, and simple speech-text joint training under-utilizes textual knowledge. We therefore propose Joint Speech-Text Interleaved Pretraining (JSTIP), an ASR-oriented pre...
arXiv cs.CL
Ruchao Fan, Yiming Wang, Rui Zhao, Liliang Ren, Keqi Deng, Xiaoyang Chen, Ali Zare, Bo Ren, Yuxuan Hu, Junkun Chen, Yan Huang, Yelong Shen, Jinyu Li