Interleaved Speech Language Models Latently Work In Text
Hugging Face · Daily Papers
This paper is featured in the Hugging Face Daily Papers selection, curated by the AI research community.
Authors: Talia Sternberg, Gallil Maimon, Yossi Adi
- 10 community upvotes
- Topics: speech language models, speech-text interleaving, logit lens, intermediate layers, text token, speech recognition
Abstract
Original abstract, extracted from the paper:
Interleaved speech-text language models exhibit an implicit transcription phase where text tokens become decodable in intermediate layers, followed by text-based prediction before speech domain transformation.
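The topics above mention the logit lens, which is one common way to check whether a token is "decodable" at an intermediate layer: project that layer's hidden state through the model's output (unembedding) matrix and read off the highest-scoring token. The following is a minimal toy sketch of that idea, not the paper's method or code. The vocabulary, the matrix, and the hidden states are all made-up values chosen for illustration, and the final layer normalization that real implementations usually apply before projecting is omitted.

```python
# Toy logit-lens sketch. Every number and name below is hypothetical and
# chosen for illustration; nothing here comes from the paper or a real model.

def logit_lens(hidden_state, unembedding):
    """Project one hidden state onto the vocabulary, giving one logit per token."""
    return [sum(w * h for w, h in zip(row, hidden_state)) for row in unembedding]

def top_token(hidden_state, unembedding, vocab):
    """Return the vocabulary entry whose logit is highest for this hidden state."""
    logits = logit_lens(hidden_state, unembedding)
    best = max(range(len(logits)), key=logits.__getitem__)
    return vocab[best]

# Hypothetical mixed vocabulary: two text tokens and two speech units.
VOCAB = ["hello", "world", "<speech_17>", "<speech_42>"]

# Hypothetical unembedding matrix, one row per vocabulary entry.
UNEMBEDDING = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, -1.0],
]

# Hypothetical hidden states at a single position, ordered early to late.
HIDDEN_BY_LAYER = [
    [0.1, 0.0, 0.9],  # early layer
    [0.9, 0.1, 0.2],  # intermediate layer
    [0.2, 0.1, 1.5],  # final layer
]

# Decode the top token at every layer and print the trajectory.
decoded = [top_token(h, UNEMBEDDING, VOCAB) for h in HIDDEN_BY_LAYER]
for layer, token in enumerate(decoded):
    print(f"layer {layer}: {token}")
```

With these made-up values the intermediate layer decodes to a text token while the early and final layers decode to a speech unit. That pattern is built into the toy data to show what the readout looks like and is not evidence for the paper's claim.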