HybridCodec: Modeling Discrete and Continuous Representations for Efficient Speech Language Models

arXiv:2606.27627v1 Announce Type: new Abstract: Discrete audio representations have become increasingly popular for building multimodal text-audio systems and integrating audio capabilities into Large Language Models (LLMs). However, numerous studies report performance degradation on various downstream tasks due to information loss during discretization. To address this, we propose a novel approach combining temporally compressed discrete tokens with dimensionality-reduced continuous residuals. ...

arXiv cs.LG ·Artem Ploujnikov, Francesco Verdini, Samir Sadok, Mirco Ravanelli ·
compartilhar: