
SPARCLE: SPeaker-aware Aligned Representations via Contrastive Language Embeddings

arXiv:2607.01238v1

Abstract: Recent advances in speech synthesis have shifted from phoneme representations to direct grapheme modeling. While phonemes address the one-to-many mapping between text and acoustics, they rely on grapheme-to-phoneme (G2P) systems that fail to capture speaker-specific acoustic variation. Prior work demonstrates that grapheme-based models outperform phoneme-based systems at scale, but not in low-resource settings. In this paper, we propose SPARCLE, a ...

arXiv cs.CL · Priyam Mazumdar, Yurii Halychanskyi, Steven Guo, Mark Hasegawa-Johnson, Volodymyr Kindratenko · January 3, 2026



Read also

Canary-Qwen: NVIDIA's formula that rewrote the top of open speech transcription

Office Document Understanding Benchmark

DRL-CLBA: A Clean Label Backdoor Attack for Speech Classification via DDPG Reinforcement Learning

From Monolingual to Multilingual: Evaluating Mamba for ASR in South African Languages