SPARCLE: SPeaker-aware Aligned Representations via Contrastive Language Embeddings
arXiv:2607.01238v1. Abstract: Recent advances in speech synthesis have shifted from phoneme representations to direct grapheme modeling. While phonemes address the one-to-many mapping between text and acoustics, they rely on grapheme-to-phoneme (G2P) systems that fail to capture speaker-specific acoustic variation. Prior work demonstrates that grapheme-based models outperform phoneme-based systems at scale, but not in low-resource settings. In this paper, we propose SPARCLE, a ...
arXiv cs.CL
Priyam Mazumdar, Yurii Halychanskyi, Steven Guo, Mark Hasegawa-Johnson, Volodymyr Kindratenko