SPARCLE: SPeaker-aware Aligned Representations via Contrastive Language Embeddings

arXiv:2607.01238v1 Announce Type: new Abstract: Recent advances in speech synthesis have shifted from phoneme representations to direct grapheme modeling. While phonemes address the one-to-many mapping between text and acoustics, they rely on grapheme-to-phoneme (G2P) systems that fail to capture speaker-specific acoustic variation. Prior work demonstrates that grapheme-based models outperform phoneme-based systems at scale, but not in low-resource settings. In this paper, we propose SPARCLE, a ...

arXiv cs.CL ·Priyam Mazumdar, Yurii Halychanskyi, Steven Guo, Mark Hasegawa-Johnson, Volodymyr Kindratenko ·
compartilhar: