Graph-Based Phonetic Error Correction of Noisy ASR

arXiv:2606.24889v1. Abstract: Automatic speech recognition (ASR) systems, despite low overall word error rates, produce residual lexical errors that disproportionately affect semantically critical tokens such as named entities, negations, and sentiment-bearing words. These errors are often structured, arising from phonetic similarity rather than random noise, making naive token-level correction insufficient. We propose a structured ASR correction framework, which we call G-SPIN,...

arXiv cs.CL · Pratik Rakesh Singh, Mohammadi Zaki, Aneesh Mukkamala, Pankaj Wasnik · January 25, 2026

Read also

LTX-2: the first open foundation model for joint video and audio, with 19B parameters

How Reliable Is Your Jailbreak Judge? Calibration and Adversarial Robustness of Automated ASR Scoring

Noise-Aware Boundary-Enhanced Generative Learning for Ultrasound Speckle Reduction

Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models