Structure-Preserving Document Translation via Multi-Stage LLM Pipeline: A Case Study in Marathi

arXiv:2606.28796v1

Abstract: Government documents in India are predominantly issued in regional languages such as Marathi, creating substantial accessibility barriers for non-native readers, interstate administrative bodies, and policy analysts. Although recent advances in neural machine translation have improved sentence-level translation quality, existing systems largely neglect document structure, formatting integrity, and domain-specific terminology, thereby limiting their...

arXiv cs.CL · Manasi Waghe, Danish Chandargi, Mohammad Aamir Rayyan, Raviraj Joshi, A. R. Deshpande