Neural Machine Translation for Low-Resource Tangkhul--English

arXiv:2606.25365v1 Announce Type: new Abstract: We present a study on low-resource machine translation for the Tangkhul-English (nmf-en) language pair. Tangkhul is a severely under-resourced Tibeto-Burman language spoken primarily in Manipur, India, with virtually no prior natural language processing infrastructure. We describe two systems: (1) a primary system based on ByT5-large fine-tuned on 38,336 Tangkhul-English parallel sentence pairs, and (2) a contrastive system based on mT5-small fine-...

arXiv cs.CL ·Chormi Zimik Vashai, Agniva Maiti ·
compartilhar: