Regional Language to Bangla: Joypurhat, Rajshahi, Bangladesh


Authors : Md. Abu Horaira Sarder

Volume/Issue : Volume 10 - 2025, Issue 12 - December


Google Scholar : https://tinyurl.com/mrj7ypb9

Scribd : https://tinyurl.com/3wdczmts

DOI : https://doi.org/10.38124/ijisrt/25dec1243


Abstract : Bangladesh and its adjoining regions exhibit extensive linguistic diversity, comprising numerous regional languages and dialects that remain underrepresented in digital communication systems. The absence of standardized translation frameworks for these regional varieties poses substantial barriers to information accessibility, knowledge dissemination, and inclusive technological development. This study proposes an NLP-based computational model for systematically translating regional languages into Standard Bangla, thereby addressing the linguistic gap between informal spoken varieties and formal written Bangla. The research methodology encompasses corpus development, data annotation, text normalization, tokenization, phonological mapping, and the application of machine-learning and sequence-to-sequence translation architectures. A parallel dataset consisting of region-specific lexical items, syntactic structures, and semantic patterns was constructed to train and evaluate the system. Experimental evaluation indicates that the proposed model achieves promising translation accuracy while preserving semantic integrity and contextual meaning. The findings highlight the system's potential to support language standardization, promote linguistic inclusivity, and facilitate broader digital participation among speakers of marginalized dialects. The study further advances localized NLP research in Bangladesh and provides a foundation for future extensions to educational technology, governmental communication platforms, and multilingual AI systems.
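The abstract lists text normalization, tokenization, and region-specific lexical mapping as pipeline stages but does not detail them. The following is a minimal Python sketch of how those preprocessing stages might feed a simple word-by-word lexicon baseline. It assumes Unicode NFC normalization and whitespace/punctuation tokenization. The lexicon entries and the names `LEXICON`, `normalize`, `tokenize`, and `translate` are hypothetical placeholders, not the author's dataset or code. Phonological mapping and the sequence-to-sequence model are not shown.

```python
import re
import unicodedata

# HYPOTHETICAL placeholder lexicon: keys stand in for regional (dialect) word
# forms and values for their Standard Bangla equivalents. These are not entries
# from the paper's dataset; a real lexicon would come from the annotated corpus.
LEXICON: dict[str, str] = {
    "dialect_w1": "standard_w1",
    "dialect_w2": "standard_w2",
}

# Bangla danda (U+0964), double danda (U+0965) and common ASCII punctuation.
PUNCT = r"[\u0964\u0965.,!?;:]"


def normalize(text: str) -> str:
    """Canonicalize Unicode (NFC), pad punctuation with spaces, collapse whitespace."""
    # NFC yields one canonical code-point sequence, e.g. it rewrites the
    # precomposed YYA (U+09DF) as YA + NUKTA (U+09AF U+09BC).
    text = unicodedata.normalize("NFC", text)
    text = re.sub(f"({PUNCT})", r" \1 ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split normalized text into word and punctuation tokens."""
    normalized = normalize(text)
    return normalized.split(" ") if normalized else []


def translate(text: str, lexicon: dict[str, str] = LEXICON) -> str:
    """Word-by-word lookup baseline; unknown tokens are copied through unchanged."""
    return " ".join(lexicon.get(tok, tok) for tok in tokenize(text))


if __name__ == "__main__":
    # prints: standard_w1 standard_w2 , unknown ।
    print(translate("dialect_w1 dialect_w2, unknown\u0964"))
```

A lookup baseline like this is only a reference point: it cannot reorder words or handle dialect-specific syntax and morphology, which is the gap a trained sequence-to-sequence model is meant to close.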
