Evaluating Hunspell, SymSpell, Norvig, and N-gram Spellcheckers for Azerbaijani Text | International Journal of Innovative Science and Research Technology

Evaluating Hunspell, SymSpell, Norvig, and N-gram Spellcheckers for Azerbaijani Text

Authors : Israfil Gasim

Volume/Issue : Volume 10 - 2025, Issue 5 - May

Google Scholar : https://tinyurl.com/54dvbkfz

DOI : https://doi.org/10.38124/ijisrt/25may1640


Abstract : Automatic spelling correction is critical for text quality and usability across digital platforms, particularly for morphologically rich, low-resource languages such as Azerbaijani. This paper benchmarks four prominent spellchecking approaches implemented specifically for Azerbaijani: Hunspell, SymSpell, Norvig's probabilistic model, and N-gram statistical models. The evaluation used a manually annotated corpus drawn from diverse Azerbaijani text sources, with simulated orthographic errors typical of everyday language use. All four methods proved only moderately effective. Hunspell achieved the highest accuracy (84.5%), which the paper attributes to its dictionary-based morphological handling. SymSpell (81.4%) is fast but requires extensive dictionary resources, making it impractical for morphologically complex languages without significant resource investment. Norvig's method (78.3%) and the N-gram model (82.1%) were limited by corpus dependency and computational efficiency, respectively. The findings highlight the challenges posed by Azerbaijani's agglutinative structure and the inadequacy of existing general-purpose algorithms. The paper therefore argues for new hybrid approaches tailored to Azerbaijani and similarly structured languages, and suggests directions for future research in spelling correction.

Keywords : Azerbaijani Language, Spellchecking Algorithms, Hunspell, SymSpell, Norvig, N-gram, Agglutinative Languages, NLP
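
To make the kind of comparison described in the abstract concrete, the following is a minimal sketch of a Norvig-style corrector adapted to the Azerbaijani Latin alphabet, together with a top-1 accuracy measure. It is not the paper's implementation. The tiny lexicon, the test pairs, the function names, and the definition of accuracy (the fraction of misspellings whose top suggestion equals the intended word) are all assumptions made for illustration. Input is assumed to be lowercase already.

```python
from collections import Counter

# Azerbaijani Latin alphabet, lowercase (32 letters).
AZ_ALPHABET = "abcçdeəfgğhxıijkqlmnoöprsştuüvyz"


def edits1(word):
    """Return all strings one edit away from `word`:
    a deletion, an adjacent transposition, a replacement, or an insertion."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [
        left + ch + right[1:] for left, right in splits if right for ch in AZ_ALPHABET
    ]
    inserts = [left + ch + right for left, right in splits for ch in AZ_ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word, freq):
    """Return `word` if it is known; otherwise the most frequent known word
    within one edit; otherwise `word` unchanged."""
    if word in freq:
        return word
    candidates = [w for w in edits1(word) if w in freq]
    if not candidates:
        return word
    # Break frequency ties alphabetically so the result is deterministic.
    return max(candidates, key=lambda w: (freq[w], w))


def accuracy(pairs, freq):
    """Fraction of (misspelling, target) pairs whose top suggestion is the target."""
    hits = sum(correct(misspelled, target_freq) == target
               for misspelled, target in pairs
               for target_freq in [freq])
    return hits / len(pairs)


# Hypothetical toy lexicon with made-up frequencies (not from the paper's corpus).
toy_freq = Counter({"kitab": 5, "məktəb": 3, "şəhər": 4})

# Hypothetical (misspelling, intended word) pairs.
toy_pairs = [
    ("kitap", "kitab"),    # one replacement
    ("mektəb", "məktəb"),  # one replacement (e for ə)
    ("şəhr", "şəhər"),     # one missing letter
    ("seher", "şəhər"),    # three replacements, beyond one edit
]

if __name__ == "__main__":
    for misspelled, target in toy_pairs:
        print(misspelled, "->", correct(misspelled, toy_freq), "(target:", target + ")")
    print("top-1 accuracy:", accuracy(toy_pairs, toy_freq))
```

The last toy pair shows one reason a pure edit-distance approach can struggle with ASCII-typed Azerbaijani: several letters differ from the intended word at once, so the target falls outside the one-edit candidate set. The sketch also stores every surface form in the lexicon, which hints at why dictionary size becomes a concern for an agglutinative language where each stem takes many suffix combinations.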

