Authors :
Israfil Gasim
Volume/Issue :
Volume 10 - 2025, Issue 5 - May
Google Scholar :
https://tinyurl.com/54dvbkfz
DOI :
https://doi.org/10.38124/ijisrt/25may1640
Abstract :
Automatic spelling correction is critical for enhancing text quality and usability across digital platforms,
particularly for morphologically rich and low-resource languages like Azerbaijani. This paper presents a comparative
analysis and benchmarking of four prominent spellchecking algorithms—Hunspell, SymSpell, Norvig's probabilistic model,
and N-gram statistical models—implemented specifically for Azerbaijani. A comprehensive evaluation was conducted using
a manually annotated corpus drawn from diverse Azerbaijani text sources and containing simulated orthographic errors typical
of everyday language use. Results indicate moderate effectiveness for all tested methods, with Hunspell achieving the
highest accuracy (84.5%) due to its robust dictionary-based morphological handling. Despite its speed advantage, SymSpell
(81.4% accuracy) requires extensive dictionary resources, making it impractical for morphologically complex languages
without significant resource investments. Norvig's method (78.3%) and the N-gram model (82.1%) also demonstrated
limitations related to corpus dependency and computational efficiency, respectively. The findings highlight substantial
challenges posed by Azerbaijani’s agglutinative structure, underscoring the inadequacy of existing general-purpose
algorithms. Consequently, the paper emphasizes the urgent need for new hybrid approaches specifically tailored to
Azerbaijani and similarly structured languages, suggesting directions for future research and development in spelling
correction technologies.
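The abstract attributes the weakness of Norvig's probabilistic model to corpus dependency. A minimal sketch of the classic Norvig approach may make that concrete. It generates candidates at edit distance 1, then 2, over the 32-letter Azerbaijani Latin alphabet, and picks the most frequent word already seen in a training corpus. The toy corpus, the `NorvigCorrector` class, `edits1`, and the example words are hypothetical illustrations. They are not the paper's implementation, data, or results. Input is assumed to be lowercased already, because Python's default `str.lower()` does not apply the Turkic dotted/dotless i rules (`"I".lower()` gives `"i"`, not `"ı"`).

```python
from collections import Counter

# Azerbaijani Latin alphabet (32 letters), lowercase only.
AZ_ALPHABET = "abcçdeəfgğhxıijkqlmnoöprsştuüvyz"


def edits1(word: str) -> set[str]:
    """Every string one deletion, transposition, replacement, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in AZ_ALPHABET]
    inserts = [left + c + right for left, right in splits for c in AZ_ALPHABET]
    return set(deletes + transposes + replaces + inserts)


class NorvigCorrector:
    """Hypothetical frequency-based corrector in the style of Norvig's model."""

    def __init__(self, corpus_tokens: list[str]) -> None:
        # Raw word counts stand in for the prior P(candidate).
        self.counts = Counter(corpus_tokens)

    def known(self, words) -> set[str]:
        """Keep only candidates that occur in the training corpus."""
        return {w for w in words if w in self.counts}

    def correct(self, word: str) -> str:
        # Prefer the word itself, then known words at distance 1, then 2.
        # If nothing is known, return the input unchanged.
        candidates = (
            self.known([word])
            or self.known(edits1(word))
            or self.known(e2 for e1 in edits1(word) for e2 in edits1(e1))
            or {word}
        )
        # The most frequent candidate wins (P(word) up to a constant).
        return max(candidates, key=lambda w: self.counts[w])


# Hypothetical toy corpus for illustration only (not the paper's data).
TOY_CORPUS = "kitab kitab kitab kitablar ev evlər məktəb məktəb dil".split()
corrector = NorvigCorrector(TOY_CORPUS)

print(corrector.correct("kitb"))          # kitab  (missing letter)
print(corrector.correct("məktəp"))        # məktəb (b/p confusion)
print(corrector.correct("evlar"))         # evlər  (a written for ə)
print(corrector.correct("evlarimizdən"))  # evlarimizdən (unseen inflected form, unchanged)
```

The candidate search only finds words that already appear in the frequency list. As a result, the misspelled inflected form `evlarimizdən` comes back unchanged. For this kind of model to correct it, each stem-plus-suffix combination would have to occur in the training corpus. That is one way to read the corpus-dependency limitation reported above for an agglutinative language.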
Keywords :
Azerbaijani Language, Spellchecking Algorithms, Hunspell, SymSpell, Norvig, N-gram, Agglutinative Languages, NLP