Authors :
Abba Abdullahi Ibrahim; Hadiza Umar Ali; Ismail Zahraddeen Yakubu; Ibrahim A. Lawal
Volume/Issue :
Volume 9 - 2024, Issue 10 - October
Google Scholar :
https://tinyurl.com/3mhya86w
Scribd :
https://tinyurl.com/2zduuyvd
DOI :
https://doi.org/10.38124/ijisrt/IJISRT24OCT1050
Abstract :
Fake news poses a significant threat to societies worldwide, including Hausa-speaking regions, where misinformation spreads rapidly via social media. The scarcity of NLP resources tailored to Hausa exacerbates the problem of fake news in the language. While extensive research has addressed fake news detection in languages such as English, little attention has been paid to languages like Hausa, leaving a significant portion of the global population vulnerable to misinformation. Traditional machine-learning approaches often perform poorly in low-resource settings because training data and linguistic resources are insufficient. This study develops a robust model for detecting fake news in the Hausa language by leveraging transfer learning with adaptive fine-tuning. A dataset of over 6,600 news articles, comprising both fake and truthful articles, was collected from various sources between January 2022 and December 2023. Cross-lingual Transfer Learning (XLT) was employed to adapt pre-trained models to the low-resource Hausa language. The model was fine-tuned and evaluated using performance metrics such as accuracy, precision, recall, F-score, AUC-ROC, and precision-recall (PR) curves. Results demonstrated a high accuracy rate in identifying fake news, with significant improvements in detecting misinformation within the political and world-news categories. This study addresses the gap in Hausa-language natural language processing (NLP) and contributes to the fight against misinformation in Nigeria. The findings are relevant for developing AI-driven tools to curb fake-news dissemination in African languages.
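To illustrate the classification metrics named in the abstract, here is a minimal sketch that computes accuracy, precision, recall, and F1 for a binary fake/truthful labeling task. The labels below are hypothetical toy values for demonstration, not the paper's Hausa dataset or results.

```python
# Minimal sketch of the binary-classification metrics used in the study
# (accuracy, precision, recall, F1). Labels here are hypothetical
# examples only; 1 = fake, 0 = truthful.

def binary_metrics(y_true, y_pred):
    """Compute standard binary-classification metrics from label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels: 3 true positives, 1 false positive, 1 false negative, 3 true negatives
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))  # each metric is 0.75 on this toy split
```

In practice the same quantities would be obtained from the fine-tuned model's predictions on a held-out Hausa test set, with AUC-ROC and PR curves computed from the model's predicted probabilities rather than hard labels.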
Keywords :
Fake News; NLP; Deep Learning; Transfer Learning; Fine Tuning.