Authors :
Aram Ibrahim Al-Anazi
Volume/Issue :
Volume 10 - 2025, Issue 9 - September
Google Scholar :
https://tinyurl.com/3taxb7p3
Scribd :
https://tinyurl.com/y59hhukv
DOI :
https://doi.org/10.38124/ijisrt/25sep393
Abstract :
This study evaluates the effectiveness of various machine learning and deep learning models in detecting adult
content in Arabic tweets, a task complicated by language-specific and cultural challenges. Using a dataset of 33,691
Arabic tweets, we implemented and compared Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long
Short-Term Memory networks (LSTM), and AraBERT. The data were preprocessed by cleaning and tokenization, then split
into training, validation, and test sets. Models were assessed using accuracy, F1 score, and confusion matrices.
AraBERT achieved the highest reported accuracy (100%), demonstrating superior capability in capturing the patterns
needed for content classification. CNN and RNN also performed well, with accuracies of 94.27% and 94.22%,
respectively, while LSTM achieved 88.37%. These findings highlight AraBERT's potential for effective content
moderation in Arabic digital spaces, contributing to safer online environments.
Keywords :
Arabic Tweets; AraBERT; Convolutional Neural Networks (CNN); Long Short-Term Memory (LSTM); Natural Language Processing (NLP); Recurrent Neural Networks (RNN); Text Classification.
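The evaluation setup described in the abstract (a three-way data split, followed by accuracy, F1 score, and a confusion matrix) can be illustrated with a minimal sketch. This is not the authors' code. It assumes a binary labeling (1 = adult content, 0 = other) and an 80/10/10 split ratio, neither of which is stated in the abstract. The function names `split_dataset`, `confusion_counts`, `accuracy`, and `f1_score` are hypothetical helpers written for this illustration.

```python
import random


def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle items and split them into train, validation, and test lists.

    The 80/10/10 default is an assumption; the abstract gives no ratio.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary task (assumed: 1 = adult content)."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy labels for illustration only; these are not the paper's data.
    y_true = [1, 1, 0, 0, 1]
    y_pred = [1, 0, 0, 1, 1]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    print("confusion (tp, fp, fn, tn):", (tp, fp, fn, tn))
    print("accuracy:", accuracy(tp, fp, fn, tn))
    print("F1:", round(f1_score(tp, fp, fn), 4))
```

Reporting F1 alongside accuracy matters when the classes are imbalanced, because accuracy alone can look strong for a model that rarely predicts the minority class.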