Detection of Adult Content in Arabic Tweets Using Machine Learning Models


Authors : Aram Ibrahim Al-Anazi

Volume/Issue : Volume 10 - 2025, Issue 9 - September


DOI : https://doi.org/10.38124/ijisrt/25sep393

Abstract : This study evaluates the effectiveness of several machine learning and deep learning models in detecting adult content in Arabic tweets, a task complicated by the unique linguistic and cultural characteristics of Arabic. Using a dataset of 33,691 Arabic tweets, we implemented and compared Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory networks (LSTM), and AraBERT. The data underwent thorough preprocessing, including cleaning and tokenization, and was then split into training, validation, and test sets. Model performance was assessed with accuracy, F1 score, and confusion matrices. AraBERT achieved the highest accuracy (100%), demonstrating a superior ability to capture contextual patterns for content classification. CNN and RNN also performed well, with accuracies of 94.27% and 94.22%, respectively, while LSTM reached 88.37%. These findings highlight AraBERT's potential for effective content moderation in Arabic digital spaces, contributing to safer online environments.

Keywords : Arabic Tweets; AraBERT; Convolutional Neural Networks (CNN); Long Short-Term Memory (LSTM); Natural Language Processing (NLP); Recurrent Neural Networks (RNN); Text Classification.
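The abstract names the preprocessing steps (cleaning, tokenization, and a training/validation/test split) and the evaluation measures (accuracy, F1 score, confusion matrix) without giving details. The Python sketch below shows one common way such a pipeline can be assembled. It is an illustration under stated assumptions, not the author's implementation: the specific cleaning rules (URL and mention removal, diacritic and tatweel stripping, alef and alef-maqsura normalization), the whitespace tokenizer, the 80/10/10 stratified split, the fixed seed, and the helper names `clean_tweet`, `tokenize`, `stratified_split`, and `evaluate` are all hypothetical.

```python
import random
import re
from collections import defaultdict

# ASSUMPTION: hypothetical cleaning rules; the paper does not list its exact steps.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")                      # \w also matches Arabic letters
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0640]")  # tashkeel marks and tatweel
ALEF_RE = re.compile(r"[\u0622\u0623\u0625]")         # alef with madda / hamza variants
NON_ARABIC_RE = re.compile(r"[^\u0621-\u064A\s]")     # anything outside basic Arabic letters


def clean_tweet(text: str) -> str:
    """Normalize a raw tweet into space-separated Arabic words."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = DIACRITICS_RE.sub("", text)
    text = ALEF_RE.sub("\u0627", text)       # map alef variants to bare alef
    text = text.replace("\u0649", "\u064A")  # alef maqsura -> yaa
    # Drops '#', '_', Latin letters, digits and emoji; hashtag words survive as text.
    text = NON_ARABIC_RE.sub(" ", text)
    return " ".join(text.split())


def tokenize(text: str) -> list[str]:
    """Whitespace tokenization (AraBERT would instead use its own subword tokenizer)."""
    return text.split()


def stratified_split(texts, labels, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split (text, label) pairs per class so each split keeps the label balance.

    ASSUMPTION: the 80/10/10 ratios and seed are illustrative, not from the paper.
    """
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append((text, label))
    rng = random.Random(seed)
    train, val, test = [], [], []
    for items in by_label.values():
        rng.shuffle(items)
        n_train = int(len(items) * ratios[0])
        n_val = int(len(items) * ratios[1])
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def evaluate(y_true, y_pred, positive=1):
    """Binary accuracy, F1 for the positive (adult) class, and confusion matrix."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Rows are true labels, columns are predictions: [[TN, FP], [FN, TP]].
    return {"accuracy": accuracy, "f1": f1, "confusion": [[tn, fp], [fn, tp]]}


if __name__ == "__main__":
    print(tokenize(clean_tweet("@user مرحبا #تجربة_أولى https://t.co/x")))
    print(evaluate([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

Splitting per class keeps the proportion of adult and non-adult tweets similar across the three sets, which matters when reporting F1 on an imbalanced dataset. Stripping all non-Arabic characters is a deliberately aggressive choice here; emoji or Latin-script slang can carry signal for this task, so a real pipeline might keep them.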
