Implementation of Random Forest Algorithm for Air Quality Classification: A Case Study of DKI Jakarta's Air Quality Index


Authors : Mochammad Junus; Vidorova Nurcahyani; Rachmad Saptono; Nurefa Maulana; Indra Lukmana Putra; Zidan Fahreza

Volume/Issue : Volume 10 - 2025, Issue 3 - March


DOI : https://doi.org/10.38124/ijisrt/25mar1548



Abstract : Air quality monitoring and classification in urban environments present significant challenges for environmental management and public health policy. This study implements an optimized Random Forest (RF) algorithm to classify air quality levels in DKI Jakarta, Indonesia, using 2021 Air Quality Index (AQI) data. The analysis incorporates six key pollutants (PM10, PM2.5, NO2, SO2, CO, and O3), with data collected from the Environmental Management Agency of DKI Jakarta. The RF model was built with 5000 decision trees and a tuned mtry of 2 (the number of candidate features sampled at each split), and it was evaluated on a stratified 70:30 train-test split. The model achieved an accuracy of 99.09% and an Out-of-Bag (OOB) error rate of 2.35%. Feature importance analysis showed that particulate matter was the most influential factor: PM2.5 and PM10 together accounted for 78.70% of the model's total feature importance. Consistently high performance metrics across all three air quality categories (Good, Moderate, and Unhealthy) indicate that the model classifies reliably. This research offers insights for environmental monitoring and policymaking and presents a framework adaptable to other urban settings. The findings highlight the central role of particulate matter in air quality assessment and suggest targeted strategies for pollution control.
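The abstract names the main ingredients of a Random Forest: many decision trees, each grown on a bootstrap sample; only mtry randomly chosen features considered at every split; an Out-of-Bag (OOB) error computed from the rows each tree did not see; and a stratified 70:30 split. The following is a minimal pure-Python sketch of those ingredients, not the authors' implementation. Assumptions: all function names are hypothetical, splits use Gini impurity, trees are grown until pure, and the default tree count is far below the paper's 5000 because pure Python is slow. Feature importance is not computed here.

```python
import random
from collections import Counter

# Minimal Random Forest sketch (standard library only, illustrative).
# Hypothetical helper names; Gini splits and fully grown trees are assumptions.
# Leaves are class labels and internal nodes are tuples, so labels must not be tuples.


def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, test_idx), keeping train_frac of every class in training."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)


def gini(ys):
    """Gini impurity of a non-empty list of labels."""
    n = len(ys)
    return 1.0 - sum((c / n) ** 2 for c in Counter(ys).values())


def build_tree(X, y, rows, mtry, rng, min_size=2):
    """Grow a CART-style tree; only `mtry` random features are tried at each split."""
    labels = [y[i] for i in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or len(rows) < min_size:
        return majority
    best = None  # (weighted impurity, feature, threshold, left rows, right rows)
    for f in rng.sample(range(len(X[0])), mtry):
        for t in sorted({X[i][f] for i in rows}):
            left = [i for i in rows if X[i][f] <= t]
            right = [i for i in rows if X[i][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no usable split among the sampled features
        return majority
    _, f, t, left, right = best
    return (f, t,
            build_tree(X, y, left, mtry, rng, min_size),
            build_tree(X, y, right, mtry, rng, min_size))


def predict_tree(node, x):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def random_forest(X, y, n_trees=100, mtry=2, seed=0):
    """Train on bootstrap samples and return (forest, out-of-bag error rate)."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    oob_votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # n rows drawn with replacement
        tree = build_tree(X, y, boot, mtry, rng)
        forest.append(tree)
        in_bag = set(boot)
        for i in range(n):
            if i not in in_bag:  # row i was not used to grow this tree
                oob_votes[i][predict_tree(tree, X[i])] += 1
    scored = [i for i in range(n) if oob_votes[i]]
    errors = sum(oob_votes[i].most_common(1)[0][0] != y[i] for i in scored)
    oob_error = errors / len(scored) if scored else float("nan")
    return forest, oob_error


def predict_forest(forest, x):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]
```

For the setup described in the abstract, each row of X would hold the six pollutant readings for one observation and y the AQI category. Calling random_forest(X_train, y_train, n_trees=5000, mtry=2) mirrors the reported configuration, although it is slow in pure Python. In scikit-learn, a comparable setup is RandomForestClassifier(n_estimators=5000, max_features=2, oob_score=True) fitted on the output of train_test_split(X, y, test_size=0.3, stratify=y); its oob_score_ attribute is an accuracy, so the OOB error is 1 - oob_score_.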

Keywords : Air Quality Classification, Random Forest, Machine Learning, Air Quality Index, Environmental Monitoring, Jakarta.


