⚠ Official Notice: www.ijisrt.com is the official website of the International Journal of Innovative Science and Research Technology (IJISRT) Journal for research paper submission and publication. Please beware of fake or duplicate websites using the IJISRT name.



Bivariate and Multivariate Vibe Analysis of Forestry Data with AI and R Statistics


Authors : Kato Samuel Namuene; Egbe Enow Andrew

Volume/Issue : Volume 11 - 2026, Issue 4 - April


Google Scholar : https://tinyurl.com/9jw59tdf

Scribd : https://tinyurl.com/2hh5kr2y

DOI : https://doi.org/10.38124/ijisrt/26apr1365



Abstract : Artificial Intelligence (AI) can speed up data analysis in R by generating R code that the analyst then executes (vibe data analysis), reducing the time needed to write that code by hand. This study developed and validated a reproducible, AI-assisted framework for bivariate and multivariate statistical analysis of forestry count data, integrating vibe data analysis with conventional manual methods. The data comprised four disturbance observations (snapping, windthrow, branch fall, and dead standing) across 73 species drawn from 183 treefall gaps in Korup National Park, Cameroon. Using Claude.ai to generate R statistical code through structured prompt engineering, we systematically applied classical parametric approaches alongside non-parametric alternatives across five analytical stages: exploratory data analysis, bivariate correlation and regression, multivariate correlation matrix analysis, dimensionality reduction and clustering, and multiple linear regression. All disturbance count variables exhibited extreme positive skewness (1.776–8.367) and severe excess kurtosis (5.554–71.014), violating parametric assumptions and making non-parametric methods co-primary analytical tools. The bivariate analysis revealed a strong positive association between snapping and gap size (Pearson r = 0.865, p < 0.001; R² = 0.7483), corroborated in direction by non-parametric methods, though with smaller coefficients (Spearman ρ = 0.455, p < 0.001; Kendall τ = 0.366, p < 0.001), indicating that species associated with larger canopy openings tend to record higher snapping frequencies.
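The gap between the Pearson and Spearman coefficients reported in the abstract is the pattern one would expect from strongly right-skewed counts, where a few extreme observations can dominate a moment-based correlation. The following is a minimal sketch of that effect, written in Python with only the standard library; it is not the R code generated in the study. The arrays `gap_size` and `snapping` are hypothetical toy values, not the Korup data, and the skewness and kurtosis functions use plain population moments, which may differ from the estimator used by the authors.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def central_moment(xs, k):
    """k-th central moment, dividing by n (population form)."""
    m = mean(xs)
    return mean([(x - m) ** k for x in xs])


def skewness(xs):
    """Moment skewness m3 / m2^1.5; positive for a long right tail."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Moment kurtosis minus 3, so a normal distribution scores 0."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(xs):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical toy counts (NOT the study data): one species with a very
# large gap size and snapping count, the rest only loosely related.
gap_size = [1, 2, 3, 4, 5, 6, 7, 8, 9, 60]
snapping = [2, 1, 4, 3, 6, 5, 8, 7, 0, 50]

print("skewness (gap_size):       ", round(skewness(gap_size), 3))
print("excess kurtosis (gap_size):", round(excess_kurtosis(gap_size), 3))
print("Pearson r:                 ", round(pearson(gap_size, snapping), 3))
print("Spearman rho:              ", round(spearman(gap_size, snapping), 3))
```

In this toy case the single extreme pair pulls the Pearson coefficient well above the rank-based one, which is why reporting both, as the abstract does, gives a fuller picture for skewed count data.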

Keywords : Bivariate Analysis, Multivariate Analysis, Correlation, Regression, PCA, Cluster Analysis, K-means, Vibe Data Analysis, R Statistics, Artificial Intelligence, Ecological Disturbance Data.
