Authors :
Niraj Patel
Volume/Issue :
Volume 10 - 2025, Issue 2 - February
Google Scholar :
https://tinyurl.com/fsftxcvr
Scribd :
https://tinyurl.com/2szuhyjk
DOI :
https://doi.org/10.5281/zenodo.14964344
Abstract :
Residual analysis plays a pivotal role in validating regression models by identifying potential issues such as
heteroscedasticity, non-linearity, and model misspecification. This study introduces a novel automated framework for
residual diagnostics, integrating computer vision techniques with statistical inference. The proposed system evaluates residual
plots, detects irregularities, and performs hypothesis testing to ensure model robustness. By combining image recognition
algorithms with a user-friendly Shiny application, the approach eliminates subjective biases inherent in manual plot
evaluation. The resulting tool enhances the scalability and reliability of regression diagnostics, offering data scientists a
powerful resource for building accurate and interpretable models.
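To make concrete the kind of residual-based hypothesis test the abstract refers to, the following is a minimal sketch of a conventional heteroscedasticity check: the studentized (Koenker) form of the Breusch-Pagan test for a single predictor. It is not the computer vision framework proposed in the study. The function names `ols_fit`, `r_squared`, and `breusch_pagan` are hypothetical, and the restriction to one predictor is an assumption made to keep the example in the standard library.

```python
import math


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x, y):
    """Coefficient of determination of the simple regression of y on x."""
    a, b = ols_fit(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0.0:
        return 0.0  # constant response: nothing left to explain
    return 1.0 - ss_res / ss_tot


def breusch_pagan(x, y):
    """Studentized Breusch-Pagan test with one predictor (hypothetical helper).

    Regresses the squared residuals on x and uses LM = n * R^2, which is
    approximately chi-square with 1 degree of freedom under constant variance.
    Returns (lm_statistic, p_value).
    """
    a, b = ols_fit(x, y)
    sq_resid = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    # Clamp at zero to guard against tiny negative values from rounding.
    lm = len(x) * max(0.0, r_squared(x, sq_resid))
    # Upper tail of a chi-square(1) variable: P(X > lm) = erfc(sqrt(lm / 2)).
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value
```

A small p-value suggests that the residual variance changes with the predictor. A test of this kind targets one specific departure from the model assumptions, whereas a residual plot can reveal several at once, which is the gap an automated plot-reading approach aims to cover.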