Authors :
Kapil Kumar Goyal
Volume/Issue :
Volume 10 - 2025, Issue 5 - May
Google Scholar :
https://tinyurl.com/yc8ezetm
DOI :
https://doi.org/10.38124/ijisrt/25may1773
Abstract :
Performance metrics of AI models like accuracy, precision, and recall are often reported in a vacuum, detached
from the real-world contexts in which the models are deployed. Yet increasingly, the criticality and sensitivity of model
applications demand a more nuanced approach to their performance evaluation. This paper introduces a new framework—
Contextual AI Evaluations—that allows teams to assess models with greater relevance to the conditions under which the
models will be deployed. Contextual AI Evaluations assign weights to different deployment scenarios to reflect the
operational risk, business impact, and user sensitivity associated with each scenario. The framework is applied to several
models currently in use.
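To make the weighting idea concrete, the following is a minimal sketch of how a context-weighted score could be computed. It is an illustration only: the scenario names, the three context factors, and the choice to average them into a single weight are assumptions made here, not the paper's published method.

# Minimal sketch of context-weighted model scoring (hypothetical API;
# weights, field names, and scenarios are illustrative, not from the paper).
from dataclasses import dataclass

@dataclass
class Scenario:
    name: str
    operational_risk: float   # 0..1, higher means riskier deployment
    business_impact: float    # 0..1
    user_sensitivity: float   # 0..1
    metric_value: float       # e.g. recall measured on data from this scenario

def scenario_weight(s: Scenario) -> float:
    # One simple choice: average the three context factors into a weight.
    return (s.operational_risk + s.business_impact + s.user_sensitivity) / 3.0

def contextual_score(scenarios: list[Scenario]) -> float:
    # Weighted average of the per-scenario metric, so performance in
    # high-risk, high-impact, user-sensitive scenarios dominates the grade.
    total_weight = sum(scenario_weight(s) for s in scenarios)
    return sum(scenario_weight(s) * s.metric_value for s in scenarios) / total_weight

if __name__ == "__main__":
    scenarios = [
        Scenario("fraud screening", 0.9, 0.8, 0.9, metric_value=0.82),
        Scenario("marketing ranking", 0.2, 0.5, 0.3, metric_value=0.95),
    ]
    print(f"Context-weighted score: {contextual_score(scenarios):.3f}")

Under this sketch, the fraud-screening scenario contributes far more to the overall grade than the marketing scenario, even though its raw metric is lower, which is the behaviour the framework's risk-, impact-, and sensitivity-based weighting is meant to capture.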
Keywords :
Model Evaluation, AI Metrics, Contextual Weighting, AI Governance, Risk-Aware AI, Responsible AI, Scenario-Based Grading.