Securing Machine Learning: Understanding Adversarial Attacks and Bias Mitigation


Authors : Archit Lakhani; Neyah Rohit

Volume/Issue : Volume 9 - 2024, Issue 6 - June


Google Scholar : https://tinyurl.com/2a6c9xjf

Scribd : https://tinyurl.com/ct2kyxwr

DOI : https://doi.org/10.38124/ijisrt/IJISRT24JUN1671


Abstract : This paper offers a comprehensive examination of adversarial vulnerabilities in machine learning (ML) models and of strategies for mitigating fairness and bias issues. It analyses various adversarial attack vectors, encompassing evasion, poisoning, model inversion, exploratory probes, and model stealing, and elucidates their potential to compromise model integrity and induce misclassification or information leakage. In response, a range of defence mechanisms, including adversarial training, certified defences, feature transformations, and ensemble methods, is scrutinized, assessing their effectiveness and limitations in fortifying ML models against adversarial threats. Furthermore, the study explores the nuanced landscape of fairness and bias in ML, addressing societal biases, stereotype reinforcement, and unfair treatment, and proposes mitigation strategies such as fairness metrics, bias auditing, de-biasing techniques, and human-in-the-loop approaches to foster fairness, transparency, and ethical AI deployment. This synthesis advocates interdisciplinary collaboration to build resilient, fair, and trustworthy AI systems amidst the evolving technological paradigm.
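To make the attack-and-defence discussion above concrete, the sketch below illustrates an FGSM-style evasion attack (a small gradient-sign perturbation of the input) together with a single adversarial-training step that retrains the classifier on the perturbed inputs. This is an illustrative sketch only, not the paper's implementation: the PyTorch classifier, the random placeholder data, and the epsilon value are assumptions chosen purely for demonstration.

```python
# Illustrative sketch (not from the paper): FGSM evasion attack + one
# adversarial-training step. Model, data, and epsilon are placeholder assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon):
    """Craft an evasion example: x_adv = x + epsilon * sign(grad_x loss)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction that increases the loss, then clip to the valid input range.
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    """One defence step: train on adversarial examples instead of clean inputs."""
    model.eval()                                  # freeze dropout/batch-norm while crafting
    x_adv = fgsm_perturb(model, x, y, epsilon)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Toy setup: a small linear classifier on random "images", just to exercise the code.
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.rand(8, 1, 28, 28)       # placeholder inputs in [0, 1]
    y = torch.randint(0, 10, (8,))     # placeholder labels
    print("adversarial-training loss:", adversarial_training_step(model, optimizer, x, y))
```

Running the toy example only prints the loss on one perturbed batch; in practice adversarial training is run over the full dataset and typically trades a small amount of clean accuracy for improved robustness against this class of perturbation.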

Keywords : Machine Learning, Artificial Intelligence, Evasion Attacks, Poisoning Attacks, Inversion and Extraction, Exploratory Attacks, Model Stealing, Transferability Attacks, Adversarial Patch Attacks, Model Confidence Attacks, Adversarial Reinforcement Learning, Adversarial Examples in NLP, Black Box Attacks, Carlini & Wagner Attack, Bayesian Neural Networks, Neural Networks, Robustness, Attack and Defence Models, Bias, Bias Mitigation, Ethical Concerns, Training Data, Inversion, Perturbations, Security, Network, Model.
