Securing Machine Learning: Understanding Adversarial Attacks and Bias Mitigation


Authors : Archit Lakhani; Neyah Rohit

Volume/Issue : Volume 9 - 2024, Issue 6 - June


Google Scholar : https://tinyurl.com/2a6c9xjf

Scribd : https://tinyurl.com/ct2kyxwr

DOI : https://doi.org/10.38124/ijisrt/IJISRT24JUN1671


Abstract : This paper offers a comprehensive examination of adversarial vulnerabilities in machine learning (ML) models and of strategies for mitigating fairness and bias issues. It analyses various adversarial attack vectors, encompassing evasion, poisoning, model inversion, exploratory probes, and model stealing, and elucidates their potential to compromise model integrity and induce misclassification or information leakage. In response, a range of defence mechanisms, including adversarial training, certified defences, feature transformations, and ensemble methods, is scrutinized, assessing their effectiveness and limitations in fortifying ML models against adversarial threats. Furthermore, the study explores the nuanced landscape of fairness and bias in ML, addressing societal biases, stereotype reinforcement, and unfair treatment, and proposes mitigation strategies such as fairness metrics, bias auditing, de-biasing techniques, and human-in-the-loop approaches to foster fairness, transparency, and ethical AI deployment. This synthesis advocates interdisciplinary collaboration to build resilient, fair, and trustworthy AI systems amidst the evolving technological paradigm.
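To make the attack-and-defence discussion above concrete, the sketch below illustrates an FGSM-style evasion attack (a small gradient-sign perturbation of the input) together with a single adversarial-training step that retrains the classifier on the perturbed inputs. This is an illustrative sketch only, not the paper's implementation: the PyTorch classifier, the random placeholder data, and the epsilon value are assumptions chosen purely for demonstration.

```python
# Illustrative sketch (not from the paper): FGSM evasion attack + one
# adversarial-training step. Model, data, and epsilon are placeholder assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon):
    """Craft an evasion example: x_adv = x + epsilon * sign(grad_x loss)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction that increases the loss, then clip to the valid input range.
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    """One defence step: train on adversarial examples instead of clean inputs."""
    model.eval()                                  # freeze dropout/batch-norm while crafting
    x_adv = fgsm_perturb(model, x, y, epsilon)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Toy setup: a small linear classifier on random "images", just to exercise the code.
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.rand(8, 1, 28, 28)       # placeholder inputs in [0, 1]
    y = torch.randint(0, 10, (8,))     # placeholder labels
    print("adversarial-training loss:", adversarial_training_step(model, optimizer, x, y))
```

Running the toy example only prints the loss on one perturbed batch; in practice adversarial training is run over the full dataset and typically trades a small amount of clean accuracy for improved robustness against this class of perturbation.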

Keywords : Machine Learning, Artificial Intelligence, Evasion Attacks, Poisoning Attacks, Inversion and Extraction, Exploratory Attacks, Model Stealing, Transferability Attacks, Adversarial Patch Attacks, Model Confidence Attacks, Adversarial Reinforcement Learning, Adversarial Examples in NLP, Black Box Attacks, Carlini & Wagner Attack, Bayesian Neural Networks, Neural Networks, Robustness, Attack and Defence Models, Bias, Bias Mitigation, Ethical Concerns, Training Data, Inversion, Perturbations, Security, Network, Model.
