Authors :
Archit Lakhani; Neyah Rohit
Volume/Issue :
Volume 9 - 2024, Issue 6 - June
Google Scholar :
https://tinyurl.com/2a6c9xjf
Scribd :
https://tinyurl.com/ct2kyxwr
DOI :
https://doi.org/10.38124/ijisrt/IJISRT24JUN1671
Abstract :
This paper offers a comprehensive examination of adversarial vulnerabilities in machine learning (ML) models, together with strategies for mitigating fairness and bias issues. It analyses various adversarial attack vectors, encompassing evasion, poisoning, model inversion, exploratory probes, and model stealing, and elucidates their potential to compromise model integrity and induce misclassification or information leakage. In response, it scrutinizes a range of defence mechanisms, including adversarial training, certified defences, feature transformations, and ensemble methods, assessing their effectiveness and limitations in fortifying ML models against adversarial threats. Furthermore, the study explores the nuanced landscape of fairness and bias in ML, addressing societal biases, the reinforcement of stereotypes, and unfair treatment, and proposes mitigation strategies such as fairness metrics, bias auditing, de-biasing techniques, and human-in-the-loop approaches to foster fairness, transparency, and ethical AI deployment. This synthesis advocates interdisciplinary collaboration to build resilient, fair, and trustworthy AI systems amid an evolving technological landscape.
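Illustrative Sketch :
To ground the abstract's terminology, the sketch below (NumPy only, and not code from the paper) shows an FGSM-style evasion attack, which nudges an input along the sign of the loss gradient to flip a model's prediction, followed by a single adversarial-training step that refits the model on a mix of clean and perturbed examples. The logistic-regression model and the synthetic data are illustrative assumptions.

```python
# Minimal, self-contained sketch (NumPy only) of two ideas named in the
# abstract: an FGSM-style evasion attack and one adversarial-training step.
# The logistic-regression model and the synthetic data are illustrative
# assumptions, not material from the paper itself.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary-classification data: two Gaussian blobs.
X = np.vstack([rng.normal(-1.0, 0.6, (100, 2)),
               rng.normal(1.0, 0.6, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

# Fit a plain logistic-regression classifier by gradient descent.
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= 0.1 * X.T @ (p - y) / len(y)
    b -= 0.1 * np.mean(p - y)

def fgsm(x, label, eps):
    """Evasion attack: move x along the sign of the input-gradient of the loss."""
    p = sigmoid(x @ w + b)
    grad_x = (p - label) * w            # d(cross-entropy)/dx for this model
    return x + eps * np.sign(grad_x)

x_clean = X[0]                          # an example from class 0
x_adv = fgsm(x_clean, 0.0, eps=1.5)
print("clean prediction      :", int(sigmoid(x_clean @ w + b) > 0.5))
print("adversarial prediction:", int(sigmoid(x_adv @ w + b) > 0.5))

# One adversarial-training step: refit on clean plus FGSM-perturbed copies so
# the decision boundary becomes less sensitive to such perturbations.
X_adv = np.array([fgsm(xi, yi, eps=0.5) for xi, yi in zip(X, y)])
X_mix, y_mix = np.vstack([X, X_adv]), np.concatenate([y, y])
p = sigmoid(X_mix @ w + b)
w -= 0.1 * X_mix.T @ (p - y_mix) / len(y_mix)
b -= 0.1 * np.mean(p - y_mix)
```

A second short sketch computes one of the fairness metrics named above, the demographic parity gap, from hypothetical predictions and group labels; a gap near zero indicates that both groups receive positive decisions at similar rates.

```python
# Hedged sketch of one fairness metric named in the abstract: the demographic
# parity gap. Predictions and group labels below are purely hypothetical.
import numpy as np

preds  = np.array([1, 0, 1, 1, 0, 1, 0, 0])        # model decisions
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

rate_a = preds[groups == "A"].mean()                # selection rate, group A
rate_b = preds[groups == "B"].mean()                # selection rate, group B
print("demographic parity gap:", abs(rate_a - rate_b))   # 0.0 means parity
```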
Keywords :
Machine Learning, Artificial Intelligence, Evasion Attacks, Poisoning Attacks, Inversion and Extraction, Exploratory Attacks, Model Stealing, Transferability Attacks, Adversarial Patch Attacks, Model Confidence Attacks, Adversarial Reinforcement Learning, Adversarial Examples in NLP, Black-Box Attacks, Carlini-Wagner Attack, Bayesian Neural Networks, Neural Networks, Robustness, Attack and Defence Models, Bias, Bias Mitigation, Ethical Concerns, Training Data, Inversion, Perturbations, Security, Network, Model.