Authors :
Sudeshna Dey; Soumitra De; Jonti Deuri; Siddhartha Chatterjee
Volume/Issue :
Volume 11 - 2026, Issue 1 - January
Google Scholar :
https://tinyurl.com/2t6jpxz5
Scribd :
https://tinyurl.com/52hf5wje
DOI :
https://doi.org/10.38124/ijisrt/26jan1585
Abstract :
Large language models (LLMs) have been rapidly adopted in the financial services industry, enabling
sophisticated automation in document analysis, customer service, compliance monitoring, and decision support. However,
hallucinations, limited explainability, privacy concerns, and restricted access to valuable institutional information limit their
use in regulated financial settings. Financial organizations need systems that ensure factual accuracy, traceability, and
regulatory compliance in addition to producing fluent responses. This chapter describes an expert system for FinTech document
intelligence based on Retrieval-Augmented Generation (RAG) that combines conventional term-based search with
semantic vector retrieval to ensure dependable and auditable automated reasoning. The overall architecture,
document ingestion pipeline, controlled prompt construction, hybrid retrieval mechanism, variable structuring approaches,
embedding generation, encryption model, and evaluation methodology are all described. Practical applications in
financial services, fraud detection, credit risk assessment, and regulatory compliance are discussed in detail. A roadmap
for future research and policy development is also presented, along with ethical considerations and regulatory harmonization.
The system provides human-in-the-loop validation, so financial specialists can review, confirm, and override
AI-generated insights when needed. The platform facilitates regulatory examinations and improves transparency by
keeping detailed provenance information and audit records for each generated answer. This approach bridges the gap
between the strict governance requirements of real-world financial ecosystems and advanced generative intelligence.
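As a rough illustration of the hybrid retrieval and audit-record ideas summarized above, the following is a minimal sketch, not the authors' implementation. It assumes BM25 for the term-based side, cosine similarity over embeddings for the vector side, and reciprocal rank fusion (RRF) to merge the two rankings; the abstract does not state which fusion method is used, so RRF is a stand-in. All function names, the toy three-dimensional "embeddings", and the audit record fields are hypothetical.

```python
import hashlib
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would use a proper analyzer.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query (term-based side)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (vector side)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document ids; k=60 is a commonly used constant."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


def hybrid_search(query, query_vec, docs, doc_vecs, top_k=2):
    lexical = bm25_scores(query, docs)
    semantic = [cosine(query_vec, v) for v in doc_vecs]
    ids = list(range(len(docs)))
    lex_rank = sorted(ids, key=lambda i: -lexical[i])
    sem_rank = sorted(ids, key=lambda i: -semantic[i])
    return reciprocal_rank_fusion([lex_rank, sem_rank])[:top_k]


def audit_record(query, doc_ids, docs):
    """Hypothetical provenance entry: which sources backed an answer."""
    return {
        "query": query,
        "sources": [
            {"doc_id": i,
             "sha256": hashlib.sha256(docs[i].encode("utf-8")).hexdigest()}
            for i in doc_ids
        ],
    }


# Toy corpus; the vectors are hand-written stand-ins for real embeddings.
docs = [
    "KYC onboarding policy for retail customers",
    "Quarterly credit risk assessment report",
    "Fraud detection rules for card transactions",
]
doc_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.9]]

query = "credit risk report"
top = hybrid_search(query, [0.2, 0.9, 0.0], docs, doc_vecs)
record = audit_record(query, top, docs)
print(top)  # [1, 0]
```

Rank-based fusion is used here only because it avoids having to put BM25 scores and cosine similarities on a common scale; a weighted sum of normalized scores would be an equally plausible choice. Storing a content hash for each retrieved source is one simple way to let a reviewer later confirm which document version an answer was grounded in.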
Keywords :
Generative AI, FinTech, Retrieval-Augmented Generation, Document Intelligence, Regulatory Compliance, Vector Databases, Hybrid Retrieval, Explainable AI, Auditability.