Authors :
Muhammad Nuraddeen Ado; Shafi’i Muhammad Abdulhamid; Idris Ismaila
Volume/Issue :
Volume 11 - 2026, Issue 1 - January
Google Scholar :
https://tinyurl.com/3t49c2pp
Scribd :
https://tinyurl.com/mr5cepfz
DOI :
https://doi.org/10.38124/ijisrt/26jan950
Abstract :
Financial fraud remains a persistent and evolving threat, requiring robust machine learning (ML) models for
effective detection. However, access to real-world financial transaction data is limited due to privacy restrictions and
regulatory concerns, creating a gap in fraud detection research. This study introduces SFinDSet, a synthetic financial
transaction dataset designed to simulate real-world banking operations for fraud detection, money laundering prevention,
and financial risk assessment. The dataset's reliability was assessed through exploratory data analysis (EDA) and validated
using anomaly detection techniques. To benchmark its performance, SFinDSet was evaluated against two established
datasets: BankDSet (a real-world financial dataset) and SynFraudDataset (a synthetic fraud dataset). Various ML models,
including Systematic Detection (SyD), Random Forest (RF), Isolation Forest (IF), DBSCAN, SVM, and PCA, were tested
across these datasets. The results showed that SyD achieved 100% recall, detecting every labeled fraud case with no
false negatives, and outperformed the traditional models, which exhibited high false negative rates. These findings validate
SFinDSet as a reliable benchmark dataset, highlighting the critical role of synthetic financial datasets in advancing fraud
detection research.
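The recall figure quoted above can be made concrete with a short sketch. Recall is TP / (TP + FN), so 100% recall is equivalent to zero false negatives. The function name `recall_and_fnr` and the label vectors below are hypothetical and purely illustrative; they are not taken from SFinDSet or from the authors' evaluation code.

```python
def recall_and_fnr(y_true, y_pred):
    """Return (recall, false_negative_rate) for binary labels where 1 = fraud."""
    # True positives: fraud cases that were flagged as fraud.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    # False negatives: fraud cases that were missed.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    positives = tp + fn
    if positives == 0:
        raise ValueError("no fraud cases in y_true; recall is undefined")
    recall = tp / positives
    return recall, 1.0 - recall


# Hypothetical labels for illustration only (assumption, not SFinDSet data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
flag_all_fraud = [1, 1, 1, 1, 0, 1, 1, 0]  # catches every fraud, two false alarms
misses_some = [1, 0, 0, 1, 0, 0, 0, 0]     # misses two of the four frauds

print(recall_and_fnr(y_true, flag_all_fraud))
print(recall_and_fnr(y_true, misses_some))
```

Recall does not count false alarms: the first predictor in the sketch flags two legitimate transactions and still reaches a recall of 1.0.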
Keywords :
Synthetic Financial Datasets, Fraud Detection, Machine Learning Models.