Authors :
Dr. N. Dhivya; M. Subalakshmi
Volume/Issue :
Volume 11 - 2026, Issue 4 - April
Google Scholar :
https://tinyurl.com/3nhe9ztv
Scribd :
https://tinyurl.com/884mz5fc
DOI :
https://doi.org/10.38124/ijisrt/26apr1803
Abstract :
The insurance industry increasingly relies on data-driven technologies to improve pricing strategies and risk
assessment. This study presents a machine learning approach to predicting insurance charges from customer
demographic and health-related attributes. The dataset includes age, gender, body mass index (BMI), number of
children, smoking status, and region. Preprocessing steps, including data cleaning, categorical encoding, and
normalization, were applied to improve model performance. A Linear Regression model was then fitted and
evaluated using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the R-squared (R²) score. The
proposed model achieved an R² of 0.85, meaning it explains about 85% of the variance in insurance charges (R² is a
goodness-of-fit measure, not a classification accuracy). This indicates a strong relationship between predicted and
actual charges. The results demonstrate that machine learning techniques can effectively model the relationships in
insurance data and provide reliable predictions. This system can assist insurance companies in making data-driven
pricing decisions.
Keywords :
Machine Learning, Insurance Prediction, Linear Regression, Data Science, Data Preprocessing, Predictive Analytics.
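The workflow summarized in the abstract (categorical encoding, normalization, Linear Regression, then MAE, RMSE, and R²) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. The column names (age, sex, bmi, children, smoker, region, charges), the 80/20 split, and the synthetic data generator are all assumptions made so the example runs on its own. The numbers it prints come from made-up data and say nothing about the 0.85 reported in the study.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_synthetic_data(n_rows=1000, seed=0):
    """Generate placeholder data. This is NOT the dataset used in the study."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "age": rng.integers(18, 65, n_rows),
        "sex": rng.choice(["female", "male"], n_rows),
        "bmi": rng.normal(30.0, 6.0, n_rows),
        "children": rng.integers(0, 6, n_rows),
        "smoker": rng.choice(["no", "yes"], n_rows, p=[0.8, 0.2]),
        "region": rng.choice(
            ["northeast", "northwest", "southeast", "southwest"], n_rows
        ),
    })
    # Assumed linear cost structure plus noise, chosen only for illustration.
    df["charges"] = (
        250.0 * df["age"]
        + 300.0 * df["bmi"]
        + 500.0 * df["children"]
        + 20000.0 * (df["smoker"] == "yes")
        + rng.normal(0.0, 3000.0, n_rows)
    )
    return df


def build_model():
    """Preprocessing (scaling + one-hot encoding) followed by Linear Regression."""
    numeric = ["age", "bmi", "children"]
    categorical = ["sex", "smoker", "region"]
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numeric),                 # normalization
        ("cat", OneHotEncoder(drop="first"), categorical),  # categorical encoding
    ])
    return Pipeline([("preprocess", preprocess), ("regressor", LinearRegression())])


def evaluate(model, X_test, y_test):
    """Return MAE, RMSE, and R² on held-out data."""
    pred = model.predict(X_test)
    return {
        "mae": float(mean_absolute_error(y_test, pred)),
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "r2": float(r2_score(y_test, pred)),
    }


def run_demo(seed=0):
    """Split the synthetic data, fit the pipeline, and score it on the test set."""
    df = make_synthetic_data(seed=seed)
    X, y = df.drop(columns="charges"), df["charges"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = build_model().fit(X_train, y_train)
    return evaluate(model, X_test, y_test)


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.3f}")
```

Wrapping the scaler and encoder in a single pipeline means they are fitted on the training split only, so no information from the test rows leaks into the preprocessing. MAE and RMSE are in the same units as the charges, while R² is a unitless share of variance explained.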