Using Titanic Dataset for Comprehensive Machine Learning Model Training


Authors : Mahmud Hasan; A T M Hasan

Volume/Issue : Volume 9 - 2024, Issue 10 - October


Google Scholar : https://tinyurl.com/49bt3up7

Scribd : https://tinyurl.com/3uz2w7zb

DOI : https://doi.org/10.5281/zenodo.14810217


Abstract : The Titanic dataset, which documents the survival status of passengers aboard the ill-fated ship, has emerged as a valuable resource for developing and evaluating machine learning algorithms. This paper investigates the utility of the Titanic dataset for training various machine learning models, focusing on both binary classification accuracy and the insights gained from feature engineering. By leveraging features such as passenger class, gender, and age, we demonstrate how the Titanic dataset serves as an ideal foundation for model development. Results indicate that the dataset supports effective training across multiple algorithms, including logistic regression, random forests, support vector machines, and neural networks. Future research could explore ensemble methods in greater depth and apply more complex feature extraction techniques to further improve predictive performance.
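
The abstract names passenger class, gender, and age as the central features. The sketch below shows one plausible way to preprocess such features and train a binary classifier with scikit-learn. It assumes Kaggle-style column names (`Pclass`, `Sex`, `Age`, `Survived`) and uses a small hand-made toy table, not the real dataset. The imputation and encoding choices are illustrative assumptions rather than the authors' pipeline, and the printed accuracy says nothing about the paper's results.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy rows in the Kaggle column layout (illustrative only, NOT real passenger data).
df = pd.DataFrame({
    "Pclass":   [1, 1, 2, 2, 3, 3, 3, 1, 2, 3, 3, 1, 2, 3, 1, 3],
    "Sex":      ["female", "male", "female", "male", "female", "male", "male", "female",
                 "male", "female", "male", "male", "female", "male", "female", "male"],
    "Age":      [29, 40, None, 35, 22, 19, None, 58, 30, 4, 25, 54, 27, 31, None, 45],
    "Survived": [1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
})

X = df[["Pclass", "Sex", "Age"]]
y = df["Survived"]

# Assumed preprocessing: fill missing ages with the training median, then
# standardise; one-hot encode passenger class and sex.
preprocess = ColumnTransformer([
    ("age", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["Age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Pclass", "Sex"]),
])

model = Pipeline([("prep", preprocess), ("clf", LogisticRegression(max_iter=1000))])

# Stratified hold-out split so both classes appear in the test rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # mean accuracy on the held-out rows
print(f"held-out accuracy: {accuracy:.2f}")
```

Wrapping the imputer, encoder, and classifier in a single `Pipeline` means the median age is learned from the training rows only, so test-set statistics do not leak into preprocessing.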

Keywords : Titanic Dataset, Machine Learning, Classification Models, Logistic Regression, Random Forest, SVM, Neural Networks, Data Preprocessing.
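
The keywords list four model families. One way to compare them under identical conditions is stratified k-fold cross-validation. The sketch below does this with scikit-learn, using a synthetic stand-in produced by `make_classification` in place of a preprocessed Titanic feature matrix. The scores are therefore not Titanic results, and the hyperparameters (tree count, hidden-layer size, kernel) are arbitrary assumptions, not the paper's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a preprocessed feature matrix (NOT the Titanic data);
# 891 rows mirrors the size of Kaggle's Titanic training split.
X, y = make_classification(n_samples=891, n_features=8, n_informative=5,
                           random_state=42)

# Scale inputs for the linear, kernel, and neural models; trees do not need it.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Neural Network (MLP)": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=42)),
}

# Same shuffled, stratified folds for every model so the comparison is fair.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = {}
for name, est in models.items():
    fold_acc = cross_val_score(est, X, y, cv=cv, scoring="accuracy")
    scores[name] = fold_acc.mean()
    print(f"{name:22s} mean accuracy {fold_acc.mean():.3f} (+/- {fold_acc.std():.3f})")
```

Keeping `StandardScaler` inside each pipeline means it is refit on every training fold, so the cross-validation scores are not inflated by scaling statistics taken from the held-out fold.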

