Authors :
Mahmud Hasan; A T M Hasan
Volume/Issue :
Volume 9 - 2024, Issue 10 - October
Google Scholar :
https://tinyurl.com/49bt3up7
Scribd :
https://tinyurl.com/3uz2w7zb
DOI :
https://doi.org/10.5281/zenodo.14810217
Abstract :
The Titanic dataset, which records the survival status of passengers aboard the RMS Titanic, is widely used for
developing and evaluating machine learning algorithms. This paper investigates the utility of the dataset for training
several machine learning models, including logistic regression, random forests, support vector machines (SVMs), and neural
networks, focusing on both binary classification accuracy and the insights gained from feature engineering. Using features
such as passenger class, gender, and age, we demonstrate that the dataset is a suitable foundation for model development.
Results indicate that it supports effective training across multiple algorithms. Future research could explore ensemble
methods and more complex feature extraction techniques to further improve predictive performance.
Keywords :
Titanic Dataset, Machine Learning, Classification Models, Logistic Regression, Random Forest, SVM, Neural Networks, Data Preprocessing.
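The kind of workflow the abstract describes (encoding passenger class, gender, and age, then fitting a binary classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the rows in TOY_ROWS are hypothetical placeholders rather than real Titanic records, the feature scaling and median imputation are arbitrary choices, and logistic regression is fitted with plain gradient descent using only the Python standard library. In practice a library such as scikit-learn would be the more usual choice.

```python
import math
import statistics

# Hypothetical toy rows (NOT real Titanic records): (pclass, sex, age, survived).
TOY_ROWS = [
    (1, "female", 29.0, 1), (1, "female", 38.0, 1), (2, "female", 24.0, 1),
    (3, "female", 18.0, 1), (3, "female", 40.0, 0), (1, "male", 45.0, 0),
    (1, "male", 30.0, 1), (2, "male", 34.0, 0), (3, "male", 22.0, 0),
    (3, "male", None, 0), (2, "male", 8.0, 1), (3, "male", 50.0, 0),
]


def encode(pclass, sex, age, median_age):
    """Turn one passenger into numeric features; a missing age falls back to the median."""
    age = median_age if age is None else age
    return [pclass - 2.0, 1.0 if sex == "female" else 0.0, (age - median_age) / 20.0]


def build_features(rows):
    """Encode every row and return the feature matrix, labels, and median age used."""
    median_age = statistics.median(r[2] for r in rows if r[2] is not None)
    X = [encode(pclass, sex, age, median_age) for pclass, sex, age, _ in rows]
    y = [survived for _, _, _, survived in rows]
    return X, y, median_age


def predict_proba(w, b, x):
    """Logistic model: probability of survival for one feature vector."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w, b, X, y):
    """Mean binary cross-entropy, with probabilities clamped away from 0 and 1."""
    eps = 1e-12
    total = 0.0
    for x, t in zip(X, y):
        p = min(max(predict_proba(w, b, x), eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(X)


def train(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias with batch gradient descent on the log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            err = predict_proba(w, b, x) - t
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


X, y, median_age = build_features(TOY_ROWS)
initial_loss = log_loss([0.0] * 3, 0.0, X, y)
weights, bias = train(X, y)
final_loss = log_loss(weights, bias, X, y)
accuracy = sum(
    (predict_proba(weights, bias, x) >= 0.5) == bool(t) for x, t in zip(X, y)
) / len(X)
print(f"log-loss: {initial_loss:.3f} -> {final_loss:.3f}, training accuracy: {accuracy:.2f}")
```

Because the model is evaluated on the same toy rows it was trained on, the printed accuracy says nothing about generalization; a real experiment would hold out a test split or use cross-validation on the actual dataset.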