Authors :
Samir Pandey; Ami Shah
Volume/Issue :
Volume 10 - 2025, Issue 3 - March
Google Scholar :
https://tinyurl.com/yebzy7sz
Scribd :
https://tinyurl.com/584xyn8a
DOI :
https://doi.org/10.38124/ijisrt/25mar1342
Abstract :
In the era of big data, high-quality data is essential for accurate analysis and decision-making. This paper explores
the process of data cleaning and preparation for advanced analytics, focusing on techniques such as handling missing values,
outlier detection, data transformation, and feature engineering. A case study is presented using a dataset to perform time
series analysis, cohort segmentation, churn analysis, and customer segmentation. The goal is to enhance data reliability and
usability for machine learning and predictive modeling.
Keywords :
Data Cleaning, Data Preparation, Time Series Analysis, Cohort Segmentation, Churn Analysis, Outlier Detection, Feature Engineering.
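Two of the techniques named in the abstract, handling missing values and outlier detection, can be illustrated with a minimal sketch. This is not the authors' implementation: the median fill, the 1.5 × IQR rule, the function names, and the sample figures are all assumptions chosen for illustration, and the code uses only the Python standard library.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Flag values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


# Hypothetical monthly spend figures with one gap and one extreme value.
spend = [20.0, 22.0, None, 21.0, 23.0, 250.0, 19.0]

cleaned = impute_median(spend)   # the gap is filled with the median
flags = iqr_outliers(cleaned)    # True marks a suspected outlier

for value, is_outlier in zip(cleaned, flags):
    print(f"{value:8.1f}  {'outlier' if is_outlier else ''}")
```

The median is used for the fill because a single extreme value pulls it far less than it pulls the mean. Whether a flagged value should be dropped, capped, or kept depends on the dataset and the downstream analysis.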