Authors :
L Chingwaru; S Chaputsira; M Mutandavari
Volume/Issue :
Volume 9 - 2024, Issue 8 - August
Google Scholar :
https://tinyurl.com/yc5sjbkm
DOI :
https://doi.org/10.38124/ijisrt/24aug1034
Abstract :
In an era of rapidly growing data, effective data lifecycle management has become crucial for organizations. This paper addresses the challenge of identifying and classifying data columns as either demographic or transactional across various systems, where column names may differ significantly (e.g., "Sex" in one system and "Gender" in another). The purpose of this research is to develop a model that can accurately classify these data columns, enabling automated data retention and destruction processes. The proposed model leverages intelligent process automation and process mining to identify and categorize data, allowing transactional data to be archived automatically after a specified timeframe. By implementing this model, organizations can improve their data management efficiency, ensuring compliance with data retention policies while optimizing storage use.
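The abstract does not give implementation details, and the authors' model relies on intelligent process automation and process mining. Purely as an illustration of the classify-then-retain idea, the sketch below swaps in a much simpler technique. It uses a hand-written synonym table to map differently named columns (such as "Sex" and "Gender") onto one demographic concept, token hints to flag transactional columns, and a date cutoff that moves transactional fields past a retention period into an archive. The names DEMOGRAPHIC_SYNONYMS, TRANSACTIONAL_TOKENS, classify_column and archive_expired, the synonym entries, and the seven-year retention period in the usage example are all hypothetical assumptions, not part of the paper.

```python
import re
from datetime import date, timedelta

# HYPOTHETICAL synonym table: canonical demographic concept -> column-name
# variants seen across systems (e.g. "Sex" in one, "Gender" in another).
DEMOGRAPHIC_SYNONYMS = {
    "gender": {"gender", "sex"},
    "date_of_birth": {"date_of_birth", "dob", "birth_date", "birthdate"},
    "surname": {"surname", "last_name", "family_name"},
}

# HYPOTHETICAL name tokens that suggest a transactional column.
TRANSACTIONAL_TOKENS = {"amount", "txn", "transaction", "payment", "invoice", "balance"}


def normalize(name: str) -> str:
    """Lower-case a column name, turning camelCase, spaces and hyphens into underscores."""
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())  # "DateOfBirth" -> "Date_Of_Birth"
    return re.sub(r"[\s\-]+", "_", name).lower()


def classify_column(name: str) -> str:
    """Label a column 'demographic', 'transactional', or 'unclassified' (left for manual review)."""
    key = normalize(name)
    if any(key in variants for variants in DEMOGRAPHIC_SYNONYMS.values()):
        return "demographic"
    if set(key.split("_")) & TRANSACTIONAL_TOKENS:
        return "transactional"
    return "unclassified"


def archive_expired(rows, date_column, key_column, today, retention_days):
    """Move transactional fields of rows older than the retention period into an archive.

    Returns (kept_rows, archived_rows). Demographic and unclassified fields stay in
    place; each archived row carries key_column so it can be linked back.
    """
    cutoff = today - timedelta(days=retention_days)
    kept, archived = [], []
    for row in rows:
        if row[date_column] >= cutoff:
            kept.append(dict(row))  # still within the retention period: leave untouched
            continue
        labels = {col: classify_column(col) for col in row}
        kept.append({c: v for c, v in row.items() if labels[c] != "transactional"})
        moved = {c: v for c, v in row.items() if labels[c] == "transactional"}
        moved[key_column] = row[key_column]  # link back to the retained record
        archived.append(moved)
    return kept, archived


if __name__ == "__main__":
    rows = [
        {"ClientID": 1, "Sex": "F", "TxnDate": date(2015, 3, 1), "Amount": 250.0},
        {"ClientID": 2, "Gender": "M", "TxnDate": date(2024, 6, 1), "Amount": 90.0},
    ]
    # Assumed seven-year retention period, evaluated as of 1 August 2024.
    kept, archived = archive_expired(rows, "TxnDate", "ClientID", date(2024, 8, 1), 7 * 365)
    print(kept)
    print(archived)
```

In this example, the 2015 row keeps only ClientID and Sex in place, while its TxnDate and Amount move to the archive together with ClientID. The 2024 row falls inside the retention period and is left untouched. Unmatched columns are labelled "unclassified" rather than defaulting to transactional, so that an unknown column is never archived or destroyed without review.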
Keywords :
Data Lifecycle Management, Column Classification, Intelligent Process Automation, Process Mining, Data Retention.