Authors :
Krishna Prisad Bajgai; Dr. Bhoj Raj Ghimire
Volume/Issue :
Volume 10 - 2025, Issue 2 - February
Google Scholar :
https://tinyurl.com/2frp7nuj
Scribd :
https://tinyurl.com/5n8y3du9
DOI :
https://doi.org/10.5281/zenodo.14921215
Abstract :
Organizations face challenges in managing diverse, large-scale datasets while ensuring scalability, efficiency, and
quality. Traditional data lakes and warehouses often fall short in modern big data environments. The Lakehouse architecture
unifies both, but cloud implementations still face challenges such as optimizing ingestion, storing data efficiently, and
integrating multiple data engines. Sectors like healthcare and agriculture struggle to handle real-time and IoT data, which
leads to inefficiencies. Current research highlights gaps in performance, scalability, and the integration of advanced
analytics. Future work should focus on improving large dataset handling, real-time processing, and machine learning
integration to support better decision-making and performance.
Keywords :
Lakehouse Architecture, Big Data Integration, Cloud Computing, Real-Time Data Processing, Federated Governance, Machine Learning.