International Journal of Innovative Science and Research Technology

Deep Learning Based Monocular Depth Estimation for Object Distance Inference in 2D Images

Authors : G. Victor Daniel; Koneru Gnana Shritej; Kosari Hemanth Sai; Sunkara Namith

Volume/Issue : Volume 9 - 2024, Issue 4 - April

DOI : https://doi.org/10.38124/ijisrt/IJISRT24APR1431

Abstract : Monocular depth estimation, the task of predicting a depth value for each pixel of a single 2D image, has advanced significantly with the spread of deep learning techniques. This research applies deep learning to monocular depth estimation in order to infer the distances of objects in 2D images accurately. We compare several convolutional neural network (CNN) architectures and transformer models to assess how effectively each predicts depth. The models are trained on large datasets annotated with ground-truth depth and then evaluated with standard depth-estimation metrics. The results show substantial improvements in depth estimation accuracy, highlighting the potential of deep learning for computer vision tasks such as autonomous driving, augmented reality, and robotic navigation. In addition to underscoring the importance of model architecture, the study examines how training data diversity and data augmentation strategies affect performance. By analyzing the models and their performance in detail, this work gives an overview of the current state of the art in monocular depth estimation and its potential for real-world applications, paving the way for future advances in object distance inference from 2D images.
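The abstract mentions "standard metrics" and object distance inference without spelling either out. As a minimal sketch, assuming the model outputs metric depth in metres as a 2D grid, the error and accuracy measures most commonly reported in the monocular depth literature (AbsRel, SqRel, RMSE, RMSE log and the δ < 1.25^k threshold accuracies) can be computed as below, together with a hypothetical `object_distance` helper that takes the median predicted depth inside a bounding box. The 1e-3 to 80 m validity range, the `(x_min, y_min, x_max, y_max)` box format and both function names are illustrative assumptions, not the authors' evaluation code.

```python
import math
from statistics import median


def depth_metrics(pred, gt, min_depth=1e-3, max_depth=80.0):
    """Common monocular depth metrics over pixels with valid ground truth.

    Assumption: pred and gt are equally sized 2D lists of metric depths (metres);
    gt values outside (min_depth, max_depth] are treated as missing, and
    predictions are clamped to the same range before scoring.
    """
    pairs = [
        (min(max(p, min_depth), max_depth), g)
        for pred_row, gt_row in zip(pred, gt)
        for p, g in zip(pred_row, gt_row)
        if min_depth < g <= max_depth
    ]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)

    # Error metrics: lower is better.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n)

    # Accuracy metrics: fraction of pixels whose ratio max(p/g, g/p) < 1.25**k.
    ratios = [max(p / g, g / p) for p, g in pairs]
    deltas = {f"delta{k}": sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3)}

    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse, "rmse_log": rmse_log, **deltas}


def object_distance(depth, box):
    """Hypothetical helper: median predicted depth inside a pixel bounding box.

    box = (x_min, y_min, x_max, y_max), with x_max and y_max exclusive.
    """
    x0, y0, x1, y1 = box
    values = [d for row in depth[y0:y1] for d in row[x0:x1]]
    if not values:
        raise ValueError("bounding box contains no pixels")
    return median(values)


if __name__ == "__main__":
    gt = [[2.0, 4.0], [5.0, 10.0]]   # toy ground-truth depths in metres
    pred = [[2.0, 4.4], [5.0, 6.0]]  # toy predicted depths in metres
    print(depth_metrics(pred, gt))
    print(object_distance(pred, (0, 0, 2, 2)))
```

The median is used instead of the mean because a detection box usually contains some background pixels, and the median is less sensitive to them. Models trained to predict only relative (scale-ambiguous) depth would need a scale-alignment step before either the metrics or the object distances are meaningful in metres.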

Keywords : Monocular Depth Estimation, Deep Learning, Convolutional Neural Network (CNN), Computer Vision, Augmented Reality, Robotic Navigation.
