Authors :
Bala Velan M; Jayanth Balraj A; Jerusha Angel K; Oswalt Manoj
Volume/Issue :
Volume 10 - 2025, Issue 9 - September
Google Scholar :
https://tinyurl.com/3ypb9md2
Scribd :
https://tinyurl.com/2bkea2ur
DOI :
https://doi.org/10.38124/ijisrt/25sep880
Note : A published paper may take 4-5 working days from the publication date to appear in PlumX Metrics, Semantic Scholar, and ResearchGate.
Note : Google Scholar may take 30 to 40 days to display the article.
Abstract :
Echo Vision is a mobile assistive navigation system that uses augmented reality (AR) and edge computing to guide visually impaired users. On supported smartphones, ARCore provides real-time camera pose tracking; on devices without ARCore support, the system falls back to ORB-SLAM3. A deep scene-recognition model (Places365) continuously classifies the environment as indoor or outdoor, and this classification selects the object detector: a YOLOv8 model indoors and an SSD MobileNetV3 detector for salient obstacles outdoors. The phone streams video frames to a local server over Wi-Fi, and the server returns the identified objects together with their locations and coarse distance ranges (for example, 0–1 m or 1–1.5 m). A sliding memory module on the server aggregates recent detections and issues periodic voice prompts that summarize nearby obstacles, which prevents a flood of repeated alerts. The client application overlays the detected obstacles on the camera view and delivers voice warnings through Android TextToSpeech, with vibration alerts for objects closer than 1 m. Tests in hallways and on outdoor paths show that Echo Vision can adapt its frame rate to changes in the scene, which helps manage network usage. This paper describes the architecture, implementation, and performance of Echo Vision, and outlines its current features and the next steps toward an edge-cloud assistive platform.
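To make the pipeline in the abstract concrete, the Python sketch below shows one plausible shape for the server-side loop: scene classification choosing between the indoor and outdoor detectors, coarse distance bucketing, and a sliding memory that turns repeated per-frame detections into periodic spoken summaries. It is an illustrative sketch, not the authors' implementation; classify_scene, detect_indoor, detect_outdoor, the distance buckets, and the window and prompt intervals are hypothetical placeholders standing in for the Places365, YOLOv8, and SSD MobileNetV3 components and the timing that Echo Vision actually uses.

import time
from collections import deque

# Coarse distance ranges like those the abstract reports to the user.
DISTANCE_BUCKETS = [(0.0, 1.0, "0-1 m"), (1.0, 1.5, "1-1.5 m"), (1.5, 3.0, "1.5-3 m")]

def bucket_distance(distance_m):
    """Map a metric distance estimate to the coarse range spoken to the user."""
    for lo, hi, label in DISTANCE_BUCKETS:
        if lo <= distance_m < hi:
            return label
    return "far"

# --- Hypothetical stand-ins for the models named in the abstract ------------
def classify_scene(frame):
    """Placeholder for the Places365-based indoor/outdoor classifier."""
    return "indoor"

def detect_indoor(frame):
    """Placeholder for the YOLOv8 indoor detector: (label, distance_m) pairs."""
    return [("chair", 0.8), ("door", 2.2)]

def detect_outdoor(frame):
    """Placeholder for the SSD MobileNetV3 outdoor detector."""
    return [("pole", 1.2)]

class SlidingMemory:
    """Keeps recent detections and emits a periodic spoken summary, so the
    same obstacle is not announced on every frame."""
    def __init__(self, window_s=3.0, prompt_interval_s=4.0):
        self.window_s = window_s
        self.prompt_interval_s = prompt_interval_s
        self.events = deque()          # entries: (timestamp, label, distance bucket)
        self.last_prompt = 0.0

    def add(self, detections):
        now = time.time()
        for label, bucket in detections:
            self.events.append((now, label, bucket))
        # Drop detections that have fallen out of the sliding window.
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()

    def maybe_summarize(self):
        now = time.time()
        if now - self.last_prompt < self.prompt_interval_s or not self.events:
            return None
        self.last_prompt = now
        # Deduplicate by (label, bucket) so repeated frames yield one phrase.
        unique = sorted({(label, bucket) for _, label, bucket in self.events})
        return "; ".join(f"{label} at {bucket}" for label, bucket in unique)

def process_frame(frame, memory):
    """One pass of the server loop: scene check -> detector choice -> summary."""
    scene = classify_scene(frame)
    detect = detect_indoor if scene == "indoor" else detect_outdoor
    detections = [(label, bucket_distance(d)) for label, d in detect(frame)]
    memory.add(detections)
    return memory.maybe_summarize()    # text the phone would speak via TextToSpeech

if __name__ == "__main__":
    memory = SlidingMemory()
    prompt = process_frame(frame=None, memory=memory)
    if prompt:
        print("Voice prompt:", prompt)

In this sketch, the summary string returned by process_frame is what would be sent back to the Android client, which could speak it via TextToSpeech and trigger a vibration alert whenever any reported range is 0–1 m.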
Keywords :
Accurate Localization, Deep Learning, Comprehensive Route Planning, Specialized Detection, Fragmented Coverage, Indoor Navigation, Robust Mapping.
References :
- D. Dakopoulos, “Tyflos: A Wearable Navigation Prototype for Blind & Visually Impaired; Design, Modelling and Experimental Results,” Ph.D. dissertation, Dept. of Computer Science, Wright State University, Dayton, OH, 2009.
- Y. H. Lee and G. Medioni, “RGB-D Camera Based Navigation for the Visually Impaired,” Technical Report, Dept. of Electrical Engineering and Dept. of Computer Science, University of Southern California, Los Angeles, CA 90089, 2011.
- A. Ben Atitallah, Y. Said, M. A. Ben Atitallah, M. Albekairi, K. Kaaniche, and S. Boubaker, “An effective obstacle detection system using deep learning advantages to aid blind and visually impaired navigation,” Ain Shams Engineering Journal, vol. 15, no. 2, Art. no. 102387, 2024, doi: 10.1016/j.asej.2023.102387.
- G. Jocher et al., “YOLO by Ultralytics (YOLOv8),” 2023. [Online]. Available: https://github.com/ultralytics/ultralytics.
- J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, 2016, pp. 779–788, doi: 10.1109/CVPR.2016.91.
- B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba, “Places: A 10 Million Image Database for Scene Recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 6, pp. 1452–1464, Jun. 2018, doi: 10.1109/TPAMI.2017.2723009.
- C. Campos, R. Elvira, J. J. G. Rodríguez, J. M. M. Montiel, and J. D. Tardós, “ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap SLAM,” IEEE Trans. Robot., vol. 37, no. 6, pp. 1874–1890, Dec. 2021, doi: 10.1109/TRO.2021.3075644.
- Google Inc., “ARCore: Augmented Reality for Android,” [Online]. Available: https://developers.google.com/ar.
- W. Liu et al., “SSD: Single Shot MultiBox Detector,” in Computer Vision – ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds. Lecture Notes in Computer Science, vol. 9905, Springer, Cham, Sep. 2016, doi: 10.1007/978-3-319-46448-0_2.
- A. Howard et al., “Searching for MobileNetV3,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 1314–1324.
- H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “Speeded-Up Robust Features (SURF),” Comput. Vis. Image Underst., vol. 110, no. 3, pp. 346–359, Jun. 2008, doi: 10.1016/j.cviu.2007.09.014.
- D. G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, Nov. 2004, doi: 10.1023/B:VISI.0000029664.99615.94.
- M. H. Abidi, A. N. Siddiquee, H. Alkhalefah, and V. Srivastava, “A Comprehensive Review of Navigation Systems for Visually Impaired Individuals,” Heliyon, Art. no. e31825, May 2024, doi: 10.1016/j.heliyon.2024.e31825.
- M. Matei and L. Alboaie, “Enhancing Accessible Navigation: A Fusion of Speech, Gesture, and Sonification for the Visually Impaired,” Procedia Computer Science, vol. 246, pp. 2558–2567, 2024, doi: 10.1016/j.procs.2024.09.442.
- M. Aharchi and M. Ait Kbir, “Enhancing Navigation for the Visually Impaired Through Object Detection and 3D Audio Feedback,” Journal of Theoretical and Applied Information Technology, vol. 102, no. 19, Article 6966, Oct. 15, 2024.
- L. Manirajee et al., “Assistive Technology for Visually Impaired Individuals: A Systematic Literature Review (SLR),” Int. J. Acad. Res. Bus. Soc. Sci., vol. 14, no. 2, Feb. 2024, doi: 10.6007/IJARBSS/v14-i2/20827.