Echo Vision: AI Voice Assisted Navigation for Visually Impaired


Authors : Bala Velan M; Jayanth Balraj A; Jerusha Angel K; Oswalt Manoj

Volume/Issue : Volume 10 - 2025, Issue 9 - September


Google Scholar : https://tinyurl.com/3ypb9md2

Scribd : https://tinyurl.com/2bkea2ur

DOI : https://doi.org/10.38124/ijisrt/25sep880

Note : A published paper may take 4-5 working days from the publication date to appear in PlumX Metrics, Semantic Scholar, and ResearchGate.

Note : Google Scholar may take 30 to 40 days to display the article.


Abstract : Echo Vision is a mobile assistive navigation system that uses augmented reality (AR) and edge computing to guide visually impaired users. ARCore provides real-time camera pose tracking on supported smartphones, with ORB-SLAM3 serving as a fallback on devices without ARCore support. The system continuously classifies the environment as indoor or outdoor using a deep scene recognition model (Places365), and this classification selects the appropriate object detector: a YOLOv8 model indoors and an SSD MobileNetV3 detector outdoors for safety-critical obstacles. The phone streams video frames to a local server over Wi-Fi, and the server returns the identified objects together with their locations and distance bands (for example, 0–1 m or 1–1.5 m). A sliding memory module on the server fuses recent detections and issues periodic voice prompts that summarize nearby obstacles, avoiding a flood of repeated alerts. The client application renders visual overlays of the environment and obstacles and delivers voice warnings through Android TextToSpeech, with vibration alerts for objects closer than 1 m. Tests in hallways and on outdoor paths show that Echo Vision can adapt its frame rate to changes in the scene, which helps manage network usage. This paper describes the architecture, implementation, and performance of Echo Vision, along with its current features and the next steps toward an edge-cloud assistive platform.

Keywords : Accurate Localization, Deep Learning, Comprehensive Route Planning, Specialized Detection, Fragmented Coverage, Indoor Navigation, Robust Mapping.
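The sliding memory and alert-throttling behaviour described in the abstract can be illustrated with a short sketch. The Python snippet below is a minimal, hypothetical version of such a module, assuming detections arrive as (label, distance, timestamp) records already produced by the server-side detector; the class name SlidingMemory, the window and prompt intervals, and the phrasing of the distance bands (0–1 m, 1–1.5 m) are illustrative assumptions, not the authors' actual implementation.

import time
from collections import deque
from dataclasses import dataclass

@dataclass
class Detection:
    label: str          # e.g. "chair", "person"
    distance_m: float   # estimated distance to the object in metres
    timestamp: float    # time the detection was produced on the server

class SlidingMemory:
    # Distance bands used when summarising obstacles (illustrative values
    # matching the ranges mentioned in the abstract).
    BANDS = [(0.0, 1.0, "within 1 metre"),
             (1.0, 1.5, "1 to 1.5 metres away")]

    def __init__(self, window_s=3.0, prompt_interval_s=4.0):
        self.window_s = window_s                 # how long a detection stays "recent"
        self.prompt_interval_s = prompt_interval_s
        self._buffer = deque()                   # recent Detection objects
        self._last_prompt = 0.0

    def add(self, detections):
        """Store new detections and drop those older than the sliding window."""
        now = time.monotonic()
        self._buffer.extend(detections)
        while self._buffer and now - self._buffer[0].timestamp > self.window_s:
            self._buffer.popleft()

    def summary(self):
        """Return (voice prompt or None, vibrate flag for objects under 1 m)."""
        now = time.monotonic()
        if now - self._last_prompt < self.prompt_interval_s or not self._buffer:
            return None, False
        phrases, vibrate = [], False
        for lo, hi, wording in self.BANDS:
            labels = sorted({d.label for d in self._buffer
                             if lo <= d.distance_m < hi})
            if labels:
                phrases.append(", ".join(labels) + " " + wording)
                if hi <= 1.0:
                    vibrate = True               # client also vibrates below 1 m
        if not phrases:
            return None, False
        self._last_prompt = now
        return "; ".join(phrases), vibrate

On the client side, the returned prompt would be handed to Android TextToSpeech and the vibrate flag to the vibrator service; grouping objects into coarse distance bands rather than announcing exact distances is what keeps the spoken summaries short and infrequent.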

References :

  1. D. Dakopoulos, “Tyflos: A Wearable Navigation Prototype for Blind & Visually Impaired; Design, Modelling and Experimental Results,” Ph.D. dissertation, Dept. of Computer Science, Wright State University, Dayton, OH, 2009.
  2. Y. H. Lee and G. Medioni, “RGB-D Camera Based Navigation for the Visually Impaired,” Technical Report, Dept. of Electrical Engineering and Dept. of Computer Science, University of Southern California, Los Angeles, CA 90089, 2011.
  3. A. Ben Atitallah, Y. Said, M. A. Ben Atitallah, M. Albekairi, K. Kaaniche, and S. Boubaker, “An effective obstacle detection system using deep learning advantages to aid blind and visually impaired navigation,” Ain Shams Engineering Journal, vol. 15, no. 2, Art. no. 102387, 2024, doi: 10.1016/j.asej.2023.102387.
  4. G. Jocher et al., “YOLO by Ultralytics (YOLOv8),” 2023. [Online]. Available: https://github.com/ultralytics/ultralytics.
  5. J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, 2016, pp. 779–788, doi: 10.1109/CVPR.2016.91.
  6. B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba, “Places: A 10 Million Image Database for Scene Recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 6, pp. 1452–1464, Jun. 2018, doi: 10.1109/TPAMI.2017.2723009.
  7. C. Campos, R. Elvira, J. J. G. Rodríguez, J. M. M. Montiel, and J. D. Tardós, “ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap SLAM,” IEEE Trans. Robot., vol. 37, no. 6, pp. 1874–1890, Dec. 2021, doi: 10.1109/TRO.2021.3075644.
  8. Google Inc., “ARCore: Augmented Reality for Android,” [Online]. Available: https://developers.google.com/ar.
  9. W. Liu et al., “SSD: Single Shot MultiBox Detector,” in Computer Vision – ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds. Lecture Notes in Computer Science, vol. 9905, Springer, Cham, Sep. 2016, doi: 10.1007/978-3-319-46448-0_2.
  10. A. Howard et al., “Searching for MobileNetV3,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 1314–1324.
  11. H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “Speeded-Up Robust Features (SURF),” Comput. Vis. Image Underst., vol. 110, no. 3, pp. 346–359, 2008, doi: 10.1016/j.cviu.2007.09.014.
  12. D. G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, Nov. 2004, doi: 10.1023/B:VISI.0000029664.99615.94.
  13. M. H. Abidi, A. N. Siddiquee, H. Alkhalefah, and V. Srivastava, “A Comprehensive Review of Navigation Systems for Visually Impaired Individuals,” Heliyon, Art. no. e31825, May 2024, doi: 10.1016/j.heliyon.2024.e31825.
  14. M. Matei and L. Alboaie, “Enhancing Accessible Navigation: A Fusion of Speech, Gesture, and Sonification for the Visually Impaired,” Procedia Computer Science, vol. 246, pp. 2558–2567, 2024, doi: 10.1016/j.procs.2024.09.442.
  15. M. Aharchi and M. Ait Kbir, “Enhancing Navigation for the Visually Impaired Through Object Detection and 3D Audio Feedback,” Journal of Theoretical and Applied Information Technology, vol. 102, no. 19, Article 6966, Oct. 15, 2024.
  16. L. Manirajee et al., “Assistive Technology for Visually Impaired Individuals: A Systematic Literature Review (SLR),” Int. J. Acad. Res. Bus. Soc. Sci., vol. 14, no. 2, Feb. 2024, doi: 10.6007/IJARBSS/v14-i2/20827.
