Authors :
W. Sweta; J. Kartiki; K. Prerana; M. Aarya; P. Rutuja
Volume/Issue :
Volume 10 - 2025, Issue 6 - June
Google Scholar :
https://tinyurl.com/c9jc6bws
DOI :
https://doi.org/10.38124/ijisrt/25jun285
Abstract :
This report presents the design and development of a Sign Language to Text and Speech Conversion System. The main goal of the project is to improve communication for people who are deaf or hard of hearing by translating sign language gestures into text and spoken words in real time. This bridges the communication gap between sign language users and people who do not know sign language, making daily conversation easier and more inclusive. Our system uses a gesture recognition model based on Convolutional Neural Networks (CNNs) to detect the hand gestures that represent different signs. Major challenges in this task include changing lighting conditions, varied backgrounds, and differences in hand shape and signing style. To overcome these issues, we use vision-based techniques and landmark detection with the MediaPipe library, which improves the accuracy and robustness of the system. After recognizing a gesture, the system converts it into text and uses Text-to-Speech (TTS) technology to generate clear spoken output (illustrative sketches of both stages follow the abstract). This allows people with hearing disabilities to communicate smoothly with those unfamiliar with sign language, making interactions quicker and more effective. The report also discusses the positive impact this technology can have in settings such as schools, offices, and public service areas, and it emphasizes the importance of ongoing improvements in machine learning and computer vision to make such systems more reliable and user-friendly. Overall, the project shows how modern technology can promote a more inclusive and accessible world.
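The recognition stage can be prototyped in a few lines of Python. The sketch below is a minimal illustration, not the paper's exact implementation: it assumes the MediaPipe Hands solution for landmark detection and a hypothetical pre-trained Keras classifier (the gesture_model.h5 file and LABELS list are illustrative names) over the 63 flattened (x, y, z) coordinates of the 21 hand landmarks.

import cv2
import mediapipe as mp
import numpy as np
from tensorflow.keras.models import load_model

# Hypothetical artifacts: a classifier trained on flattened hand
# landmarks and its label set; both names are illustrative.
model = load_model("gesture_model.h5")
LABELS = ["hello", "thanks", "yes", "no"]

cap = cv2.VideoCapture(0)
with mp.solutions.hands.Hands(max_num_hands=1,
                              min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV captures BGR.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            lm = results.multi_hand_landmarks[0]
            # Flatten the 21 landmarks into one 63-value feature vector.
            feats = np.array(
                [[c for p in lm.landmark for c in (p.x, p.y, p.z)]],
                dtype=np.float32)
            probs = model.predict(feats, verbose=0)[0]
            print("predicted sign:", LABELS[int(np.argmax(probs))])
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()

Because MediaPipe returns landmark coordinates normalized to the frame, a classifier over these features is less sensitive to the lighting and background variation discussed above than one trained on raw pixels.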
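For the speech stage, one widely used Python option is the gTTS (Google Text-to-Speech) package. The function below is a minimal sketch assuming an internet connection; the function name and output path are illustrative.

from gtts import gTTS

def speak(text: str, outfile: str = "speech.mp3") -> str:
    # Render recognized text as spoken audio via Google Text-to-Speech.
    gTTS(text=text, lang="en").save(outfile)
    return outfile

# Example: voice the text produced by the recognition stage.
speak("hello, nice to meet you")

An offline engine such as pyttsx3 could be swapped in where network access is unavailable; the choice is a deployment detail rather than part of the recognition pipeline.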
Keywords :
Sign Language to Text and Speech Conversion, Gesture Recognition, Convolutional Neural Networks (CNN), Text-to-Speech, Accessibility, Inclusivity, Computer Vision, MediaPipe, Real-Time Communication.