Artificial Intelligence Powered Voice to Text and Text to Speech Recognition Model – A Powerful Tool for Student Comprehension of Tutor Speech | International Journal of Innovative Science and Research Technology

Artificial Intelligence Powered Voice to Text and Text to Speech Recognition Model – A Powerful Tool for Student Comprehension of Tutor Speech

Authors : Sonali Padhi; Kranthi Kiran; Ambica Thakur; Adityaveer Dhillon; Bharani Kumar Depuru

Volume/Issue : Volume 9 - 2024, Issue 3 - March

Google Scholar : https://tinyurl.com/3fsjfe32

Scribd : https://tinyurl.com/7smh6vjk

DOI : https://doi.org/10.38124/ijisrt/IJISRT24MAR1984


Abstract : Speech-to-text (STT) and text-to-speech (TTS) are natural language processing (NLP) powered models that convert speech to text and text to speech, widening the scope of learning for the parties involved. Over the past few years, students have increasingly moved abroad for quality education and better financial aid. An accent gap between these students and their tutors often reduces how well the students understand what is said. This work aims to solve that problem: with its state-of-the-art STT and TTS software, it intends to ease the students' learning curve. The key target users are international students and individuals with disabilities. The product can also transcribe meetings, quickly converting discussion points into text, and companies can use the model to extract data from call recordings for sentiment analysis and similar tasks. This paper gives a detailed walkthrough of the product as it stands, covering the tech stacks used, how those technologies are implemented, the reports shown to the different end users, and the overall workflow of the product.

Keywords : Artificial Intelligence, Deep Learning, Large Language Models, Automatic Speech Recognition, Transcription, Whisper AI, gTTS.
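
The abstract describes a workflow in which tutor speech is transcribed to text and the text is then re-synthesized as speech. A minimal sketch of such a pipeline follows, assuming the open-source `openai-whisper` and `gTTS` Python packages (suggested only by the keywords above; the paper's actual implementation is not shown on this page). The function names, the `base` model size, and the file paths are hypothetical choices made for illustration.

```python
from typing import Callable


def transcribe_with_whisper(audio_path: str, model_name: str = "base") -> str:
    """Speech-to-text. Assumes `pip install openai-whisper` and ffmpeg on PATH."""
    import whisper  # imported lazily so the module loads without the package

    model = whisper.load_model(model_name)   # "base" is an arbitrary model size
    result = model.transcribe(audio_path)    # returns a dict with a "text" key
    return result["text"].strip()


def speak_with_gtts(text: str, out_path: str, lang: str = "en", tld: str = "com") -> str:
    """Text-to-speech. Assumes `pip install gTTS` and network access."""
    from gtts import gTTS  # imported lazily for the same reason

    # `tld` selects the Google Translate domain, which changes the spoken accent.
    gTTS(text=text, lang=lang, tld=tld).save(out_path)
    return out_path


def lecture_to_clear_audio(
    audio_path: str,
    out_path: str,
    stt: Callable[[str], str] = transcribe_with_whisper,
    tts: Callable[[str, str], str] = speak_with_gtts,
) -> str:
    """Transcribe a recording, re-synthesize it as speech, and return the transcript."""
    transcript = stt(audio_path)
    if not transcript:
        raise ValueError("transcription produced no text")
    tts(transcript, out_path)
    return transcript
```

Passing the STT and TTS steps in as callables keeps the workflow independent of any one engine, so either step could be swapped for another service without changing the pipeline function.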
