import gradio as gr

# Get models
# ASR model for input speech
speech2text = gr.Interface.load(
    "huggingface/facebook/hubert-large-ls960-ft",
    inputs=gr.inputs.Audio(label="Upload Audio", type="filepath", source="upload"),
)

# Translates English to Spanish text
translator = gr.Interface.load(
    "huggingface/Helsinki-NLP/opus-mt-en-es",
    outputs=gr.outputs.Textbox(label="English to Spanish Translated Text"),
)

# TTS model for output speech
text2speech = gr.Interface.load(
    "huggingface/facebook/tts_transformer-es-css10",
    outputs=gr.outputs.Audio(label="English to Spanish Translated Audio"),
    allow_flagging="never",
)

translate = gr.Series(speech2text, translator)  # outputs Spanish text translation
en2es = gr.Series(translate, text2speech)  # outputs Spanish audio
ui = gr.Parallel(translate, en2es)  # allows transcription of Spanish audio

# Gradio interface
ui.title = "English to Spanish Speech Translator"
ui.description = """
The ASR part of this Space uses hubert-large-ls960-ft, which is pretrained and fine-tuned on 960 hours of LibriSpeech speech audio sampled at 16 kHz. The model has a self-reported word error rate (WER) of 1.9 percent and ranks first on paperswithcode for ASR on LibriSpeech. More information can be found on its website at hubert-self, and the original model is under pytorch/fairseq.
The English to Spanish text translator is the pre-trained model Helsinki-NLP/opus-mt-en-es, which is part of the Tatoeba Translation Challenge (v2021-08-07), as seen in its GitHub repo at Helsinki-NLP/Tatoeba-Challenge. This project aims to develop machine translation for real-world use cases in many languages.
The TTS model is facebook/tts_transformer-es-css10. It uses the Fairseq(-py) sequence modeling toolkit for speech synthesis, in this case TTS for Spanish. More information can be found in its Git repository at speech_synthesis.
"""
ui.launch(inbrowser=True)