Text-to-Speech
ONNX
English

success with onnxruntime-web

#25
by shub1 - opened

Firstly, thank you for creating and fine-tuning this model; it seems to be fighting against the scaling meta...
I have written some JavaScript that gets the model to output audio using ONNX Runtime for the web (onnxruntime-web). I have yet to fully work out the phonemization step, but the espeak-ng npm package seems to handle the initial parts of it. Still, just feeding in the token array and tensors outputs the correct audio (so that's awesome!)
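The "token array" step between the phonemizer and the model can be sketched as follows. This is a minimal TypeScript illustration, not the kokorojs code. It assumes the phonemizer (for example espeak-ng) has already produced an IPA string, that the vocabulary is a symbol-to-id map (the ids in `DEMO_VOCAB` are hypothetical placeholders, not Kokoro's real table), and that the model expects the ids wrapped with a 0 pad token at each end with at most 510 real tokens, as the Python reference code appears to do. `tokenize`, `padTokens`, and `MAX_TOKENS` are illustrative names.

```typescript
// Hypothetical symbol-to-id table for illustration only; the real Kokoro
// vocabulary is much larger and uses different ids.
const DEMO_VOCAB = new Map<string, number>([
  ["h", 1], ["ə", 2], ["l", 3], ["o", 4], ["ʊ", 5], [" ", 6],
]);

// Assumed limit on real tokens per call (two more slots go to padding).
const MAX_TOKENS = 510;

/** Map each phoneme symbol to its id, dropping symbols missing from the vocab. */
function tokenize(phonemes: string, vocab: Map<string, number>): number[] {
  const ids: number[] = [];
  // for...of walks the string by Unicode code point, so IPA symbols stay whole.
  for (const symbol of phonemes) {
    const id = vocab.get(symbol);
    if (id !== undefined) ids.push(id);
  }
  return ids;
}

/** Wrap the ids with a 0 pad token at each end, rejecting over-long input. */
function padTokens(ids: number[]): number[] {
  if (ids.length > MAX_TOKENS) {
    throw new RangeError(`got ${ids.length} tokens, limit is ${MAX_TOKENS}`);
  }
  return [0, ...ids, 0];
}

// Usage: the stress mark "ˈ" is not in DEMO_VOCAB, so it is dropped.
const padded = padTokens(tokenize("həˈloʊ", DEMO_VOCAB));
console.log(padded.join(",")); // prints "0,1,2,3,4,5,0"
```

To hand this to onnxruntime-web, an int64 input has to be backed by a `BigInt64Array`, e.g. `new ort.Tensor("int64", BigInt64Array.from(padded, BigInt), [1, padded.length])`. The actual input names and shapes depend on the exported graph, so it is worth checking `session.inputNames` rather than trusting this sketch.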

[Attached screenshot: Screenshot 2025-01-08 at 6.11.42 PM.png]

Originally, I had intended to port the ONNX model to TensorFlow .pb file(s) and then convert those to the TF.js format, but that didn't go well: onnx2tf had trouble with the SequenceEmpty op and something to do with the looping ops in the opset (I tried onnx-tf as well). I later discovered your other model card, StyleTTS, and its associated website, which is how I was able to make progress using the ONNX model directly.

I did not know you had already created a website that works in essentially the same way (but for a different model: StyleTTS?). Are there plans to integrate this model into your site? If so, I guess what I'm doing is literally not worth any time. I don't want to infringe upon anyone's rights, especially when this tech is honestly all very cool.

Could you share your JavaScript code or create an npm package to share it?

https://github.com/Shubin123/kokorojs. You will need to move the ONNX model file into the project root. Currently, only the full 32-bit precision ONNX model file works. It works as a basic TTS right now, but I have not figured out how to save the output as a playable WAV file (right now it just plays in the browser, so make sure your volume is up and press the button so that it can start/resume the AudioContext).
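Saving the output as a playable WAV file mostly comes down to prepending the canonical 44-byte RIFF header to 16-bit PCM samples. The TypeScript sketch below is a minimal illustration under stated assumptions: the model output is a mono `Float32Array` with values in [-1, 1], and the sample rate defaults to 24 kHz (an assumption; check it against the model you are running). `encodeWav` is a hypothetical helper, not part of onnxruntime-web or kokorojs.

```typescript
/**
 * Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file.
 * Assumption: 24 kHz default rate; pass the model's real rate if it differs.
 */
function encodeWav(samples: Float32Array, sampleRate = 24000): ArrayBuffer {
  const bytesPerSample = 2; // 16-bit PCM
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  // RIFF container header
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // total size minus the first 8 bytes
  writeAscii(8, "WAVE");
  // "fmt " sub-chunk describing the PCM layout
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for plain PCM
  view.setUint16(20, 1, true); // audio format 1 = integer PCM
  view.setUint16(22, 1, true); // one channel (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  // "data" sub-chunk holding the samples
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range (little-endian).
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * bytesPerSample, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

// Usage: three samples -> 44-byte header + 6 bytes of PCM data.
const wav = encodeWav(new Float32Array([0, 0.5, -0.5]));
console.log(wav.byteLength); // prints 50
```

In a browser, `new Blob([encodeWav(audio)], { type: "audio/wav" })` can then be passed to `URL.createObjectURL` and used as the `href` of an `<a download="speech.wav">` link, so the audio can be saved instead of only played.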

@shub1

I did not know you had already created a website that works essentially in the same way (but for a different model: styletts?).

As of this post, hexgrad.com runs an outdated web demo that lazy-loads ONNX model parts from https://hf.co/hexgrad/styletts2, which is the base pretrained LibriTTS model supplied by the paper author at https://hf.co/yl4579/StyleTTS2-LibriTTS.

Are there plans to integrate this model into your site? If so, I guess what I'm doing is literally not worth any time.

I could and I might—eventually. Or I might sunset the JS web demo and just embed the Python HF/Gradio demo instead. Or I might have no demo on that site, and just externally link to the model repo and Spaces demo here on HF. I haven't yet decided, but all of those options definitely take a backseat to next-gen Kokoro model development.

For example, here is a preview of the next-gen English tokenizer: https://hf.co/spaces/hexgrad/Misaki-G2P

Porting that over to the JS web demo would be premature right now, because the next model with that tokenizer still needs to be trained and released. It's unclear when that will be, but if all goes well it should supersede v0.19, so I don't really want to spend much time upgrading the JS web demo to v0.19 if that will be superseded anyway.

I don't want to infringe upon anyone's rights, especially when this tech is honestly all very cool.

Remember, Kokoro is licensed under Apache 2.0, which is permissive. See https://en.wikipedia.org/wiki/Apache_License or ask an LLM to explain what you can do under Apache 2.0.
