[STATUS] Jan 12 Forecast
Jan 12: My intent is to supersede v0.19 with a better Kokoro model that dominates in every respect. To do this, I plan to continue training the unreleased v0.23 checkpoint on a richer data mix.
- If successful, you should expect the next-gen Kokoro model to ship with more voices and languages, also under an Apache 2.0 license, with a similar 82M parameter architecture.
- If unsuccessful, it would most likely be because the model does not converge, i.e. the loss does not go down. That could be due to data quality issues, architecture limitations, overfitting on old data, underfitting on new data, etc. Rollbacks and model collapse are not unheard of in ML; fingers crossed they do not happen here, or, if they do, that I can address such issues as they come up.
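For anyone wondering what "loss does not go down" means operationally, below is a minimal, purely illustrative sketch of a plateau check on validation loss. The function name and the `patience` and `min_delta` values are hypothetical and not taken from the actual training code.

```python
# Hypothetical sketch only: flag a stalled (non-converging) run by checking
# whether validation loss has stopped improving. Not Kokoro's training code.
def has_loss_plateaued(val_losses, patience=5, min_delta=1e-3):
    """Return True if the most recent `patience` losses show no improvement
    of at least `min_delta` over the best loss seen before that window."""
    if len(val_losses) <= patience:
        return False  # not enough history to judge convergence yet
    best_before = min(val_losses[:-patience])
    best_recent = min(val_losses[-patience:])
    return (best_before - best_recent) < min_delta

# Loss still falling -> not stalled; loss flat over the last 5 checks -> stalled.
print(has_loss_plateaued([2.0, 1.5, 1.2, 1.0, 0.9, 0.8, 0.7]))       # False
print(has_loss_plateaued([2.0, 1.5, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2]))  # True
```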
Behind the scenes, slabs of data have been (and still are) coming in thanks to the outstanding community response to #21, and I am incredibly grateful for it. Some of these slabs are languages new to the model, which is exciting. Note that #21 is first come, first served, and at some point I will not be able to airdrop your data into a GPU in the middle of a training run.
Most of my focus is now on organizing these slabs so that they can be dispatched to GPUs later. Training has not started yet, since data is still flowing in and much processing work remains. In the meantime, I may not be able to get to some of your questions, but please understand that it is not without reason.
That's it for now, thanks everyone!