Training source
Hi! Would you be willing to share the training source code?
Thanks!
Hey! I’ve added some inline comments and notes to the Colab here:
Colab Link.
This notebook is based on and modified from the original work by @Steveeeeeeen: finetune_phi4mm.ipynb.
Note:
It’s advisable to unfreeze the speech LoRA parameters, especially when working with new languages. See this discussion for more details:
Hugging Face Discussion.
Initially, I hesitated to unfreeze these parameters to avoid potential issues, but I’ve since realized it’s necessary: with the speech LoRA frozen, the model struggles to learn language-specific phonemes, even on a sufficiently large speech dataset.
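For reference, a minimal sketch of what unfreezing could look like. The substrings `"speech"` and `"lora"` are assumptions about how the checkpoint names its adapter weights; inspect `model.named_parameters()` to confirm the exact names before training.

```python
# Minimal sketch: unfreeze the speech LoRA parameters by matching their names.
# The "speech"/"lora" substrings are assumptions -- verify against your
# checkpoint's actual parameter names via model.named_parameters().
def unfreeze_speech_lora(model):
    """Set requires_grad=True on every parameter that looks like a speech-LoRA weight."""
    unfrozen = []
    for name, param in model.named_parameters():
        if "lora" in name.lower() and "speech" in name.lower():
            param.requires_grad = True
            unfrozen.append(name)
    return unfrozen
```

Returning the list of unfrozen names makes it easy to sanity-check that the filter matched what you expected.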
Additionally, we may need to save LoRAs separately to allow further fine-tuning of the model with other modalities.
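One way to do that, sketched under the assumption that adapter weights carry a `"lora"` substring in their state-dict keys, is to split the state dict so the adapter tensors can be saved (e.g. with `torch.save`) and fine-tuned independently of the base weights:

```python
# Minimal sketch: separate LoRA tensors from the base weights so each
# adapter can be persisted and fine-tuned per modality. The "lora"
# substring match is an assumption -- adjust it to your checkpoint's keys.
def split_lora_state(state_dict, key_substring="lora"):
    """Return (lora_only, base_only) subsets of a state dict."""
    lora = {k: v for k, v in state_dict.items() if key_substring in k.lower()}
    base = {k: v for k, v in state_dict.items() if key_substring not in k.lower()}
    return lora, base
```

If you are using PEFT-managed adapters, its own save/load utilities may be the cleaner route; this is just the bare-bones version.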
I’m also actively exploring improvements for fine-tuning this model and will share updates as I progress.