Training source

#1
by nfplay - opened

Hi! Would you be willing to share the training source code?

Thanks!

Hey! I’ve added some inline comments and notes to the Colab here:
Colab Link.
This notebook is based on, and modified from, @Steveeeeeeen's original finetune_phi4mm.ipynb.

Note:
It’s advisable to unfreeze the speech LoRA parameters, especially when working with new languages. See this discussion for more details:
Hugging Face Discussion.

Initially, I hesitated to unfreeze these parameters to avoid potential issues, but I've found it's necessary: with the speech LoRA frozen, the encoder resists learning language-specific phonemes, even given a sufficiently large speech dataset.
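As a rough sketch of what unfreezing looks like in the notebook: select the speech LoRA parameters by name and re-enable gradients on them. The `"lora"`/`"speech"` substrings below are assumptions about the parameter naming; check `model.named_parameters()` against the actual Phi-4-multimodal checkpoint before relying on them.

```python
import torch.nn as nn

def unfreeze_speech_lora(model: nn.Module) -> int:
    """Re-enable gradients on parameters that look like speech LoRA weights.

    Returns the number of unfrozen parameter elements, which is handy as a
    sanity check that the name filter actually matched something.
    """
    unfrozen = 0
    for name, param in model.named_parameters():
        # Assumed naming convention; adjust to match your checkpoint.
        if "lora" in name.lower() and "speech" in name.lower():
            param.requires_grad = True
            unfrozen += param.numel()
    return unfrozen
```

If the returned count is zero, the name filter did not match anything and the speech LoRA is still frozen.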

Additionally, we may need to save LoRAs separately to allow further fine-tuning of the model with other modalities.

I’m also actively exploring improvements for fine-tuning this model and will share updates as I progress.
