daekeun-ml/Phi-4-multimodal-finetune-ko-speech

phi-4-multimodal

daekeun-ml committed 4 days ago

Commit a780aa8 · verified · 1 Parent(s): 0602162

Update README.md

Files changed (1)

README.md +1 -1

@@ -57,7 +57,7 @@ This is a fine-tuned model for Korean speech-to-text translation, from [microsof
 - kresnik/zeroth_korean
 - mozilla-foundation/common_voice_17_0 (Used Korean speech only)
 - PolyAI/minds14 (Used Korean speech only)
-- Custom dataset on my own. The speech was a mix of fast and slow speech (Technical blog contents and presentations I have posted), with some modulation using [audiomentations](https://github.com/iver56/audiomentations).
+- Custom dataset on my own. The speech was a mix of fast and slow speech (Technical blog contents and presentations I have posted), with some modulation using [audiomentations](https://github.com/iver56/audiomentations) and [this script](https://github.com/daekeun-ml/azure-genai-utils/blob/main/azure_genai_utils/stt/augment.py)
 Total 35K samples. Each sample is a pair of Korean speech and its transcription. Dataset was sampled 16kHz.