Update README.md
README.md
CHANGED
@@ -69,14 +69,19 @@ Phi-4-multimodal model is strong in multimodal tasks, especially in speech-to-te
 
 ## Evaluation
 
-
+Evaluation was done on the following datasets:
+- ASR (Automatic Speech Recognition): Evaluated with CER (Character Error Rate) on the zeroth-test set (457 samples).
+- AST (Automatic Speech Translation): Evaluated with BLEU score on fleurs ko <-> en speech translation results (270 samples).
+
 Script is retrieved from [here](https://gist.github.com/seastar105/d1d8983b27611370528e3b194dcc5577#file-evaluate-py).
 
+Compared to [this fine-tuned model](https://huggingface.co/seastar105/Phi-4-mm-inst-zeroth-kor), ASR improves significantly (CER 7.02 → 3.80) thanks to more high-quality voice data, including my own voice. However, AST quality deteriorates on fleurs-ko2en-cot (BLEU 9.19 → 7.04), so appropriate data should be mixed into training to mitigate catastrophic forgetting.
 
 | Model | zeroth-test | fleurs-ko2en | fleurs-ko2en-cot | fleurs-en2ko | fleurs-en2ko-cot |
 |----------------------|-------------|--------------|------------------|--------------|------------------|
 | original | 198.32 | 5.63 | 2.42 | 6.86 | 4.17 |
 | finetune (this model)| 3.80 | 7.03 | 7.04 | 12.50 | 9.54 |
+| Phi-4-mm-inst-zeroth-kor | 7.02 | 7.07 | 9.19 | 13.08 | 9.35 |
 
 ## References
 
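The zeroth-test column in the table above reports CER. As a rough illustration of how such a number is formed, here is a minimal pure-Python sketch. It is not the linked evaluation script: the function names `edit_distance` and `cer` are hypothetical, and corpus-level aggregation with no text normalization is an assumption made for this example. It also shows that CER can exceed 100 when a hypothesis contains many extra characters, which may be one reason a value such as 198.32 can appear.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def cer(references, hypotheses) -> float:
    """CER in percent: total character edits / total reference characters.

    Assumption for this sketch: corpus-level aggregation, no normalization.
    """
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return 100.0 * total_edits / total_chars


if __name__ == "__main__":
    print(cer(["abcd"], ["abxd"]))           # one substitution in four characters: 25.0
    print(cer(["안녕"], ["안녕하세요 여러분"]))  # seven inserted characters over two: 350.0
```

Because the denominator counts only reference characters, a model that keeps generating text past the end of the utterance is penalized for every extra character, so the metric has no upper bound of 100.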