All results below, except those for `Xwen-7B-Chat`, are sourced from Arena-Hard.

| Starling-LM-7B-beta π | 26.1 | (-2.6, 2.0) |

### 3.2 AlignBench-v1.1

> [!IMPORTANT]
> We replaced the original judge model in AlignBench, `GPT-4-0613`, with the more powerful `GPT-4o-0513`. For fairness, all results below were generated with `GPT-4o-0513` as the judge, so they may differ from AlignBench-v1.1 scores reported elsewhere.

|                    | Score    |
| ------------------ | -------- |
| **Xwen-7B-Chat** π | **6.88** |
| Qwen2.5-7B-Chat π  | 6.56     |
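Swapping the judge model amounts to pointing the benchmark's grading step at a different model while keeping everything else fixed. As a rough illustration only (not the actual AlignBench or MT-Bench harness; the helper name and prompt wording below are hypothetical), a single-answer grading request for the judge might be assembled like this:

```python
# Illustrative sketch: building a chat-completions style payload that asks a
# judge model to score one answer. The prompt wording and helper are
# hypothetical, not the benchmarks' actual code.

JUDGE_MODEL = "gpt-4o-2024-05-13"  # the "GPT-4o-0513" judge used in the tables above

def build_judge_request(question: str, answer: str) -> dict:
    """Return a request payload asking the judge to rate one answer 1-10."""
    system = "You are an impartial judge. Rate the assistant's answer from 1 to 10."
    user = f"[Question]\n{question}\n\n[Assistant's Answer]\n{answer}\n\n[Rating]"
    return {
        "model": JUDGE_MODEL,
        "temperature": 0.0,  # deterministic judging
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }

req = build_judge_request("What is 2 + 2?", "4")
```

Because only `JUDGE_MODEL` changes between the original setup and ours, scores remain comparable within a table but not across judges, which is why the numbers here may differ from those reported elsewhere.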
### 3.3 MT-Bench

> [!IMPORTANT]
> We replaced the original judge model in MT-Bench, `GPT-4`, with the more powerful `GPT-4o-0513`. For fairness, all results below were generated with `GPT-4o-0513` as the judge, so they may differ from MT-Bench scores reported elsewhere.

|                    | Score    |
| ------------------ | -------- |
| **Xwen-7B-Chat** π | **7.98** |
| Qwen2.5-7B-Chat π  | 7.71     |

## References