shenzhi-wang committed on
Commit 5e5b527 · verified · 1 Parent(s): fafa8b7

Update README.md

  1. README.md +22 -0
README.md CHANGED
@@ -114,6 +114,28 @@ All results below, except those for `Xwen-7B-Chat`, are sourced from [Arena-Hard
  | Starling-LM-7B-beta 🔑 | 26.1 | (-2.6, 2.0) |
 
 
+ ### 3.2 AlignBench-v1.1
+
+ > [!IMPORTANT]
+ > We replaced the original judge model, `GPT-4-0613`, in AlignBench with the more powerful model, `GPT-4o-0513`. To ensure fairness, all results below were generated with `GPT-4o-0513` as the judge. As a result, they may differ from AlignBench-v1.1 scores reported elsewhere.
+
+ | Model | Score |
+ | ------------------ | -------- |
+ | **Xwen-7B-Chat** 🔑 | **6.88** |
+ | Qwen2.5-7B-Chat 🔑 | 6.56 |
+
+ ### 3.3 MT-Bench
+
+ > [!IMPORTANT]
+ > We replaced the original judge model, `GPT-4`, in MT-Bench with the more powerful model, `GPT-4o-0513`. To ensure fairness, all results below were generated with `GPT-4o-0513` as the judge. As a result, they may differ from MT-Bench scores reported elsewhere.
+
+ | Model | Score |
+ | ------------------ | -------- |
+ | **Xwen-7B-Chat** 🔑 | **7.98** |
+ | Qwen2.5-7B-Chat 🔑 | 7.71 |
+
+
+
 
  ## References