Update README.md
README.md
@@ -92,7 +92,9 @@ print(response)
 
 ### 3.1 Arena-Hard-Auto-v0.1
 
-All results below, except those for `Xwen-72B-Chat`, are sourced from [Arena-Hard-Auto](https://github.com/lmarena/arena-hard-auto) (accessed on February 1, 2025).
+All results below, except those for `Xwen-72B-Chat`, `DeepSeek-V3` and `DeepSeek-R1`, are sourced from [Arena-Hard-Auto](https://github.com/lmarena/arena-hard-auto) (accessed on February 1, 2025).
+
+The results of `DeepSeek-V3` and `DeepSeek-R1` are borrowed from their officially reported results.
 
 #### 3.1.1 No Style Control
 
@@ -100,9 +102,11 @@ All results below, except those for `Xwen-72B-Chat`, are sourced from [Arena-Har
 
 | | Score | 95% CIs |
 | --------------------------------- | ------------------------ | ----------- |
-| **Xwen-72B-Chat** π | **86.1** (Top-1 among π) | (-1.5, 1.7) |
+| **Xwen-72B-Chat** π | **86.1** (Top-1 among π below 100B) | (-1.5, 1.7) |
 | Qwen2.5-72B-Instruct π | 78.0 | (-1.8, 1.8) |
 | Athene-v2-Chat π | 85.0 | (-1.4, 1.7) |
+| DeepSeek-V3 **(671B >> 72B)** π | 85.5 | N/A |
+| DeepSeek-R1 **(671B >> 72B)** π | **92.3** (Top-1 among π) | N/A |
 | Llama-3.1-Nemotron-70B-Instruct π | 84.9 | (-1.7, 1.8) |
 | Llama-3.1-405B-Instruct-FP8 π | 69.3 | (-2.4, 2.2) |
 | Claude-3-5-Sonnet-20241022 π | 85.2 | (-1.4, 1.6) |