Update README.md
README.md
CHANGED
@@ -33,18 +33,18 @@ Recent studies show that DPO benefits from iterative training with online prefer
 ## Performance
 Our 7B model achieves a **50.5%** length-controlled win rate against GPT-4 Preview on AlpacaEval 2.0.
 <p align="center">
-<img src="https://cdn-uploads.huggingface.co/production/uploads/639be86b59473c6ae02ef9c4/Tj_a1QntAxkhy2SXbOdmT.png" width="
+<img src="https://cdn-uploads.huggingface.co/production/uploads/639be86b59473c6ae02ef9c4/Tj_a1QntAxkhy2SXbOdmT.png" width="60%">
 </p>
 Our model's LC win rate improves over iterations without significantly changing the response length, indicating better alignment with human values without length bias. The final trained model (iteration 3) achieves a 50.5% LC win rate, making it the first open-source model to surpass the baseline model GPT-4 Preview.
 
 In addition to regular decoding, we also test beam search and best-of-n sampling on top of our trained model. Beam search over our trained model shows a 5% improvement over regular decoding. Best-of-n sampling with Starling-RM-34B achieves a 61.6% LC win rate and outperforms GPT-4 Omni.
 <p align="center">
-<img src="https://cdn-uploads.huggingface.co/production/uploads/639be86b59473c6ae02ef9c4/GGa28vaREaVq099MPdqcP.png" width="
+<img src="https://cdn-uploads.huggingface.co/production/uploads/639be86b59473c6ae02ef9c4/GGa28vaREaVq099MPdqcP.png" width="100%">
 </p>
 
 We observe no significant degradation on traditional NLP tasks from the Hugging Face Open LLM Leaderboard.
 <p align="center">
-<img src="https://cdn-uploads.huggingface.co/production/uploads/639be86b59473c6ae02ef9c4/8KEm_Ladg7Kqko8mC63SN.png" width="
+<img src="https://cdn-uploads.huggingface.co/production/uploads/639be86b59473c6ae02ef9c4/8KEm_Ladg7Kqko8mC63SN.png" width="100%">
 </p>
 
 
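The README text in this hunk mentions best-of-n sampling with a reward model but does not show how it works. The following is a minimal sketch of the general technique, not the authors' implementation: `best_of_n`, `generate`, `score`, and the default `n=16` are all hypothetical names and values chosen for illustration. In practice `generate` would sample one response from the trained model and `score` would return a reward-model score for a prompt/response pair.

```python
from typing import Callable, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],      # hypothetical: samples one response for the prompt
    score: Callable[[str, str], float],  # hypothetical: reward-model score for (prompt, response)
    n: int = 16,                         # assumed value; the README does not state n
) -> Tuple[str, float]:
    """Sample n candidate responses and return the highest-scoring one with its score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Draw n independent candidates from the generator.
    candidates = [generate(prompt) for _ in range(n)]
    # Score each candidate with the reward function.
    scored = [(score(prompt, candidate), candidate) for candidate in candidates]
    # Keep the candidate with the highest reward (first one wins on ties).
    best_score, best_response = max(scored, key=lambda pair: pair[0])
    return best_response, best_score
```

The generator and the scorer are passed in as plain callables so the selection logic stays independent of any particular model or reward-model library.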