jieliu committed · verified
Commit 2a37ce4 · Parent(s): 2a2de07

Update README.md

Files changed (1): README.md (+3 −3)
README.md CHANGED
@@ -33,18 +33,18 @@ Recent studies show that DPO benefits from iterative training with online prefer
  ## Performance
  Our 7B model achieves a **50.5%** length-controlled win rate against GPT-4 Preview on AlpacaEval 2.0.
  <p align="center">
- <img src="https://cdn-uploads.huggingface.co/production/uploads/639be86b59473c6ae02ef9c4/Tj_a1QntAxkhy2SXbOdmT.png" width="30%">
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/639be86b59473c6ae02ef9c4/Tj_a1QntAxkhy2SXbOdmT.png" width="60%">
  </p>
  Our model's LC win rate improves over iterations without significantly changing the response length, indicating better alignment with human values without length bias. The final trained model (iteration 3) achieves a 50.5% LC win rate, making it the first open-source model to surpass the GPT-4 Preview baseline.
 
  In addition to regular decoding, we also test beam search and best-of-n sampling on top of our trained model. Beam search over our trained model shows a 5% improvement over regular decoding, and best-of-n sampling with Starling-RM-34B achieves a 61.6% LC win rate, outperforming GPT-4 Omni.
  <p align="center">
- <img src="https://cdn-uploads.huggingface.co/production/uploads/639be86b59473c6ae02ef9c4/GGa28vaREaVq099MPdqcP.png" width="50%">
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/639be86b59473c6ae02ef9c4/GGa28vaREaVq099MPdqcP.png" width="100%">
  </p>
 
  We observe no significant degradation on traditional NLP tasks from the Hugging Face Open LLM Leaderboard.
  <p align="center">
- <img src="https://cdn-uploads.huggingface.co/production/uploads/639be86b59473c6ae02ef9c4/8KEm_Ladg7Kqko8mC63SN.png" width="50%">
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/639be86b59473c6ae02ef9c4/8KEm_Ladg7Kqko8mC63SN.png" width="100%">
  </p>
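The hunk context above references iterative DPO training with online preferences. For reference, the standard DPO objective that such pipelines iterate (Rafailov et al., 2023) is shown below; the specific iteration schedule and preference source used for this model are not part of this commit.

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

Here $x$ is a prompt, $y_w$ and $y_l$ are the preferred and rejected responses, and $\beta$ scales the implicit reward. In iterative variants, each round commonly draws fresh preference pairs from the current policy and refreshes $\pi_{\mathrm{ref}}$ before re-optimizing this loss.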
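The diff above reports results for beam search and best-of-n sampling on top of the trained model. As a rough illustration of those two decoding schemes, here is a minimal sketch using Hugging Face `transformers`. The checkpoint name and the `reward` function are placeholders: Starling-RM-34B's actual scoring interface is not shown in this commit, and the sampling hyperparameters are assumptions rather than the authors' settings.

```python
# Sketch of best-of-n sampling and beam search decoding, assuming a generic
# causal LM. The model name and reward scorer below are hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

policy_name = "your-org/your-7b-model"  # placeholder, not the actual checkpoint
tokenizer = AutoTokenizer.from_pretrained(policy_name)
policy = AutoModelForCausalLM.from_pretrained(
    policy_name, torch_dtype=torch.bfloat16
).to(device)

def reward(prompt: str, response: str) -> float:
    """Stand-in scorer. In the README's setup this role is played by
    Starling-RM-34B; its loading/scoring API is not reproduced here."""
    return float(len(response.split()))  # heuristic placeholder, NOT a real RM

def best_of_n(prompt: str, n: int = 16, max_new_tokens: int = 256) -> str:
    """Draw n i.i.d. samples from the policy, keep the highest-reward one."""
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outs = policy.generate(
        **inputs,
        do_sample=True,
        temperature=0.7,          # assumed hyperparameters
        top_p=0.9,
        max_new_tokens=max_new_tokens,
        num_return_sequences=n,
        pad_token_id=tokenizer.eos_token_id,
    )
    prompt_len = inputs["input_ids"].shape[1]
    candidates = [
        tokenizer.decode(o[prompt_len:], skip_special_tokens=True) for o in outs
    ]
    return max(candidates, key=lambda r: reward(prompt, r))

def beam_search(prompt: str, num_beams: int = 8, max_new_tokens: int = 256) -> str:
    """Deterministic beam search over the same policy."""
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    out = policy.generate(
        **inputs,
        do_sample=False,
        num_beams=num_beams,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,
    )
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True)

print(best_of_n("Explain length-controlled win rate in one paragraph."))
```

Best-of-n trades extra inference compute (n forward generations plus n reward evaluations) for quality, which is consistent with the README's observation that reranking with a strong reward model lifts the LC win rate well above regular decoding.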