Chenlu123 committed
Commit 28aa28e · verified · 1 parent: 3bea5cc

Update README.md

Files changed (1): README.md (+2 −2)
README.md CHANGED

```diff
@@ -13,11 +13,11 @@ Moreover, we provide a [detailed recipe](https://github.com/RLHFlow/Online-DPO-R
 
 ## Model Releases
 - [PPO model] (https://huggingface.co/RLHFlow/Qwen2.5-7B-PPO-Zero)
-- [Iterative DPO] (https://huggingface.co/RLHFlow/Qwen2.5-7B-DPO-Zero)
+- [Iterative DPO from SFT model] (https://huggingface.co/RLHFlow/Qwen2.5-7B-DPO)
+- [Iterative DPO from base model] (https://huggingface.co/RLHFlow/Qwen2.5-7B-DPO-Zero)
 - [Iterative DPO with Negative Log-Likelihood (NLL)] (https://huggingface.co/RLHFlow/Qwen2.5-7B-DPO-NLL-Zero)
 - [Raft] (https://huggingface.co/RLHFlow/Qwen2.5-7B-RAFT-Zero)
 
-
 ## Dataset
 
 
```
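Each entry in the diff above points to a checkpoint on the Hugging Face Hub, so any of them can be loaded with the standard `transformers` API. A minimal sketch, assuming a recent `transformers` (plus `accelerate` for `device_map="auto"`); the repo ID is taken from the list above, and the prompt and generation settings are purely illustrative:

```python
# Minimal sketch: load one of the released checkpoints from the Hugging Face Hub.
# The repo ID comes from the model list above; dtype, device placement, and the
# prompt are assumptions for illustration, not part of the release notes.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RLHFlow/Qwen2.5-7B-DPO-Zero"  # iterative DPO trained from the base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # requires `accelerate`; places layers automatically
)

prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```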