Chenlu123 committed
Commit 28aa28e · verified · 1 parent: 3bea5cc

Update README.md

Files changed (1): README.md (+2 −2)
README.md CHANGED

```diff
@@ -13,11 +13,11 @@ Moreover, we provide a [detailed recipe](https://github.com/RLHFlow/Online-DPO-R
 
 ## Model Releases
 - [PPO model] (https://huggingface.co/RLHFlow/Qwen2.5-7B-PPO-Zero)
-- [Iterative DPO] (https://huggingface.co/RLHFlow/Qwen2.5-7B-DPO-Zero)
+- [Iterative DPO from SFT model] (https://huggingface.co/RLHFlow/Qwen2.5-7B-DPO)
+- [Iterative DPO from base model] (https://huggingface.co/RLHFlow/Qwen2.5-7B-DPO-Zero)
 - [Iterative DPO with Negative Log-Likelihood (NLL)] (https://huggingface.co/RLHFlow/Qwen2.5-7B-DPO-NLL-Zero)
 - [Raft] (https://huggingface.co/RLHFlow/Qwen2.5-7B-RAFT-Zero)
 
-
 ## Dataset
 
 
```
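Each entry in the diff above points to a checkpoint on the Hugging Face Hub, so any of them can be loaded with the standard `transformers` API. A minimal sketch, assuming a recent `transformers` (plus `accelerate` for `device_map="auto"`); the repo ID is taken from the list above, and the prompt and generation settings are purely illustrative:

```python
# Minimal sketch: load one of the released checkpoints from the Hugging Face Hub.
# The repo ID comes from the model list above; dtype, device placement, and the
# prompt are assumptions for illustration, not part of the release notes.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RLHFlow/Qwen2.5-7B-DPO-Zero"  # iterative DPO trained from the base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # requires `accelerate`; places layers automatically
)

prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```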