Update README.md
README.md CHANGED
@@ -13,11 +13,11 @@ Moreover, we provide a [detailed recipe](https://github.com/RLHFlow/Online-DPO-R
 
 ## Model Releases
 - [PPO model](https://huggingface.co/RLHFlow/Qwen2.5-7B-PPO-Zero)
-- [Iterative DPO](https://huggingface.co/RLHFlow/Qwen2.5-7B-DPO)
+- [Iterative DPO from SFT model](https://huggingface.co/RLHFlow/Qwen2.5-7B-DPO)
+- [Iterative DPO from base model](https://huggingface.co/RLHFlow/Qwen2.5-7B-DPO-Zero)
 - [Iterative DPO with Negative Log-Likelihood (NLL)](https://huggingface.co/RLHFlow/Qwen2.5-7B-DPO-NLL-Zero)
 - [Raft](https://huggingface.co/RLHFlow/Qwen2.5-7B-RAFT-Zero)
 
-
 ## Dataset
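As a usage note for the checkpoints listed in the diff above: a minimal sketch of loading one of the released models, assuming they follow the standard Hugging Face `transformers` causal-LM API (the model ID comes from the list; the prompt and generation settings are illustrative assumptions, not part of the original commit):

```python
# Hypothetical usage sketch: assumes the released checkpoints load with the
# standard Hugging Face transformers causal-LM API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RLHFlow/Qwen2.5-7B-PPO-Zero"  # or any other checkpoint listed above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # requires `accelerate`; places weights on available devices
)

# Illustrative prompt only; generation settings are assumptions.
prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```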