xwm committed on
Commit
50b9d90
·
verified ·
1 Parent(s): daaec58

Update README.md

  1. README.md +13 -6
README.md CHANGED
@@ -1,9 +1,16 @@
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # alfworld_workflow_preference_dpo_1.0
-
- This model is a fine-tuned version of [LLaMA-Factory/saves/llama3.1-8b/alfworld_workflow](https://huggingface.co/LLaMA-Factory/saves/llama3.1-8b/alfworld_workflow) on the alfworld_workflow_preference_1 dataset.
+ ---
+ license: apache-2.0
+ tags:
+ - nlp
+ - agent
+ language:
+ - en
+ pipeline_tag: text-generation
+ ---
+
+ # ALFWorld-MPO
+
+ This model is a fine-tuned version of Llama-3.1-8B-Instruct on the [alfworld-metaplan-preference-pairs](https://huggingface.co/datasets/xwm/Meta_Plan_Optimization/blob/main/alfworld_metaplan_preference_pairs.json) dataset.
  It achieves the following results on the evaluation set:
  - Loss: 0.8390
  - Rewards/chosen: -0.5836