xwm/ALFWorld-MPO

@@ -1,9 +1,16 @@
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# alfworld_workflow_preference_dpo_1.0
-This model is a fine-tuned version of [LLaMA-Factory/saves/llama3.1-8b/alfworld_workflow](https://huggingface.co/LLaMA-Factory/saves/llama3.1-8b/alfworld_workflow) on the alfworld_workflow_preference_1 dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.8390
 - Rewards/chosen: -0.5836

+---
+license: apache-2.0
+tags:
+- nlp
+- agent
+language:
+- en
+pipeline_tag: text-generation
+---
+# ALFWorld-MPO
+This model is a fine-tuned version of Llama-3.1-8B-Instruct on the [alfworld-metaplan-preference-pairs](https://huggingface.co/datasets/xwm/Meta_Plan_Optimization/blob/main/alfworld_metaplan_preference_pairs.json) dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.8390
 - Rewards/chosen: -0.5836