xwm/SciWorld-MPO

Tags: Reinforcement Learning · text-generation · text-generation-inference · Inference Endpoints


xwm committed 6 days ago

Commit cffcfa3 · verified · 1 parent: 8fffc3c

Update README.md

Files changed (1)

README.md +13 -3

README.md CHANGED

@@ -1,6 +1,16 @@
-# sciworld_workflow_compact_preference
-This model is a fine-tuned version of [LLaMA-Factory/saves/llama3.1-8b/sciworld_workflow_compact](https://huggingface.co/LLaMA-Factory/saves/llama3.1-8b/sciworld_workflow_compact) on the sciworld_workflow_compact_preference dataset.
+---
+license: apache-2.0
+tags:
+- nlp
+- agent
+language:
+- en
+pipeline_tag: text-generation
+---
+# SciWorld-MPO
+This model is a fine-tuned version of Llama-3.1-8B-Instruct on the [sciworld-metaplan-preference-pairs](https://huggingface.co/datasets/xwm/Meta_Plan_Optimization/blob/main/sciworld_metaplan_preference_pairs.json) dataset.
 It achieves the following results on the evaluation set:
 - Loss: 1.5017
 - Rewards/chosen: -3.8774
@@ -52,4 +62,4 @@ The following hyperparameters were used during training:
 - Transformers 4.46.1
 - Pytorch 2.5.1+cu124
 - Datasets 3.1.0
-- Tokenizers 0.20.3
+- Tokenizers 0.20.3
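The main change in this commit is the YAML metadata header added between the two `---` lines, which carries the card's license, tags, language and `pipeline_tag`. The following is a minimal sketch of pulling that header out of README text with only the standard library. `parse_front_matter` is a hypothetical helper written for illustration, and it understands only the `key: value` and `key:` plus `- item` shapes that appear in this commit; for general YAML, a real parser such as PyYAML's `yaml.safe_load` is the safer choice.

```python
def parse_front_matter(readme: str) -> dict:
    """Parse the leading '---'-delimited metadata block of a model card.

    Hypothetical helper: handles only 'key: value' scalars and 'key:'
    lines followed by '- item' list entries. Not a general YAML parser.
    """
    lines = readme.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no metadata header at the top of the file
    meta: dict = {}
    current_key = None  # key whose list items we are currently collecting
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            return meta  # closing marker ends the metadata block
        if not stripped:
            continue
        if stripped.startswith("- ") and current_key is not None:
            # List item belonging to the most recent bare 'key:' line
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                meta[key] = value  # scalar such as 'license: apache-2.0'
                current_key = None
            else:
                meta[key] = []  # bare 'key:' starts a list
                current_key = key
    raise ValueError("metadata block is not closed by '---'")


README = """---
license: apache-2.0
tags:
- nlp
- agent
language:
- en
pipeline_tag: text-generation
---
# SciWorld-MPO
"""

meta = parse_front_matter(README)
print(meta["pipeline_tag"])  # text-generation
print(meta["tags"])          # ['nlp', 'agent']
```

A README without a leading `---` line, like the version before this commit, simply yields an empty dict.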