nielsr (HF Staff) committed
Commit 09cb62f · verified · 1 Parent(s): cffcfa3

Fix pipeline tag, add links, improve model card (#1)


- Fix pipeline tag, add links, improve model card (10d841b55778ef2b7eecc9016fba61f2d0977508)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +16 -5
README.md CHANGED
@@ -1,11 +1,18 @@
 ---
+language:
+- en
 license: apache-2.0
+library_name: transformers
+pipeline_tag: reinforcement-learning
+datasets:
+- xwm/Meta_Plan_Optimization
+base_model:
+- meta-llama/Llama-3.1-8B-Instruct
+metrics:
+- accuracy
 tags:
 - nlp
 - agent
-language:
-- en
-pipeline_tag: text-generation
 ---
 
 # SciWorld-MPO
@@ -22,9 +29,13 @@ It achieves the following results on the evaluation set:
 - Logits/chosen: 0.5212
 - Logits/rejected: 0.5151
 
+See the original paper for more details: [MPO: Boosting LLM Agents with Meta Plan Optimization](https://hf.co/papers/2503.02682).
+
+Code: https://github.com/WeiminXiong/MPO
+
 ## Model description
 
-More information needed
+This model uses Meta Plan Optimization (MPO) to improve the planning capabilities of LLM agents. It leverages high-level general guidance through meta plans and enables continuous optimization based on feedback from the agent's task execution. It achieves state-of-the-art performance on ALFWorld and SciWorld, with an average accuracy of 83.1.
 
 ## Intended uses & limitations
 
@@ -32,7 +43,7 @@ More information needed
 
 ## Training and evaluation data
 
-More information needed
+The model was trained on the `sciworld-metaplan-preference-pairs` dataset, part of the [Meta_Plan_Optimization](https://huggingface.co/datasets/xwm/Meta_Plan_Optimization) dataset.
 
 ## Training procedure
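
Since the updated front matter declares `library_name: transformers` with `meta-llama/Llama-3.1-8B-Instruct` as the base model, the checkpoint should load as an ordinary causal LM. Below is a minimal usage sketch, assuming the repo id is `xwm/SciWorld-MPO` (inferred from the card title and commit author, not stated in the diff):

```python
# Minimal usage sketch -- not part of the model card itself.
# Assumptions: the repo id "xwm/SciWorld-MPO" is inferred from the card
# title; the checkpoint is a standard Llama-3.1-8B-Instruct fine-tune
# loadable with transformers (per the new `library_name` field).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xwm/SciWorld-MPO"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# MPO trains the model to produce high-level meta plans for agent tasks,
# so prompt it for a plan rather than a final answer.
messages = [
    {
        "role": "user",
        "content": "Task: measure the boiling point of water in the kitchen. "
        "Give a high-level plan.",
    }
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```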
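
The training-data note names a `sciworld-metaplan-preference-pairs` subset of the linked dataset. A sketch of loading it with `datasets`, assuming that name is exposed as a config (the exact repo layout is not shown in this diff):

```python
# Sketch only: the subset name below is taken from the card text; whether
# it is a config, a split, or a data directory is an assumption.
from datasets import load_dataset

pairs = load_dataset("xwm/Meta_Plan_Optimization", "sciworld-metaplan-preference-pairs")
print(pairs)  # preference pairs (chosen vs. rejected meta plans) used for training
```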