Model Card for Qwen2.5-1.5B-Thinking

An improved version of this model is available at Qwen2.5-1.5B-Thinking-v1.1. This model was trained using TRL.

Evals

Model                 | GSM8k 0-Shot (%) | GSM8k Few-Shot (%)
----------------------|------------------|-------------------
Mistral-7B-v0.1       | 10               | 41
Qwen2.5-1.5B-Thinking | 14.4             | 63.31
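
For reference, scores like these could be reproduced with EleutherAI's lm-evaluation-harness; the snippet below is a sketch only, since the exact harness, prompts, and few-shot count behind this table are not documented on this card.

```python
# Hypothetical reproduction sketch using lm-evaluation-harness's Python API.
# The evaluation setup used for the table above is an assumption here.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=justinj92/Qwen2.5-1.5B-Thinking,dtype=bfloat16",
    tasks=["gsm8k"],
    num_fewshot=5,  # assumption; the card only says "Few-Shot"
)
print(results["results"]["gsm8k"])
```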

Training procedure

Training runs were logged to Weights & Biases.

Trained on a single H100 (96 GB) via Azure Cloud (East US 2).

This model was trained with GRPO, a method introduced in DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.
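
For illustration, a minimal sketch of GRPO training with TRL's GRPOTrainer is shown below; the reward function, dataset preparation, and hyperparameters are assumptions for illustration, not this model's actual training recipe.

```python
# Minimal GRPO sketch with TRL's GRPOTrainer. The reward function, dataset
# choice, and hyperparameters are illustrative assumptions, not the exact
# recipe used to train this model.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GRPOTrainer expects a "prompt" column; GSM8K ships "question"/"answer".
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})

def reward_boxed(completions, **kwargs):
    """Toy reward: 1.0 when the completion contains a \\boxed{} answer."""
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

training_args = GRPOConfig(
    output_dir="Qwen2.5-1.5B-Thinking",
    num_generations=8,          # completions sampled per prompt (group size)
    max_completion_length=512,
)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B",  # base model this card is finetuned from
    reward_funcs=reward_boxed,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```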

Usage Recommendations

We recommend the following configurations when using the model, including for benchmarking, to achieve the expected performance (a minimal usage sketch follows the list):

  1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.
  2. For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."
  3. When evaluating model performance, it is recommended to conduct multiple tests and average the results.
  4. This model is tuned for mathematics only; it has not been enhanced for other domains.
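
A minimal inference sketch applying these recommendations (sampling at temperature 0.6 with the boxed-answer directive); the example question is illustrative, and per recommendation 3 you would run several sampled generations and average the results when evaluating.

```python
# Minimal inference sketch applying the recommended settings above.
# The example question is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "justinj92/Qwen2.5-1.5B-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{
    "role": "user",
    "content": "Natalia sold clips to 48 of her friends in April, and then she "
               "sold half as many clips in May. How many clips did Natalia sell "
               "altogether in April and May? Please reason step by step, and put "
               "your final answer within \\boxed{}.",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids, max_new_tokens=1024, do_sample=True, temperature=0.6
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```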

Framework versions

  • TRL: 0.15.0.dev0
  • Transformers: 4.49.0.dev0
  • PyTorch: 2.5.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citations

Cite GRPO as:

@article{zhihong2024deepseekmath,
    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year         = 2024,
    eprint       = {arXiv:2402.03300},
}

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}

Model details

  • Base model: Qwen/Qwen2.5-1.5B
  • Model size: 1.54B parameters
  • Tensor type: BF16 (Safetensors)