Qwen2.5-3B-Instruct-GRPO-basic-sampling_temp_05 / pytorch_model-00002-of-00002.bin

Commit History

Trained with Unsloth
1fc0051

kenhktsui committed on