AdamLucek/Qwen2.5-3B-Instruct-GRPO-2K-GSM8K

This model is a GRPO (Group Relative Policy Optimization) fine-tune of unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit, trained with Unsloth on a 2,000-example subset of openai/gsm8k.

Usage with vLLM & Unsloth

from unsloth import FastLanguageModel
from vllm import SamplingParams
import torch

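# Load the Model & Tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "AdamLucek/Qwen2.5-3B-Instruct-GRPO-2K-GSM8K",
    max_seq_length = 2048,          # maximum context length in tokens
    load_in_4bit = True,            # load the weights in 4-bit precision
    fast_inference = True,          # use vLLM for generation
    gpu_memory_utilization = 0.7,   # fraction of GPU memory vLLM may reserve
)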

# Prep the Message
PROMPT = "How many r's are in the word strawberry?"

SYSTEM_PROMPT = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

text = tokenizer.apply_chat_template([
    {"role" : "system", "content" : SYSTEM_PROMPT},
    {"role" : "user", "content" : PROMPT},
], tokenize = False, add_generation_prompt = True)

# Generate a response
sampling_params = SamplingParams(
    temperature = 0.8,
    top_p = 0.95,
    max_tokens = 1024,
)
# fast_generate returns vLLM request outputs; take the text of the first completion
output = model.fast_generate(
    text,
    sampling_params = sampling_params,
)[0].outputs[0].text

print(output)
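
Parsing the Response

Because the system prompt asks for <reasoning> and <answer> tags, the final answer can be pulled out of the generated text with a small helper. The following is a minimal sketch, not part of the original model card: `parse_response` is a hypothetical helper name, and `sample` is a hand-written stand-in for the `output` string, not actual model output. It assumes the model follows the requested format and returns None for any tag that is missing.

```python
import re

def parse_response(text: str) -> dict:
    """Extract the <reasoning> and <answer> sections from a response string."""
    sections = {}
    for tag in ("reasoning", "answer"):
        # DOTALL lets "." span the newlines inside each tagged section
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        sections[tag] = match.group(1).strip() if match else None
    return sections

# Hand-written example in the requested format (illustrative, not model output)
sample = """<reasoning>
The word strawberry is spelled s-t-r-a-w-b-e-r-r-y, which contains three r's.
</reasoning>
<answer>
3
</answer>"""

parsed = parse_response(sample)
print(parsed["answer"])
```

In practice, pass the `output` string from the generation step instead of `sample`, and check for None before using the answer, since sampled responses are not guaranteed to follow the format.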