AdamLucek/Qwen2.5-3B-Instruct-GRPO-2K-GSM8K
This model is a GRPO (Group Relative Policy Optimization) fine-tuned version of unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit, trained with Unsloth on a 2,000-example subset of openai/gsm8k.
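Training Sketch
The exact training script is not included in this card. For reference, below is a minimal sketch of what a comparable GRPO run looks like with Unsloth and TRL's GRPOTrainer; the system prompt, the correctness_reward function, and all hyperparameters here are illustrative assumptions, not the actual training configuration.
from unsloth import FastLanguageModel   # import unsloth before trl
import re
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Load the 4-bit base model that this checkpoint was fine-tuned from.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Assumed system prompt; the usage example below shows the full tagged format.
SYSTEM_PROMPT = "Respond with <reasoning>...</reasoning> then <answer>...</answer>."

# Map GSM8K rows into prompt/answer pairs (GSM8K answers end in "#### <number>").
def to_example(row):
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": row["question"]},
        ],
        "answer": row["answer"].split("####")[-1].strip(),
    }

dataset = load_dataset("openai/gsm8k", "main", split = "train[:2000]").map(to_example)

# Hypothetical reward: 1.0 when the text inside <answer> matches the reference answer.
def correctness_reward(completions, answer, **kwargs):
    rewards = []
    for completion, ref in zip(completions, answer):
        response = completion[0]["content"]
        match = re.search(r"<answer>\s*(.*?)\s*</answer>", response, re.DOTALL)
        rewards.append(1.0 if match and match.group(1).strip() == ref else 0.0)
    return rewards

trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer,
    reward_funcs = [correctness_reward],
    args = GRPOConfig(output_dir = "outputs", num_generations = 6),
    train_dataset = dataset,
)
trainer.train()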
Usage with vLLM & Unsloth
from unsloth import FastLanguageModel
from vllm import SamplingParams

# Load the model & tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "AdamLucek/Qwen2.5-3B-Instruct-GRPO-2K-GSM8K",
    max_seq_length = 2048,
    load_in_4bit = True,                # load the 4-bit quantized weights
    fast_inference = True,              # enable vLLM-backed generation
    gpu_memory_utilization = 0.7,       # cap the fraction of GPU memory vLLM may use
)
# Prepare the chat messages
PROMPT = "How many r's are in the word strawberry?"
SYSTEM_PROMPT = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""
text = tokenizer.apply_chat_template([
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": PROMPT},
], tokenize = False, add_generation_prompt = True)
# Generate a response
sampling_params = SamplingParams(
    temperature = 0.8,
    top_p = 0.95,
    max_tokens = 1024,
)

output = model.fast_generate(
    text,
    sampling_params = sampling_params,
)[0].outputs[0].text
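Since the system prompt constrains responses to <reasoning>/<answer> tags, the final answer can be pulled out with a regular expression. A minimal sketch (the extract_answer helper is illustrative, not part of the model's API):
import re

def extract_answer(response: str) -> str | None:
    """Return the text inside the <answer> tags, or None if the tags are missing."""
    match = re.search(r"<answer>\s*(.*?)\s*</answer>", response, re.DOTALL)
    return match.group(1).strip() if match else None

print(extract_answer(output))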