---
base_model: simplescaling/s1-32B
pipeline_tag: text-generation
inference: true
language:
  - en
license: apache-2.0
model_creator: simplescaling
model_name: s1-32B
model_type: qwen2
datasets:
  - simplescaling/s1K
quantized_by: brittlewis12
---

# s1 32B GGUF

> **Update (2025-02-11):** see the revised s1.1 for an improved model, based on DeepSeek R1 reasoning traces rather than Gemini traces.


**Original model:** s1 32B

**Model creator:** simplescaling

s1 is a reasoning model finetuned from Qwen2.5-32B-Instruct on just 1,000 examples. It matches o1-preview & exhibits test-time scaling via budget forcing.

This repo contains GGUF format model files for simplescaling’s s1 32B, an open reproduction of OpenAI’s o1-preview on 1,000 reasoning traces, including model, source code, and data (see s1K).

Learn more on simplescaling’s s1 GitHub repo & arXiv preprint.
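Budget forcing, in spirit: when the model tries to emit its end-of-thinking marker, suppress it and append a cue like "Wait" so it keeps reasoning, repeating up to some budget. A minimal runnable sketch of that control flow, where `END_THINK`, `generate`, and `toy_generate` are hypothetical stand-ins for illustration, not s1's actual tokens or code:

```python
END_THINK = "<|im_start|>answer"  # hypothetical end-of-thinking marker


def budget_force(generate, prompt, extensions=2):
    """Sketch of budget forcing: each time the model tries to leave
    thinking mode, strip the marker and append "Wait," to coax it into
    further reasoning; after `extensions` rounds, let it answer."""
    text = prompt
    for _ in range(extensions):
        chunk = generate(text)
        if END_THINK in chunk:
            chunk = chunk.split(END_THINK)[0] + " Wait,"
        text += chunk
    return text + generate(text)


# Toy stand-in model that always tries to stop thinking immediately.
def toy_generate(context):
    return " The answer is 2." + END_THINK


forced = budget_force(toy_generate, "What is 1 + 1?", extensions=2)
print(forced)
```

With the toy model above, the transcript contains two forced "Wait," continuations before the final answer marker is allowed through. See §3 of the s1 paper for the real method.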

## What is GGUF?

GGUF is a file format for representing AI models. It is the third version of the format, introduced by the llama.cpp team on August 21st, 2023, and is a replacement for GGML, which is no longer supported by llama.cpp.

These files were converted with llama.cpp build 4628 (revision cde3833), using autogguf-rs.

## Prompt template: ChatML

```
<|im_start|>system
{{system_message}}<|im_end|>
<|im_start|>user
{{prompt}}<|im_end|>
<|im_start|>assistant
```
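If you call a raw completion endpoint rather than a chat endpoint (which applies the template for you), the template above can be assembled by hand. A minimal sketch; the `chatml_prompt` helper is hypothetical, not part of any library:

```python
def chatml_prompt(system_message: str, prompt: str) -> str:
    """Assemble a ChatML-formatted prompt string, mirroring the
    template above, ready for a raw completion API."""
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


print(chatml_prompt("You are a helpful assistant.", "What is 1 + 1?"))
```

The trailing `<|im_start|>assistant` turn is left open so the model's completion becomes the assistant's reply.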

## Download & run with cnvrs on iPhone, iPad, and Mac!

cnvrs.ai

cnvrs is the best app for private, local AI on your device:

- create & save Characters with custom system prompts & temperature settings
- download and experiment with any GGUF model you can find on HuggingFace!
  - or, use an API key with the chat completions-compatible model provider of your choice: ChatGPT, Claude, Gemini, DeepSeek, & more!
- make it your own with custom Theme colors
- powered by Metal ⚡️ & Llama.cpp, with haptics during response streaming!
- try it out yourself today, on TestFlight!
- follow cnvrs on Twitter to stay up to date

## Original Model Evaluation

Table 1: s1-32B is an open and sample-efficient reasoning model. We evaluate s1-32B, Qwen, and Gemini (some entries are unknown (N.A.), see §4). Other results are from the respective reports (Qwen et al., 2024; Team, 2024b; OpenAI, 2024; DeepSeek-AI et al., 2025; Labs, 2025; Team, 2025).

\# ex. = number of examples used for reasoning finetuning; BF = budget forcing.

via s1: Simple test-time scaling (4.1 Results)

| Model | # ex. | AIME 2024 | MATH 500 | GPQA Diamond |
|---|---|---|---|---|
| **API only** | | | | |
| o1-preview | N.A. | 44.6 | 85.5 | 73.3 |
| o1-mini | N.A. | 70.0 | 90.0 | 60.0 |
| o1 | N.A. | 74.4 | 94.8 | 77.3 |
| Gemini 2.0 Flash Think. | N.A. | 60.0 | N.A. | N.A. |
| **Open Weights** | | | | |
| Qwen2.5-32B-Instruct | N.A. | 26.7 | 84.0 | 49.0 |
| QwQ-32B | N.A. | 50.0 | 90.6 | 65.2 |
| r1 | >>800K | 79.8 | 97.3 | 71.5 |
| r1-distill | 800K | 72.6 | 94.3 | 62.1 |
| **Open Weights and Open Data** | | | | |
| Sky-T1 | 17K | 43.3 | 82.4 | 56.8 |
| Bespoke-32B | 17K | 63.3 | 93.0 | 58.1 |
| s1 w/o BF | 1K | 50.0 | 92.6 | 56.6 |
| **s1-32B** | 1K | 56.7 | 93.0 | 59.6 |