---
base_model: simplescaling/s1-32B
pipeline_tag: text-generation
inference: true
language:
  - en
license: apache-2.0
model_creator: simplescaling
model_name: s1-32B
model_type: qwen2
datasets:
  - simplescaling/s1K
quantized_by: brittlewis12
---

# s1 32B GGUF

> **Update (2025-02-11):** see the revised s1.1 for an improved model, based on DeepSeek R1 reasoning traces rather than Gemini traces.


**Original model:** s1 32B

**Model creator:** simplescaling

s1 is a reasoning model finetuned from Qwen2.5-32B-Instruct on just 1,000 examples. It matches o1-preview & exhibits test-time scaling via budget forcing.

This repo contains GGUF format model files for simplescaling’s s1 32B, an open reproduction of OpenAI’s o1-preview on 1,000 reasoning traces, including model, source code, and data (see s1K).

Learn more on simplescaling’s s1 GitHub repo & arXiv preprint.
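Budget forcing, in spirit: when the model tries to emit its end-of-thinking marker, suppress it and append a cue like "Wait" so it keeps reasoning, repeating up to some budget. A minimal runnable sketch of that control flow, where `END_THINK`, `generate`, and `toy_generate` are hypothetical stand-ins for illustration, not s1's actual tokens or code:

```python
END_THINK = "<|im_start|>answer"  # hypothetical end-of-thinking marker


def budget_force(generate, prompt, extensions=2):
    """Sketch of budget forcing: each time the model tries to leave
    thinking mode, strip the marker and append "Wait," to coax it into
    further reasoning; after `extensions` rounds, let it answer."""
    text = prompt
    for _ in range(extensions):
        chunk = generate(text)
        if END_THINK in chunk:
            chunk = chunk.split(END_THINK)[0] + " Wait,"
        text += chunk
    return text + generate(text)


# Toy stand-in model that always tries to stop thinking immediately.
def toy_generate(context):
    return " The answer is 2." + END_THINK


forced = budget_force(toy_generate, "What is 1 + 1?", extensions=2)
print(forced)
```

With the toy model above, the transcript contains two forced "Wait," continuations before the final answer marker is allowed through. See §3 of the s1 paper for the real method.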

## What is GGUF?

GGUF is a file format for representing AI models. It is the third version of the format, introduced by the llama.cpp team on August 21st, 2023, and is a replacement for GGML, which is no longer supported by llama.cpp.

These files were converted with llama.cpp build 4628 (revision cde3833), using autogguf-rs.

## Prompt template: ChatML

```
<|im_start|>system
{{system_message}}<|im_end|>
<|im_start|>user
{{prompt}}<|im_end|>
<|im_start|>assistant
```
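If you call a raw completion endpoint rather than a chat endpoint (which applies the template for you), the template above can be assembled by hand. A minimal sketch; the `chatml_prompt` helper is hypothetical, not part of any library:

```python
def chatml_prompt(system_message: str, prompt: str) -> str:
    """Assemble a ChatML-formatted prompt string, mirroring the
    template above, ready for a raw completion API."""
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


print(chatml_prompt("You are a helpful assistant.", "What is 1 + 1?"))
```

The trailing `<|im_start|>assistant` turn is left open so the model's completion becomes the assistant's reply.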

## Download & run with cnvrs on iPhone, iPad, and Mac!

cnvrs.ai

cnvrs is the best app for private, local AI on your device:

- create & save Characters with custom system prompts & temperature settings
- download and experiment with any GGUF model you can find on HuggingFace!
  - or, use an API key with the chat completions-compatible model provider of your choice: ChatGPT, Claude, Gemini, DeepSeek, & more!
- make it your own with custom Theme colors
- powered by Metal ⚡️ & Llama.cpp, with haptics during response streaming!
- try it out yourself today, on TestFlight!
- follow cnvrs on Twitter to stay up to date

## Original Model Evaluation

Table 1: s1-32B is an open and sample-efficient reasoning model. We evaluate s1-32B, Qwen, and Gemini (some entries are unknown (N.A.), see §4). Other results are from the respective reports (Qwen et al., 2024; Team, 2024b; OpenAI, 2024; DeepSeek-AI et al., 2025; Labs, 2025; Team, 2025).

\# ex. = number of examples used for reasoning finetuning; BF = budget forcing.

via s1: Simple test-time scaling (4.1 Results)

| Model | # ex. | AIME 2024 | MATH 500 | GPQA Diamond |
|---|---|---|---|---|
| **API only** | | | | |
| o1-preview | N.A. | 44.6 | 85.5 | 73.3 |
| o1-mini | N.A. | 70.0 | 90.0 | 60.0 |
| o1 | N.A. | 74.4 | 94.8 | 77.3 |
| Gemini 2.0 Flash Think. | N.A. | 60.0 | N.A. | N.A. |
| **Open Weights** | | | | |
| Qwen2.5-32B-Instruct | N.A. | 26.7 | 84.0 | 49.0 |
| QwQ-32B | N.A. | 50.0 | 90.6 | 65.2 |
| r1 | >>800K | 79.8 | 97.3 | 71.5 |
| r1-distill | 800K | 72.6 | 94.3 | 62.1 |
| **Open Weights and Open Data** | | | | |
| Sky-T1 | 17K | 43.3 | 82.4 | 56.8 |
| Bespoke-32B | 17K | 63.3 | 93.0 | 58.1 |
| s1 w/o BF | 1K | 50.0 | 92.6 | 56.6 |
| **s1-32B** | 1K | 56.7 | 93.0 | 59.6 |