deepseek-r1 / README.md

Update README.md

f19516c verified 11 days ago

5.01 kB

	---
	license: mit
	language:
	- en
	base_model:
	- deepseek-ai/DeepSeek-R1
	pipeline_tag: text-generation
	tags:
	- deepseek-r1
	- gguf-connector
	---

	# GGUF quantized version of deepseek-r1

	### review
	- no more error loading message: "unknown pre-tokenizer type: deepseek-r1-qwen"
	- works fine for llama architecture

	### run the model
	use any gguf connector to interact with gguf file(s), i.e., [connector](https://pypi.org/project/gguf-connector/)

	### reference
	- base model: deepseek-ai/[DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1)
	- tool used for quantization: [cutter](https://pypi.org/project/gguf-cutter)

	### citation
	[DeepSeek-R1](https://arxiv.org/pdf/2501.12948)

	### appendices: model evaluation (written by deekseek-ai)

	#### deepseek-r1-evaluation
	for all our (here refer to deekseek-ai) models, the maximum generation length is set to 32,768 tokens; for benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1.

	\| Category \| Benchmark (Metric) \| Claude-3.5-Sonnet-1022 \| GPT-4o 0513 \| DeepSeek V3 \| OpenAI o1-mini \| OpenAI o1-1217 \| DeepSeek R1 \|
	\|----------\|-------------------\|----------------------\|------------\|--------------\|----------------\|------------\|--------------\|
	\| \| Architecture \| - \| - \| MoE \| - \| - \| MoE \|
	\| \| # Activated Params \| - \| - \| 37B \| - \| - \| 37B \|
	\| \| # Total Params \| - \| - \| 671B \| - \| - \| 671B \|
	\| English \| MMLU (Pass@1) \| 88.3 \| 87.2 \| 88.5 \| 85.2 \| 91.8 \| 90.8 \|
	\| \| MMLU-Redux (EM) \| 88.9 \| 88.0 \| 89.1 \| 86.7 \| - \| 92.9 \|
	\| \| MMLU-Pro (EM) \| 78.0 \| 72.6 \| 75.9 \| 80.3 \| - \| 84.0 \|
	\| \| DROP (3-shot F1) \| 88.3 \| 83.7 \| 91.6 \| 83.9 \| 90.2 \| 92.2 \|
	\| \| IF-Eval (Prompt Strict) \| 86.5 \| 84.3 \| 86.1 \| 84.8 \| - \| 83.3 \|
	\| \| GPQA-Diamond (Pass@1) \| 65.0 \| 49.9 \| 59.1 \| 60.0 \| 75.7 \| 71.5 \|
	\| \| SimpleQA (Correct) \| 28.4 \| 38.2 \| 24.9 \| 7.0 \| 47.0 \| 30.1 \|
	\| \| FRAMES (Acc.) \| 72.5 \| 80.5 \| 73.3 \| 76.9 \| - \| 82.5 \|
	\| \| AlpacaEval2.0 (LC-winrate) \| 52.0 \| 51.1 \| 70.0 \| 57.8 \| - \| 87.6 \|
	\| \| ArenaHard (GPT-4-1106) \| 85.2 \| 80.4 \| 85.5 \| 92.0 \| - \| 92.3 \|
	\| Code \| LiveCodeBench (Pass@1-COT) \| 33.8 \| 34.2 \| - \| 53.8 \| 63.4 \| 65.9 \|
	\| \| Codeforces (Percentile) \| 20.3 \| 23.6 \| 58.7 \| 93.4 \| 96.6 \| 96.3 \|
	\| \| Codeforces (Rating) \| 717 \| 759 \| 1134 \| 1820 \| 2061 \| 2029 \|
	\| \| SWE Verified (Resolved) \| 50.8 \| 38.8 \| 42.0 \| 41.6 \| 48.9 \| 49.2 \|
	\| \| Aider-Polyglot (Acc.) \| 45.3 \| 16.0 \| 49.6 \| 32.9 \| 61.7 \| 53.3 \|
	\| Math \| AIME 2024 (Pass@1) \| 16.0 \| 9.3 \| 39.2 \| 63.6 \| 79.2 \| 79.8 \|
	\| \| MATH-500 (Pass@1) \| 78.3 \| 74.6 \| 90.2 \| 90.0 \| 96.4 \| 97.3 \|
	\| \| CNMO 2024 (Pass@1) \| 13.1 \| 10.8 \| 43.2 \| 67.6 \| - \| 78.8 \|
	\| Chinese \| CLUEWSC (EM) \| 85.4 \| 87.9 \| 90.9 \| 89.9 \| - \| 92.8 \|
	\| \| C-Eval (EM) \| 76.7 \| 76.0 \| 86.5 \| 68.9 \| - \| 91.8 \|
	\| \| C-SimpleQA (Correct) \| 55.4 \| 58.7 \| 68.0 \| 40.3 \| - \| 63.7 \|

	#### distilled model evaluation

	\| Model \| AIME 2024 pass@1 \| AIME 2024 cons@64 \| MATH-500 pass@1 \| GPQA Diamond pass@1 \| LiveCodeBench pass@1 \| CodeForces rating \|
	\|------------------------------------------\|------------------\|-------------------\|-----------------\|----------------------\|----------------------\|-------------------\|
	\| GPT-4o-0513 \| 9.3 \| 13.4 \| 74.6 \| 49.9 \| 32.9 \| 759 \|
	\| Claude-3.5-Sonnet-1022 \| 16.0 \| 26.7 \| 78.3 \| 65.0 \| 38.9 \| 717 \|
	\| o1-mini \| 63.6 \| 80.0 \| 90.0 \| 60.0 \| 53.8 \| 1820 \|
	\| QwQ-32B-Preview \| 44.0 \| 60.0 \| 90.6 \| 54.5 \| 41.9 \| 1316 \|
	\| DeepSeek-R1-Distill-Qwen-1.5B \| 28.9 \| 52.7 \| 83.9 \| 33.8 \| 16.9 \| 954 \|
	\| DeepSeek-R1-Distill-Qwen-7B \| 55.5 \| 83.3 \| 92.8 \| 49.1 \| 37.6 \| 1189 \|
	\| DeepSeek-R1-Distill-Qwen-14B \| 69.7 \| 80.0 \| 93.9 \| 59.1 \| 53.1 \| 1481 \|
	\| DeepSeek-R1-Distill-Qwen-32B \| 72.6 \| 83.3 \| 94.3 \| 62.1 \| 57.2 \| 1691 \|
	\| DeepSeek-R1-Distill-Llama-8B \| 50.4 \| 80.0 \| 89.1 \| 49.0 \| 39.6 \| 1205 \|
	\| DeepSeek-R1-Distill-Llama-70B \| 70.0 \| 86.7 \| 94.5 \| 65.2 \| 57.5 \| 1633 \|

	\* these two tables are directly quoted from deepseek-ai

	---
	license: mit
	language:
	- en
	base_model:
	- deepseek-ai/DeepSeek-R1
	pipeline_tag: text-generation
	tags:
	- deepseek-r1
	- gguf-connector
	---

	# GGUF quantized version of deepseek-r1

	### review
	- no more error loading message: "unknown pre-tokenizer type: deepseek-r1-qwen"
	- works fine for llama architecture

	### run the model
	use any gguf connector to interact with gguf file(s), i.e., [connector](https://pypi.org/project/gguf-connector/)

	### reference
	- base model: deepseek-ai/[DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1)
	- tool used for quantization: [cutter](https://pypi.org/project/gguf-cutter)

	### citation
	[DeepSeek-R1](https://arxiv.org/pdf/2501.12948)

	### appendices: model evaluation (written by deekseek-ai)

	#### deepseek-r1-evaluation
	for all our (here refer to deekseek-ai) models, the maximum generation length is set to 32,768 tokens; for benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1.

	\| Category \| Benchmark (Metric) \| Claude-3.5-Sonnet-1022 \| GPT-4o 0513 \| DeepSeek V3 \| OpenAI o1-mini \| OpenAI o1-1217 \| DeepSeek R1 \|
	\|----------\|-------------------\|----------------------\|------------\|--------------\|----------------\|------------\|--------------\|
	\| \| Architecture \| - \| - \| MoE \| - \| - \| MoE \|
	\| \| # Activated Params \| - \| - \| 37B \| - \| - \| 37B \|
	\| \| # Total Params \| - \| - \| 671B \| - \| - \| 671B \|
	\| English \| MMLU (Pass@1) \| 88.3 \| 87.2 \| 88.5 \| 85.2 \| 91.8 \| 90.8 \|
	\| \| MMLU-Redux (EM) \| 88.9 \| 88.0 \| 89.1 \| 86.7 \| - \| 92.9 \|
	\| \| MMLU-Pro (EM) \| 78.0 \| 72.6 \| 75.9 \| 80.3 \| - \| 84.0 \|
	\| \| DROP (3-shot F1) \| 88.3 \| 83.7 \| 91.6 \| 83.9 \| 90.2 \| 92.2 \|
	\| \| IF-Eval (Prompt Strict) \| 86.5 \| 84.3 \| 86.1 \| 84.8 \| - \| 83.3 \|
	\| \| GPQA-Diamond (Pass@1) \| 65.0 \| 49.9 \| 59.1 \| 60.0 \| 75.7 \| 71.5 \|
	\| \| SimpleQA (Correct) \| 28.4 \| 38.2 \| 24.9 \| 7.0 \| 47.0 \| 30.1 \|
	\| \| FRAMES (Acc.) \| 72.5 \| 80.5 \| 73.3 \| 76.9 \| - \| 82.5 \|
	\| \| AlpacaEval2.0 (LC-winrate) \| 52.0 \| 51.1 \| 70.0 \| 57.8 \| - \| 87.6 \|
	\| \| ArenaHard (GPT-4-1106) \| 85.2 \| 80.4 \| 85.5 \| 92.0 \| - \| 92.3 \|
	\| Code \| LiveCodeBench (Pass@1-COT) \| 33.8 \| 34.2 \| - \| 53.8 \| 63.4 \| 65.9 \|
	\| \| Codeforces (Percentile) \| 20.3 \| 23.6 \| 58.7 \| 93.4 \| 96.6 \| 96.3 \|
	\| \| Codeforces (Rating) \| 717 \| 759 \| 1134 \| 1820 \| 2061 \| 2029 \|
	\| \| SWE Verified (Resolved) \| 50.8 \| 38.8 \| 42.0 \| 41.6 \| 48.9 \| 49.2 \|
	\| \| Aider-Polyglot (Acc.) \| 45.3 \| 16.0 \| 49.6 \| 32.9 \| 61.7 \| 53.3 \|
	\| Math \| AIME 2024 (Pass@1) \| 16.0 \| 9.3 \| 39.2 \| 63.6 \| 79.2 \| 79.8 \|
	\| \| MATH-500 (Pass@1) \| 78.3 \| 74.6 \| 90.2 \| 90.0 \| 96.4 \| 97.3 \|
	\| \| CNMO 2024 (Pass@1) \| 13.1 \| 10.8 \| 43.2 \| 67.6 \| - \| 78.8 \|
	\| Chinese \| CLUEWSC (EM) \| 85.4 \| 87.9 \| 90.9 \| 89.9 \| - \| 92.8 \|
	\| \| C-Eval (EM) \| 76.7 \| 76.0 \| 86.5 \| 68.9 \| - \| 91.8 \|
	\| \| C-SimpleQA (Correct) \| 55.4 \| 58.7 \| 68.0 \| 40.3 \| - \| 63.7 \|

	#### distilled model evaluation

	\| Model \| AIME 2024 pass@1 \| AIME 2024 cons@64 \| MATH-500 pass@1 \| GPQA Diamond pass@1 \| LiveCodeBench pass@1 \| CodeForces rating \|
	\|------------------------------------------\|------------------\|-------------------\|-----------------\|----------------------\|----------------------\|-------------------\|
	\| GPT-4o-0513 \| 9.3 \| 13.4 \| 74.6 \| 49.9 \| 32.9 \| 759 \|
	\| Claude-3.5-Sonnet-1022 \| 16.0 \| 26.7 \| 78.3 \| 65.0 \| 38.9 \| 717 \|
	\| o1-mini \| 63.6 \| 80.0 \| 90.0 \| 60.0 \| 53.8 \| 1820 \|
	\| QwQ-32B-Preview \| 44.0 \| 60.0 \| 90.6 \| 54.5 \| 41.9 \| 1316 \|
	\| DeepSeek-R1-Distill-Qwen-1.5B \| 28.9 \| 52.7 \| 83.9 \| 33.8 \| 16.9 \| 954 \|
	\| DeepSeek-R1-Distill-Qwen-7B \| 55.5 \| 83.3 \| 92.8 \| 49.1 \| 37.6 \| 1189 \|
	\| DeepSeek-R1-Distill-Qwen-14B \| 69.7 \| 80.0 \| 93.9 \| 59.1 \| 53.1 \| 1481 \|
	\| DeepSeek-R1-Distill-Qwen-32B \| 72.6 \| 83.3 \| 94.3 \| 62.1 \| 57.2 \| 1691 \|
	\| DeepSeek-R1-Distill-Llama-8B \| 50.4 \| 80.0 \| 89.1 \| 49.0 \| 39.6 \| 1205 \|
	\| DeepSeek-R1-Distill-Llama-70B \| 70.0 \| 86.7 \| 94.5 \| 65.2 \| 57.5 \| 1633 \|

	\* these two tables are directly quoted from deepseek-ai