unsloth-QwQ-32B-gguf-japanese-imatrix

Qwen/QwQ-32B has received very mixed reviews. This appears to be because the model is sensitive to parameter settings.
This GGUF version aims to improve the quantization parameters and, with them, the model's Japanese language capability.
Details are still being verified.

Current sampling parameters:

temperature = 0.6
top-k = 40 (20 to 40 suggested)
min-p = 0.00 (optional, but 0.01 works well, llama.cpp default is 0.1)
top-p = 0.95
repetition-penalty = 1.0
dry-multiplier = 0.5
Chat template: <|im_start|>user\nCreate a Flappy Bird game in Python.<|im_end|>\n<|im_start|>assistant\n<think>\n
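
As a rough illustration of how the settings above fit together, the sketch below wraps a user message in the chat template and assembles an argument list for llama.cpp's llama-cli. It assumes the `\n` sequences in the template stand for real newlines, and the model filename is a hypothetical placeholder, not a file confirmed to exist in this repository. The command is only built, not executed.

```python
# Sampling values copied from the list above. The flag names are llama.cpp
# llama-cli options; the model filename used below is a hypothetical placeholder.
SAMPLING = {
    "--temp": 0.6,
    "--top-k": 40,
    "--min-p": 0.0,
    "--top-p": 0.95,
    "--repeat-penalty": 1.0,
    "--dry-multiplier": 0.5,
}


def build_prompt(user_message: str) -> str:
    """Wrap one user turn in the chat template shown above."""
    return (
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n<think>\n"
    )


def build_command(model_path: str, user_message: str) -> list[str]:
    """Return an argv list suitable for subprocess.run (not executed here)."""
    argv = ["llama-cli", "-m", model_path, "-p", build_prompt(user_message)]
    for flag, value in SAMPLING.items():
        argv += [flag, str(value)]
    return argv


if __name__ == "__main__":
    # Hypothetical filename; substitute the quantization you downloaded.
    cmd = build_command("QwQ-32B-Q4_K_M.gguf", "Create a Flappy Bird game in Python.")
    print(cmd)
```

Depending on the llama.cpp version, llama-cli may apply the model's built-in chat template on its own in conversation mode, in which case the hand-built prompt is unnecessary and only the sampling flags are needed.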

Reference information
Tutorial: How to Run QwQ-32B effectively

Format: GGUF
Model size: 32.8B params
Architecture: qwen2
Quantizations: 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit