---
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
- Qwen/QwQ-32B
library_name: transformers
tags:
- mergekit
- merge
---
# QwQ-R1-Distill-Merge-32B

In local testing it performed very well on math problems. With the ChatML template it usually starts a response without the opening `<think>` tag, but still closes the reasoning with `</think>` at the end.
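
For reference, a minimal sketch of prompting the merge through transformers and splitting out the reasoning. The repo id, the generation settings, and the `</think>`-splitting step are assumptions layered on top of the card, not part of it:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id for this merge; substitute a local path if needed.
model_id = "gbueno86/QwQ-R1-Distill-Merge-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "What is the integral of x^2 from 0 to 3?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=2048)
text = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# The model tends to omit the opening <think> tag but still emits </think>,
# so split on the closing tag to separate the reasoning from the final answer.
reasoning, _, answer = text.partition("</think>")
print(answer.strip() or reasoning.strip())
```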

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
### Merge Method

This model was merged using the SLERP (spherical linear interpolation) merge method.
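
For intuition: SLERP blends each pair of weight tensors along the great-circle arc between them rather than along a straight line, which keeps the merged weights at a comparable norm. A minimal PyTorch sketch of the idea (an illustration only, not mergekit's actual code):

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation: t=0 returns v0, t=1 returns v1."""
    a, b = v0.flatten().float(), v1.flatten().float()
    # Angle between the two weight vectors, via their normalized dot product.
    cos_omega = torch.clamp((a / (a.norm() + eps)) @ (b / (b.norm() + eps)), -1.0, 1.0)
    omega = torch.arccos(cos_omega)
    sin_omega = torch.sin(omega)
    if sin_omega.abs() < eps:
        # Nearly collinear vectors: fall back to plain linear interpolation.
        out = (1.0 - t) * a + t * b
    else:
        out = (torch.sin((1.0 - t) * omega) / sin_omega) * a \
            + (torch.sin(t * omega) / sin_omega) * b
    return out.reshape(v0.shape).to(v0.dtype)

# E.g. blend a pair of same-shaped weight matrices halfway:
merged = slerp(0.5, torch.randn(64, 64), torch.randn(64, 64))
```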

### Models Merged

The following models were included in the merge:
* /models/Qwen/QwQ-32B
* /models/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

### Configuration

The following YAML configuration was used to produce this model:

```yaml
base_model: /models/Qwen/QwQ-32B
dtype: bfloat16
merge_method: slerp
parameters:
  t:
    - filter: self_attn
      value: [0.0, 0.5, 0.3, 0.7, 1.0]
    - filter: mlp
      value: [1.0, 0.5, 0.7, 0.3, 0.0]
    - value: 0.5
slices:
  - sources:
      - layer_range: [0, 64]
        model: /models/Qwen/QwQ-32B
      - layer_range: [0, 64]
        model: /models/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
```
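
Reading the `t` schedule: each `value` list defines an interpolation gradient across layer depth. Assuming mergekit's usual convention that `t = 0` keeps the base model's weights, the self-attention tensors shift from QwQ-32B toward DeepSeek-R1-Distill-Qwen-32B in deeper layers, the MLP tensors run the opposite direction, and all remaining tensors use an even 0.5 blend. With mergekit installed (`pip install mergekit`), a command along the lines of `mergekit-yaml config.yaml ./QwQ-R1-Distill-Merge-32B` should reproduce the merge; the config filename and output path here are illustrative.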