Update README.md
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
---

BitsAndBytes 4-bit quantization of DeepSeek-R1-Distill-Qwen-7B, commit 393119fcd6a873e5776c79b0db01c96911f5f0fc.

Tested successfully with vLLM 0.7.2 with the following parameters:

```python
import torch
from vllm import LLM

llm_model = LLM(
    "MPWARE/DeepSeek-R1-Distill-Qwen-7B-BnB-4bits",
    task="generate",
    dtype=torch.bfloat16,
    max_num_seqs=8192,
    max_model_len=8192,
    trust_remote_code=True,
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    enforce_eager=True,  # Required for vLLM architecture V1
    tensor_parallel_size=1,
    gpu_memory_utilization=0.95,
    seed=42
)
```
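Once loaded, the model can be queried through vLLM's standard offline `generate` API. A minimal sketch, assuming the `LLM` instance above has been created on a GPU host; the prompt and sampling values here are illustrative, not part of this model card:

```python
from vllm import SamplingParams

# Illustrative sampling settings; tune temperature/top_p for your use case
sampling = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=1024)

# llm_model is the LLM instance constructed above
outputs = llm_model.generate(
    ["Explain 4-bit quantization in one paragraph."],
    sampling,
)
print(outputs[0].outputs[0].text)
```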