---
license: mit
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
---
BitsAndBytes 4-bit quantization of DeepSeek-R1-Distill-Qwen-7B, commit 393119fcd6a873e5776c79b0db01c96911f5f0fc.

Tested successfully with vLLM 0.7.2 using the following parameters:

```python
import torch
from vllm import LLM

llm_model = LLM(
	"MPWARE/DeepSeek-R1-Distill-Qwen-7B-BnB-4bits",
	task="generate",
	dtype=torch.bfloat16,
	max_num_seqs=8192,
	max_model_len=8192,
	trust_remote_code=True,
	quantization="bitsandbytes",
	load_format="bitsandbytes",
	enforce_eager=True, # Required for vLLM architecture V1
	tensor_parallel_size=1, 
	gpu_memory_utilization=0.95,  
	seed=42
)
```
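Once loaded, the model can be queried through vLLM's standard generation API. A minimal sketch, reusing `llm_model` from the snippet above; the prompt and sampling values here are illustrative, not part of the tested configuration:

```python
from vllm import SamplingParams

# Illustrative sampling settings; tune for your workload. R1-style reasoning
# models typically benefit from a generous max_tokens budget for the
# chain-of-thought portion of the response.
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=2048)

outputs = llm_model.generate(
    ["Explain the difference between 4-bit and 8-bit quantization."],
    sampling_params,
)
print(outputs[0].outputs[0].text)
```

Note that `max_model_len=8192` above caps the combined prompt and output length, so `max_tokens` must leave room for the prompt within that budget.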