---
license: mit
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
---
BitsAndBytes 4-bit quantization of DeepSeek-R1-Distill-Qwen-7B (commit 393119fcd6a873e5776c79b0db01c96911f5f0fc).

Tested successfully with vLLM 0.7.2 using the following parameters:
```python
import torch
from vllm import LLM

llm_model = LLM(
    "MPWARE/DeepSeek-R1-Distill-Qwen-7B-BnB-4bits",
    task="generate",
    dtype=torch.bfloat16,
    max_num_seqs=8192,
    max_model_len=8192,
    trust_remote_code=True,
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    enforce_eager=True,  # Required for vLLM architecture V1
    tensor_parallel_size=1,
    gpu_memory_utilization=0.95,
    seed=42
)
```
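Once loaded, generation follows the standard vLLM API. A minimal sketch (the prompt and sampling values below are illustrative assumptions, not part of the tested setup):

```python
from vllm import SamplingParams

# Hypothetical prompt; R1-distilled models emit their reasoning inside
# <think>...</think> tags before the final answer, so budget tokens for it.
prompts = ["Solve step by step: what is 17 * 24?"]

sampling_params = SamplingParams(
    temperature=0.6,   # illustrative value for reasoning-style sampling
    top_p=0.95,
    max_tokens=4096,   # leave room for the <think> block
)

outputs = llm_model.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```

Note that `max_tokens` must stay within the `max_model_len=8192` configured above, including the prompt tokens.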