---
license: mit
language:
  - en
base_model:
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
---

BitsAndBytes 4-bit quantization of DeepSeek-R1-Distill-Qwen-7B, built from commit 393119fcd6a873e5776c79b0db01c96911f5f0fc of the base model.
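For reference, below is a minimal sketch of one common way such a pre-quantized 4-bit BitsAndBytes checkpoint can be produced with `transformers`. The specific quantization settings (NF4, double quantization, compute dtype) are assumptions and are not confirmed by this card; only the base model and commit above are stated.

```python
# Hypothetical sketch: producing a 4-bit BnB copy of the base model with transformers.
# The quantization settings below are assumptions, not the card's confirmed recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
revision = "393119fcd6a873e5776c79b0db01c96911f5f0fc"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4 bits
    bnb_4bit_quant_type="nf4",              # assumed quantization type
    bnb_4bit_use_double_quant=True,         # assumed
    bnb_4bit_compute_dtype=torch.bfloat16,  # matches the bfloat16 dtype used at inference below
)

model = AutoModelForCausalLM.from_pretrained(
    base_id, revision=revision, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_id, revision=revision)

# Saving produces a pre-quantized checkpoint that vLLM can then load with
# quantization="bitsandbytes" / load_format="bitsandbytes".
model.save_pretrained("DeepSeek-R1-Distill-Qwen-7B-BnB-4bits")
tokenizer.save_pretrained("DeepSeek-R1-Distill-Qwen-7B-BnB-4bits")
```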

Tested successfully with vLLM 0.7.2 using the following parameters:

```python
import torch
from vllm import LLM

llm_model = LLM(
    "MPWARE/DeepSeek-R1-Distill-Qwen-7B-BnB-4bits",
    task="generate",
    dtype=torch.bfloat16,
    max_num_seqs=8192,
    max_model_len=8192,
    trust_remote_code=True,
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    enforce_eager=True,  # Required for the vLLM V1 engine architecture
    tensor_parallel_size=1,
    gpu_memory_utilization=0.95,
    seed=42
)
```
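Once the model is loaded, generation works as with any other vLLM model. The snippet below is an illustrative usage sketch; the prompt and sampling settings are assumptions, not values from this card. DeepSeek-R1 distills typically emit a `<think>...</think>` reasoning block before the final answer, so a generous `max_tokens` is advisable.

```python
# Hypothetical usage sketch: sampling settings and prompt are illustrative only.
from vllm import SamplingParams

sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=2048)

# LLM.chat() applies the model's own chat template to the messages.
messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
outputs = llm_model.chat(messages, sampling_params)

# Print the generated text, including the model's <think> reasoning block.
print(outputs[0].outputs[0].text)
```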