Support vLLM?

#30
by caoyizhen - opened

Code from https://docs.vllm.ai/en/latest/features/quantization/gguf.html:

from vllm import LLM, SamplingParams

# In this script, we demonstrate how to pass input to the chat method:
conversation = [
    {
        "role": "system",
        "content": "You are a helpful assistant",
    },
    {
        "role": "user",
        "content": "Hello",
    },
    {
        "role": "assistant",
        "content": "Hello! How can I assist you today?",
    },
    {
        "role": "user",
        "content": "Write an essay about the importance of higher education.",
    },
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Create an LLM.
llm = LLM(model="unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF",
          tokenizer="Qwen/Qwen2.5-32B-Instruct")

# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.chat(conversation, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

After modifying the model and tokenizer, it raises an error:

OSError: It looks like the config file at 'xxxx/DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf' is not a valid JSON file.

Question

Is this GGUF not supported by vLLM?

Unsloth AI org

GGUF was only just supported for DeepSeek. Follow along with the issue here: https://github.com/vllm-project/vllm/issues/12573
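
For reference, vLLM's GGUF docs load a local path to a single .gguf file rather than a Hugging Face repo id, which fits the OSError above (pointing vLLM at a repo name or the wrong path makes it try to parse a file as a JSON config). Below is a minimal sketch of that pattern, assuming a vLLM build that includes the DeepSeek GGUF support from the linked issue; the filename is taken from the error message above (confirm it against the repo's file listing), and hf_hub_download is just one convenient way to fetch the shard:

from huggingface_hub import hf_hub_download
from vllm import LLM, SamplingParams

# Download just the quantized shard. The filename matches the one in the
# error message above; check the repo's file listing to confirm it.
model_path = hf_hub_download(
    repo_id="unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF",
    filename="DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf",
)

# Point vLLM at the local .gguf file and borrow the base model's
# tokenizer, as the vLLM GGUF docs recommend.
llm = LLM(model=model_path, tokenizer="Qwen/Qwen2.5-32B-Instruct")
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

outputs = llm.chat(
    [{"role": "user", "content": "Hello"}],
    sampling_params,
)
print(outputs[0].outputs[0].text)

Note that, per the vLLM docs, only single-file GGUF models are supported, so a quant split across multiple .gguf files would need to be merged first (e.g. with llama.cpp's gguf-split tool).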

Thank you for the reply. I get it.

caoyizhen changed discussion status to closed
