Support vLLM?
#30
by caoyizhen - opened
Code from https://docs.vllm.ai/en/latest/features/quantization/gguf.html:
from vllm import LLM, SamplingParams

# In this script, we demonstrate how to pass input to the chat method:
conversation = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
    {"role": "user", "content": "Write an essay about the importance of higher education."},
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Create an LLM.
llm = LLM(model="unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF",
          tokenizer="Qwen/Qwen2.5-32B-Instruct")

# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.chat(conversation, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
After modifying the model and tokenizer, it raises an error:

OSError: It looks like the config file at 'xxxx/DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf' is not a valid JSON file.

Question: is this GGUF not supported by vLLM?
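For reference, the same vLLM docs page loads GGUF from a single local .gguf file rather than a repo id. A minimal sketch of that pattern is below; the filename is taken from the error above and is assumed to exist at the top level of the repo, and this still requires a vLLM build whose GGUF support covers this architecture.

from huggingface_hub import hf_hub_download
from vllm import LLM, SamplingParams

# Assumption: this Q4_K_M file exists at the top level of the repo.
repo_id = "unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF"
filename = "DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf"

# Download one quantized file; vLLM's GGUF loader expects a path to a
# single .gguf file rather than a Hugging Face repo id.
model_path = hf_hub_download(repo_id, filename=filename)

llm = LLM(model=model_path,
          tokenizer="Qwen/Qwen2.5-32B-Instruct")
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
outputs = llm.chat([{"role": "user", "content": "Hello"}], sampling_params)
print(outputs[0].outputs[0].text)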
GGUF was just supported for DeepSeek. Follow along with the issue here: https://github.com/vllm-project/vllm/issues/12573
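Once a vLLM release includes that fix, a quick sanity check is to compare the installed version against the release notes:

# Print the installed vLLM version to compare against the release
# that merged GGUF support for this architecture.
import vllm
print(vllm.__version__)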
Thank you for the reply. I get it.
caoyizhen changed discussion status to closed