How much VRAM do you need?

#12 opened by hyun10

I have 4× RTX 3090, 96 GB of VRAM in total, but when I try to run this with vLLM I get a CUDA out-of-memory error.

Should I use a quantized model?

BF16 memory: 70B params × 2 bytes = 140 GB
Q8 memory: 70B params × 1 byte = 70 GB
You should use a Q8 model.
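As a quick sanity check, here is a rough weights-only estimate in Python. It ignores KV cache, activations, and runtime overhead, so real usage will be higher than these numbers:

```python
# Rough VRAM needed for the model weights alone, ignoring KV cache,
# activations, and framework overhead (real usage is higher).
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    # params_billions * 1e9 params * bytes_per_param bytes, expressed in GB
    return params_billions * bytes_per_param

for fmt, bytes_per_param in [("BF16", 2.0), ("FP8/Q8", 1.0), ("Q4 (AWQ)", 0.5)]:
    print(f"{fmt}: ~{weight_memory_gb(70, bytes_per_param):.0f} GB")
# BF16: ~140 GB, FP8/Q8: ~70 GB, Q4 (AWQ): ~35 GB
```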

The official version is FP8, not Q8.

Is there any AWQ Q8 version of this model?

Is this the Q8 AWQ version of this model? https://huggingface.co/Valdemardi/DeepSeek-R1-Distill-Llama-70B-AWQ/

That model is Q4. A Q8 model file would be around 70 GB or more.
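For reference, a minimal sketch of serving that Q4 AWQ checkpoint with vLLM across the four 3090s; the parameter values (memory utilization, context length) are assumptions you would tune for your own setup:

```python
from vllm import LLM, SamplingParams

# Load the AWQ checkpoint and split it across the 4 GPUs.
llm = LLM(
    model="Valdemardi/DeepSeek-R1-Distill-Llama-70B-AWQ",
    quantization="awq",
    tensor_parallel_size=4,       # one shard per RTX 3090
    gpu_memory_utilization=0.90,  # leave headroom for the CUDA context
    max_model_len=8192,           # cap context length to bound KV-cache memory
)

out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```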

I am looking for a Q8 model quantized with AWQ. Do you know of any?

You can perform AWQ yourself.
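For example, a minimal sketch with the AutoAWQ library; the paths and settings below are assumptions. Note that the standard AWQ kernels are 4-bit (w_bit=4), which is one reason true 8-bit AWQ checkpoints are hard to find:

```python
# Minimal AutoAWQ quantization sketch (assumed paths and settings).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"
quant_path = "DeepSeek-R1-Distill-Llama-70B-AWQ"  # output directory (hypothetical)
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model and tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, use_cache=False)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Calibrate, quantize, and save the AWQ checkpoint.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```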
