Issue with --n-gpu-layers 5 Parameter: Model Only Running on CPU

#10
by vuk123 - opened

Hi, I’m facing an issue where the --n-gpu-layers 5 parameter doesn’t seem to work. Despite having 2x NVIDIA A6000 GPUs, the model runs entirely on the CPU, with no GPU utilization. Has anyone else encountered this, or is there a fix for it?

This is how I run the model:

llama-cli --model /home/user/mymodels/DeepSeek-V3-Q3_K_M/DeepSeek-V3-Q3_K_M-00001-of-00007.gguf --cache-type-k q5_0 --threads 16 --prompt '<|User|>What is 1+1?<|Assistant|>' --n-gpu-layers 5

It looks like the problem is that I installed llama.cpp with brew, so it wasn't compiled with CUDA support...
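
For anyone hitting the same thing, a quick way to check whether the binary on your PATH is the Homebrew one rather than a CUDA-enabled build (the formula name below is an assumption; adjust it to your install):

```bash
# Show which llama-cli is actually being run and whether it was installed via Homebrew
which llama-cli
brew list llama.cpp   # assumes the Homebrew formula is named "llama.cpp"
```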

I rebuilt it with CMake, and now it works...
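
For reference, a minimal sketch of a source build with the CUDA backend enabled, assuming a recent llama.cpp checkout (older releases used -DLLAMA_CUBLAS=ON instead of -DGGML_CUDA=ON):

```bash
# Build llama.cpp from source with CUDA so --n-gpu-layers can offload to the GPUs
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```

Running the same llama-cli command with the freshly built binary (e.g. build/bin/llama-cli) and, if VRAM allows, a larger --n-gpu-layers value should then show GPU utilization.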

Glad you got it working!