I’m new to GGUF quants

#9
by fsaudm - opened

Can any of these quantizations run on GPUs? Specifically A100s: I have access to 18 of them, spread across 9 nodes of 2 each, and I can create a Ray cluster.

Yes, of course you can. In fact, it will run better and faster on a GPU than on a CPU, although technically a CPU is all you need. Use llama.cpp.
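For what it's worth, here is a rough sketch of what a GPU run might look like on a single node. It assumes llama.cpp was built with CUDA support and that the `llama-cli` binary is on your PATH. The helper name `build_llama_cli_command`, the file name `model.gguf`, and the default values are made up for illustration. The snippet only builds and prints the command (it does not launch anything), and it does not touch Ray or multi-node setups.

```python
import shlex


def build_llama_cli_command(model_path, prompt, n_gpu_layers=99, n_predict=128):
    """Build an argument list for llama-cli with GPU offload.

    Hypothetical helper for illustration only.
    -m   : path to the GGUF file
    -ngl : number of layers to offload to the GPU (a large value offloads
           as many layers as the model has; 0 keeps everything on the CPU)
    -n   : number of tokens to generate
    -p   : prompt text
    """
    return [
        "llama-cli",
        "-m", model_path,
        "-ngl", str(n_gpu_layers),
        "-n", str(n_predict),
        "-p", prompt,
    ]


# "model.gguf" is a placeholder; point it at whichever quant you downloaded.
cmd = build_llama_cli_command("model.gguf", "Hello")

# Print a copy-pasteable shell command instead of running it.
print(shlex.join(cmd))
```

If the model does not fit in GPU memory, lowering the `-ngl` value keeps the remaining layers on the CPU, which is slower but still works.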
