I’m new to GGUF quants
#9, opened by fsaudm
Can any of these quantizations run on GPUs? Specifically A100s: I have access to 18 of them across 9 nodes (2 per node), and I can create a Ray cluster.
Yes, of course you can. In fact, it will run faster on a GPU, even though technically you only need a CPU. Use llama.cpp.
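As a rough illustration of the llama.cpp suggestion, here is a minimal sketch that builds a command line for a GPU-offloaded run. It assumes a llama.cpp build with CUDA enabled and a binary named `llama-cli` on your PATH; `model.gguf` is a placeholder file name, and the helper `build_llama_cli_command` is hypothetical, written only for this example. The `-ngl` (`--n-gpu-layers`) flag sets how many layers are offloaded to the GPU, and a value larger than the model's layer count is a common way to offload everything. This only covers a single node and does not address spreading the model across a Ray cluster.

```python
import shlex


def build_llama_cli_command(model_path, prompt, n_gpu_layers=99, binary="llama-cli"):
    """Build the argument list for a llama.cpp run with GPU offload.

    Assumptions: `binary` is a CUDA-enabled llama.cpp build, and
    `model_path` points at a GGUF file you have already downloaded.
    """
    return [
        binary,
        "-m", model_path,           # path to the GGUF model file
        "-ngl", str(n_gpu_layers),  # number of layers to offload to the GPU
        "-p", prompt,               # prompt text
    ]


# "model.gguf" is a placeholder; substitute the quant you downloaded.
cmd = build_llama_cli_command("model.gguf", "Hello")
print(shlex.join(cmd))  # llama-cli -m model.gguf -ngl 99 -p Hello
```

To actually run it, you could pass `cmd` to `subprocess.run(cmd, check=True)` once the binary and model file are in place. Setting `n_gpu_layers=0` keeps everything on the CPU, which is the CPU-only mode mentioned above.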