I would like to run desklib locally on a GPU with 8 GB of memory. Since FP16 is not supported, is it okay to use quantization, for example with ONNX Runtime? What about bitsandbytes?