Deploying a production-ready service with Unsloth GGUF quants on your AWS account (4 x L40S)

by samagra-tensorfuse - opened about 14 hours ago


Hi People

Over the past few weeks we have run a lot of PoCs with enterprises trying to deploy DeepSeek R1. The most popular combination was the Unsloth GGUF quants on 4 x L40S.

We just published a guide to deploying it on serverless GPUs in your own cloud: https://tensorfuse.io/docs/guides/integrations/llama_cpp

Single-request throughput: 24 tok/sec

Context size: 5k tokens
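For a rough sense of what these two numbers mean per request, here is a back-of-the-envelope sketch. It assumes "5k" means 5,000 tokens shared between prompt and output, and it ignores prompt processing time and queueing, so treat the result as a lower bound on latency rather than a measurement. The function names are illustrative only.

```python
TOKENS_PER_SEC = 24      # single-request decode speed quoted in this post
CONTEXT_TOKENS = 5_000   # assumption: "5k" read as 5,000 tokens


def max_output_tokens(prompt_tokens: int) -> int:
    """Tokens left for generation once the prompt occupies the context."""
    return max(CONTEXT_TOKENS - prompt_tokens, 0)


def generation_seconds(output_tokens: int) -> float:
    """Approximate decode time, ignoring prompt processing and queueing."""
    return output_tokens / TOKENS_PER_SEC


if __name__ == "__main__":
    prompt = 1_000  # hypothetical prompt length
    budget = max_output_tokens(prompt)
    print(f"{budget} output tokens, ~{generation_seconds(budget):.0f} s to generate")
```

So under these assumptions, a request that fills the remaining context after a 1,000-token prompt would take a few minutes to finish, which matters when sizing timeouts.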
