Deploying production-ready DeepSeek R1 on your AWS with vLLM

#32 opened by samagra14

Hi people,

Within 48 hours of DeepSeek-R1's release, developers are already running the 671B reasoning beast in production.

We just dropped the ultimate guide to deploy it on serverless GPUs on your own cloud: https://tensorfuse.io/docs/guides/deepseek_r1

Hope this guide helps everyone experimenting with agents that require reasoning.
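
If it helps, here's a minimal smoke test you can run once the endpoint is live. vLLM exposes an OpenAI-compatible API, so the stock openai client works against it; the base URL below is a placeholder for whatever your own deployment returns:

```python
# Minimal smoke test for a vLLM OpenAI-compatible endpoint.
# base_url is a placeholder; substitute the URL from your own deployment.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-deployment-url>/v1",
    api_key="EMPTY",  # vLLM ignores the key unless you configure auth
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```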

Thanks for the guide. Seems like my weekend is sorted.

Cheers! How much would the hourly cost be for that? Let's assume the 671B version.

Many thanks for sharing this.

How many tokens per second do you get with this setup?

@gregplatflow

For DeepSeek 70B hosted on 4x L40S GPUs using Tensorfuse, you can get up to a maximum of 453 tps with 64 concurrent users. On the lower side, with 4 concurrent requests you get ~55 tps. For a single request, it's around 20 tps.

Check out the table below:

| Concurrent requests | Throughput (tps) |
|---|---|
| 1 | ~20 |
| 4 | ~55 |
| 64 | 453 (max) |
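
If anyone wants to reproduce numbers like these against their own endpoint, a rough way to measure throughput is to fire N concurrent requests and divide total completion tokens by wall-clock time. A minimal sketch below; the base URL and model name are placeholders, not Tensorfuse specifics:

```python
import asyncio
import time

from openai import AsyncOpenAI

# Placeholders: point these at your own vLLM deployment. vLLM serves an
# OpenAI-compatible API, so the stock openai client works against it.
BASE_URL = "https://<your-deployment-url>/v1"
MODEL = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"


async def one_request(client: AsyncOpenAI) -> int:
    resp = await client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": "Explain KV caching in two sentences."}],
        max_tokens=256,
    )
    return resp.usage.completion_tokens


async def measure(concurrency: int) -> float:
    # Create the client inside the event loop that will use it.
    client = AsyncOpenAI(base_url=BASE_URL, api_key="EMPTY")
    start = time.perf_counter()
    tokens = await asyncio.gather(*(one_request(client) for _ in range(concurrency)))
    elapsed = time.perf_counter() - start
    return sum(tokens) / elapsed  # aggregate completion tokens per second


if __name__ == "__main__":
    for c in (1, 4, 64):
        print(f"{c:>3} concurrent -> {asyncio.run(measure(c)):.1f} tps")
```

Note that aggregate tps grows with concurrency (batching amortizes the forward pass), which is why the single-request number looks so much lower than the 64-user one.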

@agam30 This is helpful. Thanks for sharing this
