Deploying production-ready DeepSeek-R1 on your AWS with vLLM
Hi people,
Within 48 hours of DeepSeek-R1's release, developers are already running the 671B reasoning beast in production.
We just dropped the ultimate guide to deploy it on serverless GPUs on your own cloud: https://tensorfuse.io/docs/guides/deepseek_r1
Hope this guide helps all of you experimenting with agents that require reasoning.
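If you just want a feel for the serving side before diving into the full guide, here's a minimal sketch using vLLM's Python API. The model name, GPU count, context length, and sampling settings are my assumptions for illustration, not the guide's exact config — check the linked docs for the real values:

```python
# Minimal sketch: serving a distilled R1 variant with vLLM.
# Model ID, tensor_parallel_size, and max_model_len are assumptions,
# not the Tensorfuse guide's exact settings.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-70B",  # assumed distilled variant
    tensor_parallel_size=4,   # shard across 4 GPUs (e.g. 4x L40S)
    max_model_len=8192,       # cap context to fit GPU memory
)

params = SamplingParams(temperature=0.6, max_tokens=1024)
outputs = llm.generate(
    ["Explain chain-of-thought reasoning in one paragraph."], params
)
print(outputs[0].outputs[0].text)
```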
Thanks for the guide. Seems like my weekend is sorted.
Cheers! How much would the hourly cost be for that? Let's assume the 671B version.
Many thanks for sharing this.
How many tokens per second do you get with this setup?
For DeepSeek 70B hosted on 4x L40S GPUs using Tensorfuse, you can get up to a maximum of 453 tps with 64 concurrent users. On the lower side, with 4 concurrent requests you get ~55 tps, and a single request runs at around 20 tps.
Check out the table below:

| Concurrent requests | Throughput (tokens/sec) |
| --- | --- |
| 1 | ~20 |
| 4 | ~55 |
| 64 | 453 (max) |
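If you want to sanity-check the single-request number on your own deployment, here's a rough timing sketch against an OpenAI-compatible vLLM endpoint. The base URL, model name, and prompt are placeholders for whatever your deployment exposes, and this only measures one request — it's not the concurrency benchmark behind the table above:

```python
# Rough single-request tokens/sec check against an OpenAI-compatible
# vLLM endpoint. URL and model name are placeholders.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

start = time.time()
resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    messages=[{"role": "user", "content": "Summarize the CAP theorem."}],
    max_tokens=512,
)
elapsed = time.time() - start
print(f"{resp.usage.completion_tokens / elapsed:.1f} tokens/sec (single request)")
```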
@agam30 This is helpful. Thanks for sharing!