Deploying production-ready DeepSeek R1 on your AWS with vLLM

#32 opened by samagra14

Hi people,

Within 48 hours of DeepSeek-R1's release, developers are already running the 671B reasoning beast in production.

We just dropped the ultimate guide to deploy it on serverless GPUs on your own cloud: https://tensorfuse.io/docs/guides/deepseek_r1

Hope this guide helps everyone experimenting with agents that require reasoning.
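
If it helps, here's a minimal smoke test you can run once the endpoint is live. vLLM exposes an OpenAI-compatible API, so the stock openai client works against it; the base URL below is a placeholder for whatever your own deployment returns:

```python
# Minimal smoke test for a vLLM OpenAI-compatible endpoint.
# base_url is a placeholder; substitute the URL from your own deployment.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-deployment-url>/v1",
    api_key="EMPTY",  # vLLM ignores the key unless you configure auth
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```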

Thanks for the guide. Seems like my weekend is sorted.

Cheers! How much would the hourly cost be for that? Let's assume the 671B version.

Many thanks for sharing this.

How many tokens per second do you get with this setup?

@gregplatflow

For DeepSeek 70B hosted on 4x L40S GPUs using Tensorfuse, you can get up to a maximum of 453 tps with 64 concurrent users. On the lower side, with 4 concurrent requests you get ~55 tps. For a single request, it's around 20 tps.

Check out the table below:

| Concurrent requests | Throughput (tps) |
|---|---|
| 1 | ~20 |
| 4 | ~55 |
| 64 | 453 (max) |
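
If anyone wants to reproduce numbers like these against their own endpoint, a rough way to measure throughput is to fire N concurrent requests and divide total completion tokens by wall-clock time. A minimal sketch below; the base URL and model name are placeholders, not Tensorfuse specifics:

```python
import asyncio
import time

from openai import AsyncOpenAI

# Placeholders: point these at your own vLLM deployment. vLLM serves an
# OpenAI-compatible API, so the stock openai client works against it.
BASE_URL = "https://<your-deployment-url>/v1"
MODEL = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"


async def one_request(client: AsyncOpenAI) -> int:
    resp = await client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": "Explain KV caching in two sentences."}],
        max_tokens=256,
    )
    return resp.usage.completion_tokens


async def measure(concurrency: int) -> float:
    # Create the client inside the event loop that will use it.
    client = AsyncOpenAI(base_url=BASE_URL, api_key="EMPTY")
    start = time.perf_counter()
    tokens = await asyncio.gather(*(one_request(client) for _ in range(concurrency)))
    elapsed = time.perf_counter() - start
    return sum(tokens) / elapsed  # aggregate completion tokens per second


if __name__ == "__main__":
    for c in (1, 4, 64):
        print(f"{c:>3} concurrent -> {asyncio.run(measure(c)):.1f} tps")
```

Note that aggregate tps grows with concurrency (batching amortizes the forward pass), which is why the single-request number looks so much lower than the 64-user one.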

@agam30 This is helpful. Thanks for sharing this
