Thoughts on deepseek-r1. Correct me if I'm wrong

#69
by pkms - opened

Both reputable and clickbait-driven news outlets are missing key technical details about DeepSeek-R1.

Sensationalist coverage will capitalize on this, amplifying emotional responses and market volatility.

It's built on top of DeepSeek-V3, a 671B-parameter model. For comparison, GPT-3 has 175B parameters, and GPT-4 reportedly has around 1.76T (OpenAI has never confirmed a figure).

Training an LLM is different from using one. Training a 671B-parameter model still requires datacenter-grade compute.

Using an LLM happens either via the web (most people) or by running it on your own machine (physical or cloud); the constraint there is usually how much RAM/VRAM the system has.
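To make the RAM constraint concrete, here is a rough back-of-the-envelope sketch (my own, not from the paper) of the memory needed just to hold the weights at a few common precisions; it ignores activations, KV cache, and runtime overhead.

```python
# Rough estimate of memory needed to hold model weights alone,
# at a few common precisions. Illustrative only: ignores activations,
# KV cache, and framework overhead.

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Gigabytes required for the weights at the given bit width."""
    return num_params * bits_per_param / 8 / 1e9

for name, params in [("GPT-3 (175B)", 175e9), ("DeepSeek-V3/R1 (671B)", 671e9)]:
    for precision, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{name} @ {precision}: ~{weight_memory_gb(params, bits):,.0f} GB")
```

Even at 4-bit quantization, 671B parameters works out to roughly 335 GB for the weights alone, which is why the full model stays out of reach for consumer hardware.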

Via the web, the backend compute required for uptime and low latency can be greatly reduced.

On your own machine, whether cloud (e.g. hosting an LLM for your company) or physical (e.g. a zero-cost code-assistant AI running locally), this allows for reduced cloud-compute costs and the possibility of running a better-performing/smarter LLM in the same limited amount of RAM.
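As a minimal sketch of what "running it on your own machine" looks like in practice, here is how one might load one of the small distilled R1 checkpoints with the Hugging Face transformers library. The model id and generation settings are my assumptions for illustration; the full 671B model would not fit this way on a consumer machine.

```python
# Minimal sketch: local inference with a small distilled R1 variant via
# the transformers library. The full 671B model will not fit on consumer
# hardware; the distilled checkpoints are the practical local option.
# Model id assumed here: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory vs. FP32
    device_map="auto",           # spread across GPU/CPU as memory allows
)

prompt = "Explain the difference between training and inference in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```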

Only the model weights and the model's white paper were published. The training code, architecture implementation, and datasets were not, therefore it arguably shouldn't be called open source.

If (and most likely when) it is reproduced successfully, DeepSeek-R1 will make LLM deployment even more ubiquitous, since the freed-up compute will allow more processing to be done.

DeepSeek is not going to crash global demand for AI chips. It was perhaps more of a well-timed ice bath for recent US announcements involving AI. There is also an argument being pushed that DeepSeek-R1's results were falsified and that it was actually trained on US chips in breach of the US embargo on exporting top-grade AI chips.

Update note -
The GitHub repo shared in the link below is actually an open-source, contributor-driven effort to reconstruct a trainer codebase that can be used to train models with the DeepSeek architecture. It is not code from the DeepSeek founding team itself.

DeepSeek-R1 - if the code available within the repo is assumed to be the definitive and complete model code, it's quite a simple iteration on DeepSeek-V2. But confirming that the code is complete is going to be a pain. From the way the repo is prepared, my assumption is that the team behind it intends to prevent complete replication from scratch and would rather have you depend on their final model files, possibly with support for fine-tuning on additional data.

Early observations ---
It seems the code for DeepSeek-R1 is indeed open source, though some of it is hidden within those 200+ model files in the Hugging Face repo, i.e. this one.

The rest of the code seems to be here: https://github.com/huggingface/open-r1 . I am still trying to wrap my head around these two repos.

Not sure why the presentation of an open-source repo is this obscure. If you wish to train your own model from scratch you will need to aggregate your own data, which makes sense considering all the copyright issues that might follow.

Since the core codebase seems to be effectively open-sourced between these two repos, it may make more sense to fine-tune the model files in the absence of large-scale compute rather than train one from scratch.
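For what that fine-tuning path could look like, here is a hedged sketch using the peft library's LoRA adapters on a distilled checkpoint. The model id, target modules, and hyperparameters are illustrative assumptions on my part, not a recipe from the DeepSeek team or the open-r1 repo.

```python
# Sketch: LoRA fine-tuning of a distilled R1 checkpoint with peft, as an
# alternative to full from-scratch training. All choices below (model id,
# target modules, rank) are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed small variant
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor for the updates
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights will train

# From here, train with the standard transformers Trainer (or TRL's SFTTrainer)
# on your own dataset; the base weights stay frozen, which keeps compute and
# memory requirements far below from-scratch training.
```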
