So DeepSeek hits the mainstream media. But it has been a star in our little cult for at least 6 months. Its meteoric success is not overnight; it is two years in the making.
* End of 2023: they launched their first model (pretrained by themselves) following the Llama 2 architecture
* June 2024: v2 (MoE architecture) surpassed Gemini 1.5, but was still behind Mistral
* September 2024: v2.5 surpassed GPT-4o mini
* December 2024: v3 surpassed GPT-4o
* Now: R1 surpasses o1
Most importantly, if you think DeepSeek's success is singular and unrivaled, that's WRONG. The following models are also at or near the o1 bar.
✨ MIT License: enabling distillation for custom models
✨ 32B & 70B models match OpenAI o1-mini in multiple capabilities
✨ API live now! Access Chain of Thought reasoning with model='deepseek-reasoner'
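Since DeepSeek's API is OpenAI-compatible, a minimal sketch of calling the reasoner looks like this. Only `model='deepseek-reasoner'` comes from the announcement above; the `base_url` and the `reasoning_content` field are assumptions based on DeepSeek's public docs, so double-check them there:

```python
# Minimal sketch of a deepseek-reasoner call, assuming the
# OpenAI-compatible endpoint. Requires: pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder: your real key
    base_url="https://api.deepseek.com",  # assumed endpoint; verify in docs
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # model name from the announcement
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
)

message = response.choices[0].message
# The reasoner may expose its Chain of Thought in a separate field
# (field name is an assumption; getattr keeps this safe if absent).
print(getattr(message, "reasoning_content", None))
print(message.content)
```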