If you haven't seen it yet, we just released Inference Providers!
> 4 new serverless inference providers on the Hub
> Use your HF API key or a personal provider key with all providers
> Chat with DeepSeek R1, V3, and more on the HF Hub
> We support SambaNova, Together AI, Replicate, and fal.ai
Best of all, we don't charge any markup on top of the provider's price. Have you tried it out yet? HF Pro accounts get $2 of free usage for provider inference.
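A minimal sketch of calling one of the providers through the `huggingface_hub` client (assumes `huggingface_hub` >= 0.28 is installed and an `HF_TOKEN` environment variable is set; the prompt and `max_tokens` value are just examples):

```python
import os
from huggingface_hub import InferenceClient

messages = [{"role": "user", "content": "Explain KV caching in one paragraph."}]

token = os.environ.get("HF_TOKEN")
if token:  # only hit the network when a token is configured
    # Route the request through Together AI; billing goes through
    # your HF account, with no markup on the provider's price.
    client = InferenceClient(provider="together", api_key=token)
    completion = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1",
        messages=messages,
        max_tokens=512,
    )
    print(completion.choices[0].message.content)
```

Swapping `provider="together"` for `"sambanova"`, `"replicate"`, or `"fal-ai"` routes the same request through a different backend without changing the rest of the code.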
There's so much you could do with these developments, especially combining them into agentic applications or fine-tuning models for your use case.
We are reproducing the full DeepSeek-R1 data and training pipeline so everybody can use their recipe. Instead of doing it in secret, we can do it together in the open!
Step 1: replicate the R1-Distill models by distilling a high-quality reasoning corpus from DeepSeek-R1.
Step 2: replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will involve curating new, large-scale datasets for math, reasoning, and code.
Step 3: show we can go from base model -> SFT -> RL via multi-stage training.
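To make Step 2 concrete: R1-Zero-style RL leans on verifiable, rule-based rewards rather than a learned reward model. A minimal sketch of what such reward functions might look like (the `<think>` tags and boxed-answer convention follow the R1 report's described format; the exact scoring values here are my illustration, not the official recipe):

```python
import re

def accuracy_reward(completion: str, gold_answer: str) -> float:
    """Rule-based reward: 1.0 if the final \\boxed{...} answer matches the gold answer."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == gold_answer.strip() else 0.0

def format_reward(completion: str) -> float:
    """Small bonus for wrapping the reasoning trace in <think>...</think> tags."""
    return 0.5 if re.search(r"<think>.*?</think>", completion, re.DOTALL) else 0.0

completion = "<think>2 + 2 is 4</think> The answer is \\boxed{4}"
print(accuracy_reward(completion, "4") + format_reward(completion))  # 1.5
```

Because both checks are deterministic, the policy can be trained at scale on math and code problems without a reward model to hack.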
I'm helping out with some community research to learn more about the AI community. If you want to join the conversation, head over here, where I started a community discussion on the most influential model since BERT.
Teachers and students! Here's a handy quiz app if you're preparing your own study material.
TL;DR: it's a quiz app that generates questions from a dataset and saves your answers.
Here's how it works:
- make a dataset of multiple-choice questions
- duplicate the space and set the dataset repo
- log in and take the quiz
- submit your answers to create a new dataset
I made this to get ready for the agents course, but I hope it's useful for your projects too!
We're thrilled to share SmolVLM (256M & 500M), the smallest Visual Language Models ever built. Think: running on <1GB of GPU memory. You can fine-tune it on your laptop and run it on your toaster!
Why It's Game-Changing:
- **Outperforms Larger Models**: Even the 256M model surpasses our SOTA 80B-parameter model from just 17 months ago. That's over a 300x reduction in size!
- **Mighty Efficiency**: The 256M version delivers 80% of our 2.2B model's performance, and the 500M version hits 90%.
- **Lightning-Fast Search**: SmolVLM integrates with ColiPali for state-of-the-art retrieval speeds, on par with models 10x bigger. That means cheaper, faster indexing and real-world impact.
What's New Under the Hood:
- **New Vision Encoder**: Smaller overall size (400M -> 93M), but with higher resolution.
- **Higher Pixels/Token**: 4096 vs. 1820, for more efficient image processing.
- **Smart Tokenization**: Faster training and a performance boost.
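The pixels-per-token jump compounds quickly on real images. A back-of-the-envelope calculation (my arithmetic, assuming visual tokens scale linearly with pixel count):

```python
def image_tokens(width: int, height: int, pixels_per_token: int) -> int:
    """Approximate visual tokens for an image, assuming a linear pixel budget."""
    return (width * height) // pixels_per_token

# A 1024x1024 image under the old vs. new pixel budget:
old = image_tokens(1024, 1024, 1820)  # 576 tokens
new = image_tokens(1024, 1024, 4096)  # 256 tokens
print(old, new, f"{old / new:.2f}x fewer tokens")  # 576 256 2.25x fewer tokens
```

Fewer visual tokens per image means shorter sequences, which is a large part of why the small models stay fast on modest hardware.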