Data Is Better Together Contributor

community
Activity Feed

AI & ML interests

None defined yet.

Recent Activity

data-is-better-together-contributor's activity

fdaudens 
posted an update about 2 hours ago
view post
Post
265
Yes, DeepSeek R1's release is impressive. But the real story is what happened in just 7 days after:

- Original release: 8 models, 540K downloads. Just the beginning...

- The community turned those open-weight models into +550 NEW models on Hugging Face. Total downloads? 2.5M—nearly 5X the originals.

The reason? DeepSeek models are open-weight, letting anyone build on top of them. Interesting to note that the community focused on quantized versions for better efficiency & accessibility. They want models that use less memory, run faster, and are more energy-efficient.

When you empower builders, innovation explodes. For everyone. 🚀

The most popular community model? @bartowski 's DeepSeek-R1-Distill-Qwen-32B-GGUF version — 1M downloads alone.
sayakpaul 
posted an update about 4 hours ago
view post
Post
320
We have authored a post to go over the state of video generation in the Diffusers ecosystem 🧨

We cover the models supported, the knobs of optims our users can fire, fine-tuning, and more 🔥

5-6GBs for HunyuanVideo, sky is the limit 🌌 🤗
https://huggingface.co/blog/video_gen
davanstrien 
posted an update about 5 hours ago
view post
Post
343
🌍 Big step for multilingual AI data!

The Hugging Face community has rated educational content in languages spoken by 1.6 billion people! New additions:
• Japanese
• Italian
• Old High German

Learn more and contribute: https://huggingface.co/blog/davanstrien/fineweb2-community

These ratings can help enhance training data for major world languages.
davidberenstein1957 
posted an update about 9 hours ago
burtenshaw 
posted an update about 10 hours ago
view post
Post
516
Manic few days in open source AI, with game changing development all over the place. Here's a round up of the resources:

- The science team at @huggingface reproduced and open source the seek r1. https://github.com/huggingface/open-r1
- @qwen released a series of models with 1 million token context! https://qwenlm.github.io/blog/qwen2.5-1m/
- SmolVLM got even smaller with completely new variants at 256m and 500m https://huggingface.co/blog/smolervlm

There's so much you could do with these developments. Especially combining them together into agentic applications or fine-tuning them on your use case.
burtenshaw 
posted an update 3 days ago
view post
Post
493
Hey 👋

I'm helping out on some community research to learn about the AI community. If you want to join in the conversation, head over here where I started a community discussion on the most influential model since BERT.

OSAIResearchCommunity/README#2
burtenshaw 
posted an update 3 days ago
view post
Post
1304
📣 Teachers and Students! Here's a handy quiz app if you're preparing your own study material.

TLDR, It's a quiz that uses a dataset to make questions and save answers

Here's how it works:

- make a dataset of multiple choice questions
- duplicate the space add set the dataset repo
- log in and do the quiz
- submit the questions to create a new dataset

I made this to get ready for the agents course, but I hope it's useful for you projects too!

quiz app burtenshaw/dataset_quiz

dataset with questions burtenshaw/exam_questions

agents course we're working on https://huggingface.co/agents-course
burtenshaw 
posted an update 4 days ago
view post
Post
1959
AI was built on side projects!
AtAndDev 
posted an update 5 days ago
view post
Post
433
Deepseek gang on fire fr fr
burtenshaw 
posted an update 5 days ago
view post
Post
3370
🚧 Work in Progress! 🚧

👷‍♀️ We're working hard on getting the official agents course ready for the 50,000 students that have signed up.

If you want to contribute to the discussion, I started these community posts. Looking forward to hearing from you:

- smolagents unit in the agents course - agents-course/README#7
- LlamaIndex Unit in the agents course - agents-course/README#6
- LangChain and LangGraph unit in the agents course - agents-course/README#5
- Real world use cases in the agents course - agents-course/README#8


fdaudens 
posted an update 6 days ago
davidberenstein1957 
posted an update 6 days ago
fdaudens 
posted an update 7 days ago
view post
Post
1789
Reminder: Don’t. Use. ChatGPT. As. A. Calculator. Seriously. 🤖

Loved listening to @sasha on Hard Fork—it really made me think.

A few takeaways that hit home:
- Individual culpability only gets you so far. The real priority: demanding accountability and transparency from companies.
- Evaluate if generative AI is the right tool for certain tasks (like search) before using it.

Curious about the full conversation? https://www.nytimes.com/2025/01/17/podcasts/hardfork-tiktok-rednote-environment.html. Give it a listen—it’s worth it! 🌍
  • 1 reply
·
prithivMLmods 
posted an update 7 days ago
AtAndDev 
posted an update 7 days ago
view post
Post
1551
R1 is out! And with a lot of other R1 releated models...
ZennyKenny 
posted an update 9 days ago
view post
Post
413
Really pleased with the Bring Your Own Model (BYOM) feature in Brave Browser: https://brave.com/blog/byom-nightly/

Takes about 5 minutes to configure your own locally running LLM as an in-browser assistant. Totally local, totally private, totally yours.
  • 1 reply
·
not-lain 
posted an update 10 days ago
view post
Post
1068
we now have more than 2000 public AI models using ModelHubMixin🤗
davidberenstein1957 
posted an update 10 days ago
nataliaElv 
posted an update 10 days ago
view post
Post
1417
New chapter in the Hugging Face NLP course! 🤗 🚀

We've added a new chapter about the very basics of Argilla to the Hugging Face NLP course. Learn how to set up an Argilla instance, load & annotate datasets, and export them to the Hub. 

Any feedback for improvements welcome!

https://huggingface.co/learn/nlp-course/chapter10
ZennyKenny 
posted an update 10 days ago
view post
Post
390
On-demand audio transcription is an often-requested service without many good options on the market.

Using Hugging Face Spaces with Gradio SDK and the OpenAI Whisper model, I've put together a simple interface that supports the transcription and summarisation of audio files up to five minutes in length, completely open source and running on CPU upgrade. The cool thing is that it's built without a dedicated inference endpoint, completely on public infrastructure.

Check it out: ZennyKenny/AudioTranscribe

I wrote a short article about the backend mechanics for those who are interested: https://huggingface.co/blog/ZennyKenny/on-demand-public-transcription
  • 1 reply
·