Julien Chaumond PRO

julien-c

AI & ML interests

<3 ML/AI for everyone, building products to propel communities fwd

Organizations

Hugging Face, Safetensors, Notebooks-explorers, Nbconvert-internal, BigScience Workshop, Spaces-explorers, Flax Community, Templates, Hugging Face Course, Giskard, ph-snps, Text Generation Inference, Amazon SageMaker Community, Training Transformers Together, Hugging Chat, Atmos Bank, Godot Engine Demos, Pyodide Demos, Huggingface.js, Webhooks Explorers (BETA), Workshop June 13 Classroom, HF Canonical Model Maintainers, TRL, Open-Source AI Meetup, Scanned Tokens, HF Legal, Language Tools, Stable Diffusion concepts library, Teven-projects, Banana-projects, Exbert-project, Blog-explorers, EU org, Hacktoberfest 2023, huggingPartyParis, Enterprise Explorers, ZeroGPU Explorers, OpenAI community, XLNet community, ALBERT community, Transformer-XL community, Facebook AI community, DistilBERT community, BERT community, T5 community, choosealicense.com mirror, Social Post Explorers, Dev Mode Explorers, Test, private beta for deeplinks, Paris AI Running Club, kmhf, Hugging Face Party @ PyTorch Conference, Nerdy Face, Hugging Face Science, open/ acc, DDUF, Self-serve FTW

julien-c's activity

replied to Nitral-AI's post 2 days ago

FWIW their PM (Chris Perry) is quite helpful on twitter. Maybe try to ping him?

reacted to Nitral-AI's post with 😔 2 days ago
Post
3124
That moment when you spend 5 days up babysitting trains, only for Colab Pro+ to randomly disconnect the environment at every chance with zero error indication of any kind. It nukes the session from the interface but continues to eat my Colab credits while it reports to wandb. There is no way of saving the models when this happens, since it nukes the code set up to auto-execute. And since the sessions 'exist' but at the same time don't exist, I can't close them and have to wait until they time out automatically after 24 hours. Guess I won't be using Colab for 'quick' test trains anymore. Thanks, Google, for scheming the very little model training budget I had for the month.
·
replied to burtenshaw's post 24 days ago
reacted to burtenshaw's post with 🤗❤️ 24 days ago
Post
2663
People are flexing their end of year stats, so I made this app to show hub stats in a tidy design!

Thanks @Ameeeee and @jfcalvo for the feature from Argilla!
burtenshaw/recap
  • 1 reply
·
replied to victor's post 26 days ago
reacted to Kseniase's post with 🔥 28 days ago
Post
2829
TL;DR: The Story of Attention's Development by @karpathy

Origin: First proposed in 2014 by @Dzmitry Bahdanau, @KyunghyunCho, and Yoshua Bengio in Neural Machine Translation by Jointly Learning to Align and Translate (1409.0473). Inspired by cognitive processes and later renamed from "RNNSearch."

Key Idea: A data-dependent weighted average for pooling and communication, enabling flexible and powerful neural network connections.

Breakthrough: Bahdanau's "soft search" mechanism (softmax + weighted averaging) solved encoder-decoder bottlenecks in machine translation.
Transformer Revolution: Attention Is All You Need (1706.03762) (2017) by @ashishvaswanigoogle et al. simplified architectures by stacking attention layers, introducing multi-headed attention and positional encodings.
Legacy: Attention replaced RNNs, driving modern AI systems like ChatGPT. It emerged independently but was influenced by contemporaneous work like Alex Graves’s Neural Turing Machines (1410.5401) and Jason Weston’s Memory Networks (1410.3916).
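
To make the "data-dependent weighted average" idea concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the function names `softmax` and `attend` are hypothetical, and the scores use a simple dot product, whereas the 2014 paper scored query/key pairs with a small learned network. The softmax-then-weighted-average step is the part the post describes.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Pool `values` using weights that depend on how well `query` matches each key.

    Assumption: dot-product scoring is used here for simplicity.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return pooled, weights

query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

pooled, weights = attend(query, keys, values)
print(weights)  # the key most similar to the query gets the largest weight
print(pooled)   # weighted average of the value vectors
```

Because the weights are computed from the inputs rather than fixed, the same function pools differently for every query, which is what lets the decoder look at different source positions at each step.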

Attention to history: Jürgen Schmidhuber claims his 1992 Fast Weight Programmers anticipated modern attention mechanisms. While conceptually similar, the term “attention” was absent, and there’s no evidence it influenced Bahdanau, Cho, and Bengio’s 2014 work. Paying attention (!) to history might have brought us to genAI earlier – but credit for the breakthrough still goes to Montreal.

Referenced Papers:
Attention Origin: Neural Machine Translation by Jointly Learning to Align and Translate (1409.0473)
Transformers: Attention Is All You Need (1706.03762)
Alex Graves' Work: Neural Turing Machines (1410.5401), Generating Sequences With Recurrent Neural Networks (1308.0850)
Jason Weston's ( @spermwhale ) Memory Networks (1410.3916)
Sequence to Sequence Learning with Neural Networks (1409.3215) by Ilya Sutskever ( @ilyasut ), Oriol Vinyals, Quoc V. Le

Who else deserves recognition in this groundbreaking narrative of innovation? Let’s ensure every contributor gets the credit they deserve. Leave a comment below 👇🏻🤗
·
replied to Duskfallcrew's post about 1 month ago

Public storage- y'all ... HF are you nuts?

i can neither confirm nor deny

reacted to FranckAbgrall's post with 👍 about 1 month ago
Post
1994
Hey!

✨ If you're using HF access tokens, we just released an overview of the permissions for fine-grained tokens: hover over the badge on the token settings page (org and user) to see it

It will show the highest permission you've set for each entity 👀
reacted to their own post with 😎🤝👍🤗❤️🔥 about 1 month ago
Post
8237
After some heated discussion 🔥, we clarify our intent re. storage limits on the Hub

TL;DR:
- public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)

docs: https://huggingface.co/docs/hub/storage-limits

We optimize our infrastructure continuously to scale our storage for the coming years of growth in Machine learning, to the benefit of the community 🔥

cc: @reach-vb @pierric @victor and the HF team
·
reacted to burtenshaw's post with 🔥 about 1 month ago
Post
2426
Quick update from week 1 of smol course. The community is taking the driving seat and using the material for their own projects. If you want to do the same, join in!

- we have ongoing translation projects in Korean, Vietnamese, Portuguese, and Spanish
- 3 chapters are ready for students, on topics like instruction tuning, preference alignment, and parameter-efficient fine-tuning
- 3 chapters are in progress on evaluation, vision language models, and synthetic data.
- around 780 people have forked the repo to use it for learning, teaching, sharing.

⏭️ Next step is to support people who want to use the course for teaching, content creation, internal knowledge sharing, or anything else. If you're into this, drop an issue or PR

REPO: https://buff.ly/3ZCMKX2
discord channel: https://buff.ly/4f9F8jA
reacted to bartowski's post with 👀 about 1 month ago
Post
15944
Looks like Q4_0_N_M file types are going away

Before you panic, there's a new "preferred" method which is online (I prefer the term on-the-fly) repacking, so if you download Q4_0 and your setup can benefit from repacking the weights into interleaved rows (what Q4_0_4_4 was doing), it will do that automatically and give you similar performance (minor losses I think due to using intrinsics instead of assembly, but intrinsics are more maintainable)
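
As a rough picture of what "repacking the weights into interleaved rows" means, the sketch below interleaves several equal-length rows in fixed-size chunks and reverses the operation. This is a toy illustration only: `interleave_rows` and `deinterleave_rows` are hypothetical helpers, the rows are plain Python lists rather than quantized blocks, and the chunk size is an arbitrary choice. It does not reproduce llama.cpp's actual memory layout.

```python
def interleave_rows(rows, chunk):
    """Interleave equal-length rows, taking `chunk` items from each row in turn."""
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("rows must have equal length")
    if length % chunk != 0:
        raise ValueError("row length must be a multiple of chunk")
    packed = []
    for start in range(0, length, chunk):
        for row in rows:
            packed.extend(row[start:start + chunk])
    return packed

def deinterleave_rows(packed, n_rows, chunk):
    """Undo interleave_rows, recovering the original rows."""
    rows = [[] for _ in range(n_rows)]
    for i in range(0, len(packed), chunk):
        rows[(i // chunk) % n_rows].extend(packed[i:i + chunk])
    return rows

rows = [[0, 1, 2, 3], [10, 11, 12, 13]]
packed = interleave_rows(rows, chunk=2)
print(packed)  # [0, 1, 10, 11, 2, 3, 12, 13]
print(deinterleave_rows(packed, n_rows=2, chunk=2))  # the original rows
```

The point of a layout like this is that one contiguous load picks up matching pieces of several rows at once. Doing the shuffle at load time, as the post describes, means a single Q4_0 file can serve every platform.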

You can see the reference PR here:

https://github.com/ggerganov/llama.cpp/pull/10446

So if you update your llama.cpp past that point, you won't be able to run Q4_0_4_4 (unless they add backwards compatibility back), but Q4_0 should run at the same speed (though it may currently be bugged on some platforms)

As such, I'll stop making those newer model formats soon, probably end of this week unless something changes, but you should be safe to download Q4_0 quants and use those!

Also, IQ4_NL supports repacking, though not in as many shapes yet. It should still get a respectable speedup on ARM chips. The PR for that can be found here: https://github.com/ggerganov/llama.cpp/pull/10541

Remember, these are not meant for Apple silicon since those use the GPU and don't benefit from the repacking of weights
·
posted an update about 1 month ago