Post 617 Gemma 3 seems to be really good at human preference. Just waiting for ppl to see it.
Ideas in Inference-time Scaling can Benefit Generative Pre-training Algorithms Paper • 2503.07154 • Published 3 days ago • 2
Post 2420 @nroggendorff is that you sama?
Post 1614 R1 is out! And with a lot of other R1-related models...
The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization Paper • 2403.17031 • Published Mar 24, 2024 • 6
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models Paper • 2410.18252 • Published Oct 23, 2024 • 7
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Paper • 2411.15124 • Published Nov 22, 2024 • 59
LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps Paper • 2412.15035 • Published Dec 19, 2024 • 4
Post 13176 Google drops Gemini 2.0 Flash Thinking, a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with thoughts visible), can solve complex problems with Flash speeds, and more. Now available in anychat, try it out: akhaliq/anychat
Post 463 @s3nh Hey man, check your Discord! Got some news.
Think Beyond Size: Adaptive Prompting for More Effective Reasoning Paper • 2410.08130 • Published Oct 10, 2024 • 2
Post 13687 QwQ-32B-Preview is now available in anychat. A reasoning model that is competitive with OpenAI o1-mini and o1-preview. Try it out: akhaliq/anychat
Post 4227 New model drop in anychat: allenai/Llama-3.1-Tulu-3-8B is now available. Try it here: akhaliq/anychat