Aymeric Roucher

m-ric

AI & ML interests

Leading Agents at Hugging Face 🤗

Recent Activity

posted an update about 12 hours ago
๐—ข๐—ฆ-๐—š๐—ฒ๐—ป๐—ฒ๐˜€๐—ถ๐˜€: ๐—ป๐—ฒ๐˜„ ๐—ฟ๐—ฒ๐˜€๐—ฒ๐—ฎ๐—ฟ๐—ฐ๐—ต ๐—ฝ๐—ฎ๐—ฝ๐—ฒ๐—ฟ ๐—ฝ๐—ฟ๐—ผ๐—ฝ๐—ผ๐˜€๐—ฒ๐˜€ ๐—ฎ ๐—ป๐—ผ๐˜ƒ๐—ฒ๐—น ๐˜๐—ฟ๐—ฎ๐—ถ๐—ป๐—ถ๐—ป๐—ด ๐—ฑ๐—ฎ๐˜๐—ฎ ๐—ด๐—ฒ๐—ป๐—ฒ๐—ฟ๐—ฎ๐˜๐—ถ๐—ผ๐—ป ๐—บ๐—ฒ๐˜๐—ต๐—ผ๐—ฑ ๐—ณ๐—ผ๐—ฟ ๐—–๐—น๐—ฎ๐˜‚๐—ฑ๐—ฒ-๐—–๐—ผ๐—บ๐—ฝ๐˜‚๐˜๐—ฒ๐—ฟ-๐—จ๐˜€๐—ฒ-๐—น๐—ถ๐—ธ๐—ฒ ๐—ฎ๐—ด๐—ฒ๐—ป๐˜๐˜€, ๐˜„๐—ถ๐˜๐—ต ๐—ถ๐—บ๐—ฝ๐—ฟ๐—ฒ๐˜€๐˜€๐—ถ๐˜ƒ๐—ฒ ๐—ฟ๐—ฒ๐˜€๐˜‚๐—น๐˜๐˜€! ๐Ÿ”ฅ The main bottleneck in building GUI agents it to find training data. GUI Agent trajectories are not easy to get by. Crowdsourcing trajectories, then manually annotating them, could be an option, but at scale, it's hard to do You could use synthetic data generation (ask 1000s small existing GUI agents to solve tasks, keep only successful runs). But then it's hard to come up with many high level-tasks. โžก๏ธ Well, a novel technique was just published that creates a new promising paradigm for synthetic data generation: Shanghai AI Lab researchers propose OS-Genesis, a novel way to create training data for GUI agents that flips the traditional approach on its head. Instead of starting with predefined tasks and having humans or machines execute them, OS-Genesis first explores the interface naturally, then derives meaningful tasks from those interactions. 
๐Ÿ” Exploration-driven vs task-driven approach: โ€ฃ Instead of starting with tasks, OS-Genesis first explores GUIs by clicking and interacting โ€ฃ It then reverse-engineers high-level tasks from successful interaction patterns โ€ฃ This leads to more natural and diverse training data than predefined tasks ๐ŸŽฏ Novel reward model for trajectory quality: โ€ฃ Rather than discarding incomplete trajectories, OS-Genesis scores them based on coherence and completion โ€ฃ This preserves valuable partial successes that would otherwise be wasted ๐Ÿ† Superior results across environments: โ€ฃ Nearly doubles performance on AndroidWorld (9.8% โ†’ 17.4%) By the way, this field of GUI agents is still in infancy, so you can still make a difference with "low-cost" setups: their paper gets SOTA results with only 8xA100! Read the paper here ๐Ÿ‘‰ https://huggingface.co/papers/2412.19723

Articles

Organizations

Hugging Face, Atmos Bank, Hugging Test Lab, Tools, HuggingFaceM4, lecocqassociate, huggingPartyParis, Supreme, FactSet, Propulse Lab, Leaderboard Organization, CGIAR, Aperture Laboratories, AI Energy Score Project, C&A, Social Post Explorers, Dev Mode Explorers, Agent Collab, SLLHF, Data Agents, Hugging Face Party @ PyTorch Conference, Nerdy Face, Hugging Face Science, Agents Leaderboard

m-ric's activity

upvoted 2 articles 6 days ago
🐺🐦‍⬛ LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark

By wolfram • 37

Building Effective Agents with Anthropic's Best Practices and smolagents ❤️

By Sri-Vigneshwar-DJ • 4
upvoted an article 28 days ago
🇪🇺✍️ EU AI Act: Systemic Risks in the First CoP Draft Comments ✍️🇪🇺

By yjernite • 12
upvoted 2 articles about 1 month ago
🐺🐦‍⬛ LLM Comparison/Test: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs

By wolfram • 76
upvoted an article about 1 month ago
upvoted an article about 2 months ago
Introducing Observers: AI Observability with Hugging Face datasets through a lightweight SDK

• 35