Aymeric Roucher

m-ric

http://aymeric-roucher.github.io

AI & ML interests

Leading Agents at Hugging Face 🤗

m-ric's activity

commented a paper about 11 hours ago

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Paper • 2412.19723 • Published 16 days ago • 78

posted an update about 11 hours ago

Post

193

𝗢𝗦-𝗚𝗲𝗻𝗲𝘀𝗶𝘀: 𝗻𝗲𝘄 𝗿𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗽𝗮𝗽𝗲𝗿 𝗽𝗿𝗼𝗽𝗼𝘀𝗲𝘀 𝗮 𝗻𝗼𝘃𝗲𝗹 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗱𝗮𝘁𝗮 𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻 𝗺𝗲𝘁𝗵𝗼𝗱 𝗳𝗼𝗿 𝗖𝗹𝗮𝘂𝗱𝗲-𝗖𝗼𝗺𝗽𝘂𝘁𝗲𝗿-𝗨𝘀𝗲-𝗹𝗶𝗸𝗲 𝗮𝗴𝗲𝗻𝘁𝘀, 𝘄𝗶𝘁𝗵 𝗶𝗺𝗽𝗿𝗲𝘀𝘀𝗶𝘃𝗲 𝗿𝗲𝘀𝘂𝗹𝘁𝘀! 🔥

The main bottleneck in building GUI agents is finding training data.
GUI agent trajectories are not easy to come by. Crowdsourcing trajectories, then manually annotating them, could be an option, but it's hard to do at scale.

You could use synthetic data generation (ask thousands of small existing GUI agents to solve tasks, keep only the successful runs). But then it's hard to come up with many high-level tasks.

➡️ Well, a technique was just published that opens a promising new paradigm for synthetic data generation: Shanghai AI Lab researchers propose OS-Genesis, a way to create training data for GUI agents that flips the traditional approach on its head. Instead of starting with predefined tasks and having humans or machines execute them, OS-Genesis first explores the interface naturally, then derives meaningful tasks from those interactions.

🔍 Exploration-driven vs task-driven approach:
‣ Instead of starting with tasks, OS-Genesis first explores GUIs by clicking and interacting
‣ It then reverse-engineers high-level tasks from successful interaction patterns
‣ This leads to more natural and diverse training data than predefined tasks
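
To make the "explore first, name the task afterwards" idea concrete, here is a minimal sketch on a toy GUI. Everything in it is an assumption for illustration: the `TOY_GUI` screen graph, the `explore` random walk, and the `synthesize_task` string template, which merely stands in for the model that would derive a high-level task from the recorded interactions. It is not the paper's implementation.

```python
import random
from dataclasses import dataclass

# Hypothetical toy GUI: screen -> {action: next screen}.
TOY_GUI = {
    "home": {"tap_settings": "settings", "tap_notes": "notes"},
    "settings": {"toggle_wifi": "settings", "tap_back": "home"},
    "notes": {"tap_new_note": "editor", "tap_back": "home"},
    "editor": {"type_text": "editor", "tap_save": "notes"},
}


@dataclass
class Step:
    before: str  # screen before the action
    action: str  # what was clicked or typed
    after: str   # screen after the action


def explore(gui, start, n_steps, rng):
    """Task-free exploration: at each screen, try any available action."""
    steps, screen = [], start
    for _ in range(n_steps):
        action = rng.choice(sorted(gui[screen]))
        next_screen = gui[screen][action]
        steps.append(Step(screen, action, next_screen))
        screen = next_screen
    return steps


def synthesize_task(steps):
    """Placeholder for reverse task synthesis.

    A real system would ask a model to describe what the interaction
    achieved; this template only strings the observed actions together.
    """
    actions = [s.action.replace("_", " ") for s in steps]
    return f"Starting on '{steps[0].before}': " + ", then ".join(actions)


rng = random.Random(0)
trajectory = explore(TOY_GUI, "home", 4, rng)
task = synthesize_task(trajectory)
print(task)
```

The point of the sketch is the ordering: the trajectory exists before any task does, and the task description is derived from what actually happened on screen.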

🎯 Novel reward model for trajectory quality:
‣ Rather than discarding incomplete trajectories, OS-Genesis scores them based on coherence and completion
‣ This preserves valuable partial successes that would otherwise be wasted
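
As a rough illustration of scoring instead of filtering, the sketch below gives each trajectory a graded reward and samples training data in proportion to it, so partial successes still contribute. The `toy_reward` function, its 1 to 5 scale, and the proportional sampling rule are assumptions made for this example; in practice the score would come from a learned or prompted reward model, not from hand-set numbers.

```python
import random


def toy_reward(completion, coherence):
    """Hypothetical scorer: average two signals in [0, 1], map to 1..5."""
    if not (0.0 <= completion <= 1.0 and 0.0 <= coherence <= 1.0):
        raise ValueError("completion and coherence must lie in [0, 1]")
    return 1 + 4 * (completion + coherence) / 2


# Hand-written example trajectories (illustrative values only).
trajectories = [
    {"id": "full_success", "completion": 1.0, "coherence": 1.0},
    {"id": "partial", "completion": 0.5, "coherence": 1.0},
    {"id": "wandering", "completion": 0.0, "coherence": 0.0},
]

rewards = [toy_reward(t["completion"], t["coherence"]) for t in trajectories]
total = sum(rewards)
weights = [r / total for r in rewards]  # sampling probabilities

# Binary filtering would keep only "full_success"; weighted sampling
# still draws the partial run, just less often.
rng = random.Random(0)
batch = rng.choices(trajectories, weights=weights, k=8)
print([t["id"] for t in batch])
```

With these toy numbers the rewards are 5, 4 and 1, so the partial trajectory is sampled almost as often as the full success, while the incoherent one is rarely picked.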

🏆 Superior results across environments:
‣ Nearly doubles performance on AndroidWorld (9.8% → 17.4%)

By the way, the field of GUI agents is still in its infancy, so you can still make a difference with "low-cost" setups: their paper gets SOTA results with only 8xA100!

Read the paper here 👉 OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis (2412.19723)

upvoted a paper about 13 hours ago

Agent Laboratory: Using LLM Agents as Research Assistants

Paper • 2501.04227 • Published 5 days ago • 67

updated a Space 2 days ago

Running

📈

Get Travel Duration Tool

updated a Space 3 days ago

Running

🏢

Hf Model Downloads

liked a Space 4 days ago

Running

📚

Benchmark Data Contamination

Showing models are contaminated by trusted benchmark data

updated 2 datasets 4 days ago

m-ric/smol_agents_benchmark

Viewer • Updated 4 days ago • 132 • 31

m-ric/agents_medium_benchmark_3

Viewer • Updated 4 days ago • 132 • 9

liked a model 5 days ago

xlangai/Aguvis-7B-720P

Updated 5 days ago • 63 • 3

posted an update 6 days ago

Post

4800

Since I published it on GitHub a few days ago,
Hugging Face's new agentic library 𝘀𝗺𝗼𝗹𝗮𝗴𝗲𝗻𝘁𝘀 has gathered nearly 4k stars 🤯

➡️ But we are just getting started on agents: so we are hiring an ML Engineer to join me and double down on this effort!

The plan is to build GUI agents: agents that can act on your computer with mouse & keyboard, like Claude Computer Use.

We will make it work better, and fully open. ✨

Sounds like something you'd like to do? Apply here 👉 https://apply.workable.com/huggingface/j/AF1D4E3FEB/