Aymeric Roucher

m-ric

http://aymeric-roucher.github.io

AI & ML interests

Leading Agents at Hugging Face 🤗

Recent Activity

commented a paper about 12 hours ago

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

posted an update about 12 hours ago

𝗢𝗦-𝗚𝗲𝗻𝗲𝘀𝗶𝘀: 𝗻𝗲𝘄 𝗿𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗽𝗮𝗽𝗲𝗿 𝗽𝗿𝗼𝗽𝗼𝘀𝗲𝘀 𝗮 𝗻𝗼𝘃𝗲𝗹 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗱𝗮𝘁𝗮 𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻 𝗺𝗲𝘁𝗵𝗼𝗱 𝗳𝗼𝗿 𝗖𝗹𝗮𝘂𝗱𝗲-𝗖𝗼𝗺𝗽𝘂𝘁𝗲𝗿-𝗨𝘀𝗲-𝗹𝗶𝗸𝗲 𝗮𝗴𝗲𝗻𝘁𝘀, 𝘄𝗶𝘁𝗵 𝗶𝗺𝗽𝗿𝗲𝘀𝘀𝗶𝘃𝗲 𝗿𝗲𝘀𝘂𝗹𝘁𝘀! 🔥

The main bottleneck in building GUI agents is finding training data. GUI agent trajectories are not easy to come by. Crowdsourcing trajectories, then manually annotating them, could be an option, but it's hard to do at scale. You could use synthetic data generation (ask thousands of small existing GUI agents to solve tasks, keep only successful runs). But then it's hard to come up with many high-level tasks.

➡️ Well, a novel technique was just published that creates a promising new paradigm for synthetic data generation: Shanghai AI Lab researchers propose OS-Genesis, a novel way to create training data for GUI agents that flips the traditional approach on its head. Instead of starting with predefined tasks and having humans or machines execute them, OS-Genesis first explores the interface naturally, then derives meaningful tasks from those interactions.

🔍 Exploration-driven vs task-driven approach:
‣ Instead of starting with tasks, OS-Genesis first explores GUIs by clicking and interacting
‣ It then reverse-engineers high-level tasks from successful interaction patterns
‣ This leads to more natural and diverse training data than predefined tasks

🎯 Novel reward model for trajectory quality:
‣ Rather than discarding incomplete trajectories, OS-Genesis scores them based on coherence and completion
‣ This preserves valuable partial successes that would otherwise be wasted

🏆 Superior results across environments:
‣ Nearly doubles performance on AndroidWorld (9.8% → 17.4%)

By the way, this field of GUI agents is still in its infancy, so you can still make a difference with "low-cost" setups: their paper gets SOTA results with only 8xA100!

Read the paper here 👉 https://huggingface.co/papers/2412.19723
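To make the explore-first idea above concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: the `TOY_GUI` graph, the templated `derive_task` (standing in for a model that would write the task description), and the `score` heuristic (standing in for the reward model) are all assumptions invented for illustration. It only shows the shape of the pipeline: explore without a task, derive the task afterwards, then score every trajectory instead of discarding the imperfect ones.

```python
import random
from dataclasses import dataclass

# Hypothetical toy "GUI": each screen maps an action name to the screen it leads to.
# A real setup would drive an emulator or a browser instead of this dict.
TOY_GUI = {
    "home": {"open_settings": "settings", "open_notes": "notes"},
    "settings": {"toggle_wifi": "settings", "back": "home"},
    "notes": {"new_note": "editor", "back": "home"},
    "editor": {"save": "notes"},
}


@dataclass
class Trajectory:
    steps: list          # (screen, action, next_screen) triples
    task: str = ""       # high-level task, derived after exploration
    score: float = 0.0   # quality score in [0, 1]


def explore(gui, start, n_steps, rng):
    """Task-free exploration: take random available actions and record transitions."""
    steps, screen = [], start
    for _ in range(n_steps):
        action = rng.choice(sorted(gui[screen]))
        next_screen = gui[screen][action]
        steps.append((screen, action, next_screen))
        screen = next_screen
    return Trajectory(steps)


def derive_task(traj):
    """Stand-in for reverse task synthesis: a template instead of a model call."""
    actions = [a for _, a, _ in traj.steps if a != "back"]
    plan = ", then ".join(actions) or "do nothing"
    return "Starting from '{}': {}".format(traj.steps[0][0], plan)


def score(traj):
    """Invented heuristic standing in for a reward model (assumes at least one step)."""
    meaningful = sum(1 for _, a, _ in traj.steps if a != "back")
    coherence = meaningful / len(traj.steps)                    # share of non-"back" steps
    completion = 1.0 if traj.steps[-1][1] != "back" else 0.5    # penalize ending on "back"
    return coherence * completion


def build_dataset(gui, n_trajectories, n_steps, seed=0):
    rng = random.Random(seed)
    data = []
    for _ in range(n_trajectories):
        traj = explore(gui, "home", n_steps, rng)
        traj.task = derive_task(traj)
        traj.score = score(traj)
        data.append(traj)  # keep everything: the score is a weight, not a filter
    return sorted(data, key=lambda t: t.score, reverse=True)


if __name__ == "__main__":
    for traj in build_dataset(TOY_GUI, n_trajectories=3, n_steps=4):
        print(round(traj.score, 2), traj.task)
```

The design point to notice is in `build_dataset`: nothing is thrown away, so a partially useful trajectory still contributes with a lower weight.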

upvoted a paper about 13 hours ago

Agent Laboratory: Using LLM Agents as Research Assistants

Articles

Introducing smolagents: simple agents that write actions in code.

Expert Support case study: Bolstering a RAG app with LLM-as-a-Judge

Our Transformers Code Agent beats the GAIA benchmark!

Extracting Concepts from LLMs: Anthropic’s recent discoveries 📖

License to Call: Introducing Transformers Agents 2.0

Open-source LLMs as LangChain Agents

m-ric's activity

upvoted a paper about 13 hours ago

Agent Laboratory: Using LLM Agents as Research Assistants

Paper • 2501.04227 • Published 5 days ago • 67

upvoted a paper 6 days ago

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Paper • 2412.19723 • Published 16 days ago • 78

upvoted 2 articles 6 days ago

Article

🐺🐦‍⬛ LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark

10 days ago • 37

Article

Building Effective Agents with Anthropic’s Best Practices and smolagents ❤️

8 days ago • 4

upvoted a paper 13 days ago

A New Approach for Explainable Multiple Organ Annotation with Few Data

Paper • 1912.12932 • Published Dec 30, 2019 • 1

upvoted an article 28 days ago

Article

🇪🇺✍️ EU AI Act: Systemic Risks in the First CoP Draft Comments ✍️🇪🇺

Dec 12, 2024 • 12

upvoted a collection 28 days ago

Diffusion Tools

4 items • Updated Apr 30, 2024 • 5

upvoted a collection 29 days ago

GUI agents

A collection of papers on GUI agents • 3 items • Updated 30 days ago • 5

upvoted a paper 30 days ago

AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials

Paper • 2412.09605 • Published Dec 12, 2024 • 27

upvoted 2 papers about 1 month ago

If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents

Paper • 2401.00812 • Published Jan 1, 2024 • 4

Code Agents are State of the Art Software Testers

Paper • 2406.12952 • Published Jun 18, 2024 • 1

upvoted a collection about 1 month ago

Awesome Computer Use Agents

https://github.com/ranpox/awesome-computer-use • 25 items • Updated 25 days ago • 7

upvoted a paper about 1 month ago

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Paper • 2412.04454 • Published Dec 5, 2024 • 59

upvoted 2 articles about 1 month ago

Article

🐺🐦‍⬛ LLM Comparison/Test: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs

Dec 4, 2024 • 76

Article

They Said It Couldn’t Be Done

Dec 5, 2024 • 76

upvoted 2 papers about 1 month ago

Agent Workflow Memory

Paper • 2409.07429 • Published Sep 11, 2024 • 29

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Paper • 2411.17465 • Published Nov 26, 2024 • 78

upvoted an article about 1 month ago

Article

EuroLLM-9B

Dec 2, 2024 • 105

upvoted a paper about 1 month ago

DynaSaur: Large Language Agents Beyond Predefined Actions

Paper • 2411.01747 • Published Nov 4, 2024 • 21

upvoted an article about 2 months ago

Article

Introducing Observers: AI Observability with Hugging Face datasets through a lightweight SDK

Nov 21, 2024 • 35