Aymeric Roucher

m-ric

http://aymeric-roucher.github.io

AI & ML interests

Leading Agents at Hugging Face 🤗

Recent Activity

commented a paper about 11 hours ago

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

posted an update about 11 hours ago

𝗢𝗦-𝗚𝗲𝗻𝗲𝘀𝗶𝘀: 𝗻𝗲𝘄 𝗿𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗽𝗮𝗽𝗲𝗿 𝗽𝗿𝗼𝗽𝗼𝘀𝗲𝘀 𝗮 𝗻𝗼𝘃𝗲𝗹 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗱𝗮𝘁𝗮 𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻 𝗺𝗲𝘁𝗵𝗼𝗱 𝗳𝗼𝗿 𝗖𝗹𝗮𝘂𝗱𝗲-𝗖𝗼𝗺𝗽𝘂𝘁𝗲𝗿-𝗨𝘀𝗲-𝗹𝗶𝗸𝗲 𝗮𝗴𝗲𝗻𝘁𝘀, 𝘄𝗶𝘁𝗵 𝗶𝗺𝗽𝗿𝗲𝘀𝘀𝗶𝘃𝗲 𝗿𝗲𝘀𝘂𝗹𝘁𝘀! 🔥

The main bottleneck in building GUI agents is finding training data. GUI agent trajectories are not easy to come by. Crowdsourcing trajectories and then manually annotating them could be an option, but it is hard to do at scale. You could use synthetic data generation instead (ask thousands of small existing GUI agents to solve tasks, and keep only the successful runs). But then it is hard to come up with many high-level tasks.

➡️ Well, a novel technique was just published that creates a promising new paradigm for synthetic data generation: Shanghai AI Lab researchers propose OS-Genesis, a novel way to create training data for GUI agents that flips the traditional approach on its head. Instead of starting with predefined tasks and having humans or machines execute them, OS-Genesis first explores the interface naturally, then derives meaningful tasks from those interactions.

🔍 Exploration-driven vs task-driven approach:
‣ Instead of starting with tasks, OS-Genesis first explores GUIs by clicking and interacting
‣ It then reverse-engineers high-level tasks from successful interaction patterns
‣ This leads to more natural and diverse training data than predefined tasks

🎯 Novel reward model for trajectory quality:
‣ Rather than discarding incomplete trajectories, OS-Genesis scores them based on coherence and completion
‣ This preserves valuable partial successes that would otherwise be wasted

🏆 Superior results across environments:
‣ Nearly doubles performance on AndroidWorld (9.8% → 17.4%)

By the way, the field of GUI agents is still in its infancy, so you can still make a difference with "low-cost" setups: their paper gets SOTA results with only 8xA100!

Read the paper here 👉 https://huggingface.co/papers/2412.19723
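
To make the exploration-first idea in the post concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: the `TOY_GUI` state machine, the `synthesize_task` stub, and the `score` heuristic are all hypothetical stand-ins. A real system would presumably drive an actual emulator or browser and use a model for both task synthesis and trajectory scoring. The sketch only shows the order of operations: explore first, derive the task afterwards, and attach a graded score instead of discarding imperfect runs.

```python
import random
from dataclasses import dataclass, field

# Hypothetical toy GUI: each screen maps an action name to the next screen.
TOY_GUI = {
    "home": {"open_settings": "settings", "open_notes": "notes"},
    "settings": {"toggle_wifi": "settings_wifi_on", "back": "home"},
    "settings_wifi_on": {"back": "home"},
    "notes": {"new_note": "note_editor", "back": "home"},
    "note_editor": {"save": "notes"},
}


@dataclass
class Trajectory:
    steps: list = field(default_factory=list)  # (screen, action, next_screen)
    task: str = ""
    score: float = 0.0


def explore(start, max_steps, rng):
    """Step 1: interact with the GUI without any predefined task."""
    traj, screen = Trajectory(), start
    for _ in range(max_steps):
        action = rng.choice(sorted(TOY_GUI[screen]))
        nxt = TOY_GUI[screen][action]
        traj.steps.append((screen, action, nxt))
        screen = nxt
    return traj


def synthesize_task(traj):
    """Step 2: derive a high-level task from what was actually done.

    Stub only: joins the non-navigation actions into a description.
    """
    actions = [a for _, a, _ in traj.steps if a != "back"]
    return "Task: " + ", then ".join(actions) if actions else "Task: (none)"


def score(traj):
    """Step 3: graded score in [0, 1] rather than a pass/fail filter.

    Stub heuristic: fraction of steps that are not 'back' navigation.
    """
    if not traj.steps:
        return 0.0
    useful = sum(1 for _, a, _ in traj.steps if a != "back")
    return useful / len(traj.steps)


def build_dataset(n, max_steps=4, seed=0):
    """Collect n explored trajectories, each labeled with a task and a score."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = explore("home", max_steps, rng)
        t.task, t.score = synthesize_task(t), score(t)
        data.append(t)
    return data


if __name__ == "__main__":
    for t in build_dataset(3):
        print(f"{t.score:.2f}  {t.task}")
```

One way to use the scores, under the same assumptions, is as sampling weights when drawing training examples, so partially useful trajectories still contribute but less often than clean ones.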

upvoted a paper about 13 hours ago

Agent Laboratory: Using LLM Agents as Research Assistants

Articles

Introducing smolagents: simple agents that write actions in code.

Expert Support case study: Bolstering a RAG app with LLM-as-a-Judge

Our Transformers Code Agent beats the GAIA benchmark!

Extracting Concepts from LLMs: Anthropic’s recent discoveries 📖

License to Call: Introducing Transformers Agents 2.0

Open-source LLMs as LangChain Agents

m-ric's activity

commented a paper about 11 hours ago

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Paper • 2412.19723 • Published 16 days ago • 78

New activity in hf-doc-build/doc-build 13 days ago

Upload _versions.yml

#32 opened 14 days ago by

New activity in rhymes-ai/Aria about 1 month ago

Upload processor

#14 opened about 1 month ago by

Upload processor

#13 opened about 1 month ago by

New activity in rhymes-ai/Aria-torchao-int8wo about 1 month ago

Upload processor

#3 opened about 1 month ago by

Upload AriaForConditionalGeneration

#2 opened about 1 month ago by

commented a paper about 1 month ago

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Paper • 2411.17465 • Published Nov 26, 2024 • 78

New activity in rhymes-ai/Aria about 1 month ago

Upload processor

#12 opened about 1 month ago by

Upload AriaForConditionalGeneration

#11 opened about 1 month ago by

New activity in AtlaAI/judge-arena about 2 months ago

What are Meta-Llama-3.1-Instruct "Turbo" models?

#4 opened about 2 months ago by

commented a paper 2 months ago

AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents

Paper • 2410.24024 • Published Oct 31, 2024 • 48

New activity in xinsir/controlnet-union-sdxl-1.0 2 months ago

is there a plan to support FLUX.1-schnell?

#28 opened 5 months ago by

commented a paper 2 months ago

CLEAR: Character Unlearning in Textual and Visual Modalities

Paper • 2410.18057 • Published Oct 23, 2024 • 200

commented 5 papers 3 months ago

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 169

Aria: An Open Multimodal Native Mixture-of-Experts Model

Paper • 2410.05993 • Published Oct 8, 2024 • 108

Addition is All You Need for Energy-efficient Language Models

Paper • 2410.00907 • Published Oct 1, 2024 • 145

Were RNNs All We Needed?

Paper • 2410.01201 • Published Oct 2, 2024 • 51

Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published Sep 27, 2024 • 94

commented 2 papers 4 months ago

Prithvi WxC: Foundation Model for Weather and Climate

Paper • 2409.13598 • Published Sep 20, 2024 • 40

YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models

Paper • 2409.13592 • Published Sep 20, 2024 • 49