The main bottleneck in building GUI agents it to find training data. GUI Agent trajectories are not easy to get by. Crowdsourcing trajectories, then manually annotating them, could be an option, but at scale, it's hard to do
You could use synthetic data generation (ask 1000s small existing GUI agents to solve tasks, keep only successful runs). But then it's hard to come up with many high level-tasks.
โก๏ธ Well, a novel technique was just published that creates a new promising paradigm for synthetic data generation: Shanghai AI Lab researchers propose OS-Genesis, a novel way to create training data for GUI agents that flips the traditional approach on its head. Instead of starting with predefined tasks and having humans or machines execute them, OS-Genesis first explores the interface naturally, then derives meaningful tasks from those interactions.
๐ Exploration-driven vs task-driven approach: โฃ Instead of starting with tasks, OS-Genesis first explores GUIs by clicking and interacting โฃ It then reverse-engineers high-level tasks from successful interaction patterns โฃ This leads to more natural and diverse training data than predefined tasks
๐ฏ Novel reward model for trajectory quality: โฃ Rather than discarding incomplete trajectories, OS-Genesis scores them based on coherence and completion โฃ This preserves valuable partial successes that would otherwise be wasted
๐ Superior results across environments: โฃ Nearly doubles performance on AndroidWorld (9.8% โ 17.4%)
By the way, this field of GUI agents is still in infancy, so you can still make a difference with "low-cost" setups: their paper gets SOTA results with only 8xA100!
Almost every AI researcher has studied or conducted a large number of AI research papers. So, it's quite logical that researchers are trying to create AI systems to help conduct research. Creating scientific research could be much easier and more varied if we use LLMs and AI assistants tailored for this purpose. Just imagine how interesting it would be to read high-quality research about AI made by an AI agent.
Today, we offer you to explore these 10 AI systems for scientific research:
I have an idea that I'm currently working on โ developing a standard API for large models. This standard would ensure compatibility with all known protocols, enabling large models worldwide to be accessed through a unified API. For instance, it could connect models like qwen, deepseek, and GLM from China. However, I haven't found a suitable forum for democratic discussion on this yet. I'm unsure of the next steps. ๐ฅ๐ฅ๐ฅ