OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published 16 days ago • 78 • 3
ShowUI: One Vision-Language-Action Model for GUI Visual Agent Paper • 2411.17465 • Published Nov 26, 2024 • 78 • 3
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents Paper • 2410.24024 • Published Oct 31, 2024 • 48 • 3
CLEAR: Character Unlearning in Textual and Visual Modalities Paper • 2410.18057 • Published Oct 23, 2024 • 200 • 4
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8, 2024 • 108 • 7
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1, 2024 • 145 • 17
Emu3: Next-Token Prediction is All You Need Paper • 2409.18869 • Published Sep 27, 2024 • 94 • 9
Prithvi WxC: Foundation Model for Weather and Climate Paper • 2409.13598 • Published Sep 20, 2024 • 40 • 4
YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models Paper • 2409.13592 • Published Sep 20, 2024 • 49 • 9