similar to OmniParser

#6
by yangzehan - opened

Does this mean the model can operate a computer, similar to OmniParser?

Microsoft org

It uses OmniParser to generate set-of-mark (SoM) prompts, and then decides which button to click. Basically, you can think of it as a mini GPT-4o that understands SoM prompts well and knows how to take actions.
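To make the idea concrete, here is a rough sketch of that flow in Python. It is not the actual OmniParser or Magma code: `UIElement`, `build_som_prompt`, and `click_point` are hypothetical names, and the element list stands in for whatever a screen parser would produce. The parser assigns a numeric mark to each detected element, the model is asked to pick a mark, and the chosen mark is mapped back to a click position.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UIElement:
    """Hypothetical parsed screen element (not a real OmniParser type)."""
    mark: int                        # numeric label overlaid on the screenshot
    label: str                       # text or caption describing the element
    box: Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixels


def build_som_prompt(task: str, elements: List[UIElement]) -> str:
    """Build a text prompt listing every marked element for the model."""
    lines = [f"Task: {task}", "Marked elements:"]
    lines += [f"[{e.mark}] {e.label}" for e in elements]
    lines.append("Reply with the mark number to click.")
    return "\n".join(lines)


def click_point(elements: List[UIElement], chosen_mark: int) -> Tuple[int, int]:
    """Map the mark chosen by the model to the center of its bounding box."""
    for e in elements:
        if e.mark == chosen_mark:
            x1, y1, x2, y2 = e.box
            return ((x1 + x2) // 2, (y1 + y2) // 2)
    raise ValueError(f"No element with mark {chosen_mark}")


# Illustrative data only; a real parser would produce these from a screenshot.
elements = [
    UIElement(0, "Search box", (10, 10, 210, 40)),
    UIElement(1, "Submit button", (220, 10, 300, 40)),
]
prompt = build_som_prompt("Submit the search form", elements)
target = click_point(elements, 1)  # assume the model answered "1"
```

In this sketch the model only has to output a mark number, which is the point of SoM prompting: the action space becomes a short list of labeled elements instead of raw pixel coordinates.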

I understand. So we can add support for this model to the OmniParser demo.

Microsoft org

Yes, you are correct. Please see this demo: https://huggingface.co/spaces/microsoft/Magma-UI

We have already integrated it for you :)
