GUI grounding, which maps natural-language instructions to actionable UI elements, is a core capability of GUI agents. Prior work largely treats instructions as a static proxy for user intent, ...
Distributed Rollouts: Scalable task execution across parallel OSWorld environments with Docker. Multi-modal Input Support: Processes long histories (15 steps) of screenshots and actions in an end-to-end ...