We introduce the Visually Grounded Active View Selection (VG-AVS) framework, which enables embodied agents to actively adjust their viewpoint for better Visual Question Answering using only current visual cues ...