ToolCUA: A Novel Approach to Optimal GUI-Tool Path Orchestration for Computer Use Agents
Researchers have developed ToolCUA, an end-to-end agent that learns to select the best GUI-Tool path for Computer Use Agents (CUAs). The study targets hybrid action spaces, where a CUA must decide at each step between a GUI action and a tool call. The authors introduce an Interleaved GUI-Tool Trajectory Scaling Pipeline, which converts existing static GUI trajectories into diverse interleaved GUI-Tool trajectories without manual intervention. The agent is trained with warmup supervised fine-tuning followed by reinforcement learning, with the aim of improving its decisions at the critical points where it switches between GUI actions and tool calls. In experiments on OSWorld-MCP, ToolCUA achieves 46.85% accuracy, a 66% relative improvement over previous models, and outperforms GUI-only methods. The result sets a new bar for the field and suggests that training in hybrid action spaces can benefit real-world digital agents.
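To make the idea of a hybrid action space concrete, here is a minimal Python sketch of an agent that picks between a GUI action and a tool call at each step. It is an illustration only, not ToolCUA's method: the names `GuiAction`, `ToolCall`, `choose_action`, and `plan`, the subgoal strings, and the tool registry are all hypothetical, and the hand-written rule stands in for the learned policy that the summary says is trained with supervised fine-tuning and reinforcement learning.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class GuiAction:
    """A low-level GUI step (hypothetical schema)."""
    kind: str    # e.g. "click", "type", "scroll"
    target: str  # UI element the step acts on


@dataclass
class ToolCall:
    """A structured call to an external tool (hypothetical schema)."""
    name: str
    arguments: Dict[str, str] = field(default_factory=dict)


# In a hybrid action space, every step is one of the two action types.
Action = Union[GuiAction, ToolCall]


def choose_action(subgoal: str, available_tools: Dict[str, str]) -> Action:
    """Toy decision rule: call a tool if one is registered for the subgoal,
    otherwise fall back to a GUI click. A trained agent would learn this
    choice instead of looking it up."""
    tool_name = available_tools.get(subgoal)
    if tool_name is not None:
        return ToolCall(name=tool_name, arguments={"subgoal": subgoal})
    return GuiAction(kind="click", target=subgoal)


def plan(subgoals: List[str], available_tools: Dict[str, str]) -> List[Action]:
    """Build an interleaved GUI-Tool trajectory, one action per subgoal."""
    return [choose_action(s, available_tools) for s in subgoals]


if __name__ == "__main__":
    # Assumed registry: only the export step has a matching tool.
    tools = {"export_csv": "spreadsheet.export"}
    for step in plan(["open_menu", "export_csv"], tools):
        print(step)
```

In this toy run the first subgoal has no registered tool and becomes a GUI click, while the second becomes a tool call, so the resulting trajectory interleaves both action types.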
