Introducing VLGA: A Groundbreaking Vision-Language-Geometry-Action Model for Autonomous Driving
Researchers have unveiled VLGA, the first vision-language-action model designed to accurately reconstruct the dense 3D environment for autonomous driving. Whereas previous models relied on either static 3D foundations or sparse geometric constraints, VLGA treats geometry as a fourth modality alongside vision, language, and action, and supervises it with a per-pixel pointmap regression loss against LiDAR data. In tests on the nuScenes and Bench2Drive datasets, VLGA sets a new state of the art in open-loop evaluation, with the lowest average L2 error (0.50 m) and a 3-second collision rate of 0.18%. In closed-loop evaluation, it achieves a driving score of 79.08, surpassing the previous best by 0.71 while maintaining comparable efficiency and comfort.
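To make the idea of per-pixel pointmap regression concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not VLGA's actual implementation: the function name `pointmap_regression_loss`, the use of an L1 distance, and the validity mask for pixels without a LiDAR return are all hypothetical choices, since the summary does not specify them.

```python
import numpy as np

def pointmap_regression_loss(pred, target, valid):
    """Mean per-pixel L1 distance between predicted and LiDAR-derived 3D points.

    pred, target: arrays of shape (H, W, 3), one (x, y, z) point per pixel.
    valid: boolean array of shape (H, W); True where a LiDAR point exists.
    NOTE: the L1 distance and the mask are assumptions for illustration only.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    valid = np.asarray(valid, dtype=bool)
    if not valid.any():
        # No LiDAR supervision available for this image.
        return 0.0
    # Sum absolute coordinate errors per pixel, giving shape (H, W).
    per_pixel = np.abs(pred - target).sum(axis=-1)
    # Average only over pixels that have a LiDAR target.
    return float(per_pixel[valid].mean())
```

Because LiDAR returns are sparse relative to image pixels, a mask of this kind keeps pixels without a measured 3D point from contributing to the loss.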
