Introducing VLGA: A Groundbreaking Vision-Language-Geometry-Action Model for Autonomous Driving

arXiv AI · Jin Yao, Dhruva Dixith Kurra, Tom Lampo et al. · Friday, June 12, 2026

Researchers have introduced VLGA, described as the first vision-language-action model designed to accurately reconstruct the dense 3D environment for autonomous driving. Whereas previous models relied either on static 3D foundations or on sparse geometric constraints, VLGA treats geometry as a fourth modality alongside vision, language, and action, supervised through a per-pixel pointmap regression loss against LiDAR data. In tests on the nuScenes and Bench2Drive datasets, VLGA sets a new state of the art in open-loop evaluation, with the lowest average L2 error (0.50 meters) and a 3-second collision rate of 0.18%. In closed-loop evaluation, it reaches a driving score of 79.08, surpassing the previous best by 0.71 while maintaining comparable efficiency and comfort.
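
To make "per-pixel pointmap regression loss against LiDAR data" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the summary does not specify the norm, the masking scheme, or any weighting, so the L1 distance, the validity mask for pixels without a LiDAR return, and the function name pointmap_regression_loss are all assumptions made for illustration. A real training setup would compute this on tensors rather than nested lists.

```python
def pointmap_regression_loss(pred, target, valid):
    """Mean L1 distance between predicted and LiDAR-derived 3D points.

    pred, target: H x W grids (nested lists) of (x, y, z) tuples, one per pixel.
    valid: H x W grid of booleans, True where a LiDAR return projects onto
           the pixel. Pixels without a return are skipped (an assumption:
           LiDAR is sparse, so most pixels have no ground-truth point).
    """
    total = 0.0
    count = 0
    for pred_row, target_row, valid_row in zip(pred, target, valid):
        for p, t, is_valid in zip(pred_row, target_row, valid_row):
            if not is_valid:
                continue
            # L1 distance over the three coordinates (assumed norm).
            total += sum(abs(pc - tc) for pc, tc in zip(p, t))
            count += 1
    # With no supervised pixels, return zero rather than divide by zero.
    return total / count if count else 0.0


# A 1 x 2 "image": only the first pixel has a LiDAR return.
pred = [[(1.0, 0.0, 5.0), (0.0, 0.0, 0.0)]]
target = [[(1.0, 0.5, 4.0), (9.0, 9.0, 9.0)]]
valid = [[True, False]]
loss = pointmap_regression_loss(pred, target, valid)
```

In this toy case only the first pixel contributes, so the loss is the sum of its coordinate errors (0 + 0.5 + 1.0); the second pixel's large error is ignored because no LiDAR point covers it.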

