Introducing SANA-WM: A Breakthrough in Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer
SANA-WM is a 2.6B-parameter open-source world model that synthesizes high-fidelity, minute-scale 720p videos with precise camera control. Its visual quality is on par with established industrial baselines while being more efficient. Its key architectural components are Hybrid Linear Attention for memory-efficient long-context modeling, Dual-Branch Camera Control for accurate trajectory adherence, a Two-Stage Generation Pipeline for higher output quality, and a Robust Annotation Pipeline for extracting high-quality action labels. SANA-WM is trained on approximately 213K public video clips, completes training in 15 days on 64 H100 GPUs (about 64 × 15 × 24 = 23,040 GPU-hours), and generates video clips efficiently on a single GPU. In benchmark tests, it outperforms previous open-source models in action-following accuracy while delivering substantially higher throughput.
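The summary names Hybrid Linear Attention but does not describe how it works. As a generic illustration only, the sketch below shows standard causal linear attention: the kernelized form with an elu(x) + 1 feature map. It is not SANA-WM's actual layer. The "hybrid" part, which presumably combines linear attention with some other mechanism, is not modeled here, and all function names are hypothetical. The sketch shows why this family of attention suits minute-long videos. The recurrent form keeps a fixed-size state (a d_k × d_v matrix plus a d_k normalizer) instead of a cache that grows with every token. A quadratic reference implementation is included to check that both forms agree.

```python
import math
import random
from typing import List

Vector = List[float]


def feature_map(x: Vector) -> Vector:
    """elu(x) + 1: keeps every feature positive so the normalizer is nonzero."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def linear_attention_recurrent(queries, keys, values):
    """Hypothetical sketch: causal linear attention with a constant-size state.

    `state` (d_k x d_v) and `norm` (d_k) do not grow with sequence length,
    so memory stays fixed no matter how many tokens have been processed.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    norm = [0.0] * d_k                          # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        phi_k = feature_map(k)
        for i in range(d_k):
            norm[i] += phi_k[i]
            for j in range(d_v):
                state[i][j] += phi_k[i] * v[j]
        phi_q = feature_map(q)
        denom = dot(phi_q, norm)
        outputs.append([
            sum(phi_q[i] * state[i][j] for i in range(d_k)) / denom
            for j in range(d_v)
        ])
    return outputs


def linear_attention_quadratic(queries, keys, values):
    """Reference: same result, but revisits every previous token at each step."""
    d_v = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        phi_q = feature_map(q)
        weights = [dot(phi_q, feature_map(keys[s])) for s in range(t + 1)]
        total = sum(weights)
        outputs.append([
            sum(w * values[s][j] for s, w in enumerate(weights)) / total
            for j in range(d_v)
        ])
    return outputs


if __name__ == "__main__":
    random.seed(0)
    T, d_k, d_v = 6, 4, 3
    Q = [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(T)]
    K = [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(T)]
    V = [[random.gauss(0, 1) for _ in range(d_v)] for _ in range(T)]
    fast = linear_attention_recurrent(Q, K, V)
    slow = linear_attention_quadratic(Q, K, V)
    # Maximum absolute difference between the two forms (should be ~0).
    print(max(abs(a - b) for ra, rb in zip(fast, slow) for a, b in zip(ra, rb)))
```

Both functions return the same outputs. Only the recurrent one runs in time linear in sequence length with constant memory, the property the summary refers to as memory-efficient long-context modeling.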
