WALDO Framework Enhances Zero-Shot Anomaly Localisation in Medical Imaging Using Vision-Language Models

arXiv AI · Bernhard Kainz, Johanna P Mueller, Matthew Baugh et al. · Friday, May 8, 2026

Researchers present WALDO, a training-free framework for zero-shot anomaly localisation in medical imaging that uses vision-language models (VLMs) to improve the detection of rare pathologies. The framework reformulates anomaly detection as a comparative inference problem built on three components: entropy-weighted Sliced Wasserstein distances for anatomically aware reference selection, Goldilocks zone sampling to choose references of moderate similarity, and self-consistency aggregation through weighted non-maximum suppression. The authors' theoretical analysis indicates that references of moderate similarity give the most favourable bias-variance trade-off in visual reasoning. Evaluated on the NOVA brain MRI benchmark, WALDO with Qwen2.5-VL-72B achieved 43.5% mAP@30, a 19% relative improvement over zero-shot baselines, with statistical significance confirmed by paired McNemar tests ($p<0.01$). Source code is available on GitHub.
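
To make the reference-selection idea concrete, here is a minimal sketch, not the authors' implementation. It approximates a plain (unweighted) Sliced Wasserstein distance between two equally sized sets of feature vectors, then keeps candidates whose distance falls in a middle band. The entropy weighting is left out because the summary does not define it. The function names, the quantile band, and the use of random feature arrays are all illustrative assumptions.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=64, seed=0):
    """Approximate the sliced Wasserstein-1 distance between two point
    clouds x and y, each of shape (n, d) with the same n.

    Assumption: unweighted points; the entropy weighting mentioned in the
    summary is not modelled here.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Random unit directions on the sphere in R^d.
    dirs = rng.normal(size=(n_projections, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # Project onto each direction; in 1-D, optimal transport between
    # equal-size samples pairs the sorted values.
    px = np.sort(x @ dirs.T, axis=0)
    py = np.sort(y @ dirs.T, axis=0)
    return float(np.mean(np.abs(px - py)))


def goldilocks_select(distances, low_q=0.25, high_q=0.75):
    """Return indices of candidates whose distance to the query lies in a
    middle quantile band (hypothetical thresholds): neither near-duplicates
    nor very dissimilar references."""
    d = np.asarray(distances, dtype=float)
    lo, hi = np.quantile(d, [low_q, high_q])
    return np.flatnonzero((d >= lo) & (d <= hi))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    query = rng.normal(size=(128, 8))  # stand-in for query image features
    # Stand-ins for reference image features at increasing offsets.
    refs = [query + shift for shift in (0.0, 0.5, 1.0, 2.0, 4.0)]
    dists = [sliced_wasserstein(query, r) for r in refs]
    print(goldilocks_select(dists))
```

The quantile band is only one way to express "moderate similarity"; a real system would tune or derive the band rather than fix it at the quartiles.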
