VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors Paper • 2604.02486 • Published 8 days ago
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published 11 days ago
When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning Paper • 2603.21289 • Published 19 days ago