Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms Paper • 2603.28489 • Published 2 days ago
SpatialBot: Precise Spatial Understanding with Vision Language Models Paper • 2406.13642 • Published Jun 19, 2024