OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks • Paper • 2604.08539 • Published 2 days ago • 33
Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning • Paper • 2604.04746 • Published 3 days ago • 57
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization • Paper • 2603.19835 • Published 22 days ago • 330
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents • Paper • 2603.24440 • Published 17 days ago • 95
MoKus: Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization • Paper • 2603.12743 • Published 29 days ago • 3
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding • Paper • 2307.00862 • Published Jul 3, 2023 • 1