OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding Paper • 2512.23646 • Published about 18 hours ago • 6
Nested Browser-Use Learning for Agentic Information Seeking Paper • 2512.23647 • Published about 18 hours ago • 6
Video-BrowseComp: Benchmarking Agentic Video Research on Open Web Paper • 2512.23044 • Published 1 day ago • 9
Training AI Co-Scientists Using Rubric Rewards Paper • 2512.23707 • Published about 17 hours ago • 9
Yume-1.5: A Text-Controlled Interactive World Generation Model Paper • 2512.22096 • Published 4 days ago • 43
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation Paper • 2512.23576 • Published about 20 hours ago • 44
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation Paper • 2512.23705 • Published about 17 hours ago • 27
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss Paper • 2512.23447 • Published about 23 hours ago • 58
SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents Paper • 2512.22322 • Published 4 days ago • 31
GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators Paper • 2512.19682 • Published 8 days ago • 15
UltraViCo: Breaking Extrapolation Limits in Video Diffusion Transformers Paper • 2511.20123 • Published Nov 25 • 17
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation Paper • 2511.09611 • Published Nov 12 • 68
Demystifying Reinforcement Learning in Agentic Reasoning Paper • 2510.11701 • Published Oct 13 • 31
Generative Universal Verifier as Multimodal Meta-Reasoner Paper • 2510.13804 • Published Oct 15 • 25