Learn2Fold: Structured Origami Generation with World Model Planning Paper • 2603.29585 • Published Feb 2 • 12
GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation Paper • 2603.26661 • Published 7 days ago • 16
MMFace-DiT: A Dual-Stream Diffusion Transformer for High-Fidelity Multimodal Face Generation Paper • 2603.29029 • Published 4 days ago • 12
EgoSim: Egocentric World Simulator for Embodied Interaction Generation Paper • 2604.01001 • Published 3 days ago • 30
Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis Paper • 2603.29620 • Published 3 days ago • 41
Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification Paper • 2603.26648 • Published 7 days ago • 36
ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners? Paper • 2603.25823 • Published 8 days ago • 39
CutClaw: Agentic Hours-Long Video Editing via Music Synchronization Paper • 2603.29664 • Published 3 days ago • 42
VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward Paper • 2603.26599 • Published 7 days ago • 52
MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome Paper • 2603.28407 • Published 4 days ago • 59
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization Paper • 2604.02268 • Published 2 days ago • 71
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development Paper • 2603.27460 • Published 6 days ago • 58
Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells Paper • 2603.25240 • Published 8 days ago • 74
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook Paper • 2604.02029 • Published 2 days ago • 95