CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Paper • 2504.13161 • Published Apr 17 • 93
NaVILA: Legged Robot Vision-Language-Action Model for Navigation Paper • 2412.04453 • Published Dec 5, 2024
EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos Paper • 2507.12440 • Published Jul 16
Test-Time Scaling Strategies for Generative Retrieval in Multimodal Conversational Recommendations Paper • 2508.18132 • Published Aug 25
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published Oct 13 • 176
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published Oct 17 • 89
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning Paper • 2510.15110 • Published Oct 16 • 15
SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models Paper • 2406.01584 • Published Jun 3, 2024
WorldModelBench: Judging Video Generation Models As World Models Paper • 2502.20694 • Published Feb 28
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration Paper • 2511.21689 • Published 19 days ago • 104
SANA-Video Collection SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer • 8 items • Updated 6 days ago • 6
Efficient-Large-Model/SANA-Video_2B_480p_LongLive_diffusers Text-to-Video • Updated 6 days ago • 2