SDAR Collection The models without suffixes use the default block size = 4. • 21 items • Updated Jan 2 • 9
Video-Based Reward Modeling for Computer-Use Agents Paper • 2603.10178 • Published 24 days ago • 42
Heterogeneous Agent Collaborative Reinforcement Learning Paper • 2603.02604 • Published Mar 3 • 191
Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone Paper • 2512.22615 • Published Dec 27, 2025 • 50
LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding Paper • 2512.16229 • Published Dec 18, 2025 • 16
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies Paper • 2512.19673 • Published Dec 22, 2025 • 66
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models Paper • 2510.09541 • Published Oct 10, 2025 • 17
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning Paper • 2510.25992 • Published Oct 29, 2025 • 48
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published Apr 18, 2025 • 141
VisPlay: Self-Evolving Vision-Language Models from Images Paper • 2511.15661 • Published Nov 19, 2025 • 44
The Path Not Taken: RLVR Provably Learns Off the Principals Paper • 2511.08567 • Published Nov 11, 2025 • 35
The Smol Training Playbook 📚 The secrets to building world-class LLMs Space • Running on CPU Upgrade • Featured • 3.08k
Kimi Linear: An Expressive, Efficient Attention Architecture Paper • 2510.26692 • Published Oct 30, 2025 • 133