The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs Paper • 2506.18403 • Published Jun 23, 2025 • 3
ReCode: Updating Code API Knowledge with Reinforcement Learning Paper • 2506.20495 • Published Jun 25, 2025 • 10
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution Paper • 2507.23348 • Published Jul 31, 2025 • 12
LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering Paper • 2509.09614 • Published Sep 11, 2025 • 7
LongCodeZip: Compress Long Context for Code Language Models Paper • 2510.00446 • Published Oct 1, 2025 • 107
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution Paper • 2510.08697 • Published Oct 9, 2025 • 39
ReCode: Unify Plan and Action for Universal Granularity Control Paper • 2510.23564 • Published Oct 27, 2025 • 122
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation Paper • 2511.02778 • Published Nov 4, 2025 • 102
WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation Paper • 2511.06251 • Published Nov 9, 2025 • 14
MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences Paper • 2601.06789 • Published Jan 11 • 79
ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development Paper • 2601.11077 • Published Jan 16 • 65
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model Paper • 2601.15892 • Published Jan 22 • 53
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces Paper • 2601.11868 • Published Jan 17 • 34
CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding Paper • 2602.01785 • Published 30 days ago • 94
SWE-Universe: Scale Real-World Verifiable Environments to Millions Paper • 2602.02361 • Published 29 days ago • 60
Code2World: A GUI World Model via Renderable Code Generation Paper • 2602.09856 • Published 21 days ago • 196
K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model Paper • 2602.19128 • Published 10 days ago • 6
SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale Paper • 2602.23866 • Published 5 days ago • 45