RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation Paper • 2509.16198 • Published Sep 19, 2025 • 126
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? Paper • 2509.16941 • Published Sep 21, 2025 • 21
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models Paper • 2303.08896 • Published Mar 15, 2023 • 4
From Local to Global: A Graph RAG Approach to Query-Focused Summarization Paper • 2404.16130 • Published Apr 24, 2024 • 7
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models Paper • 2405.14831 • Published May 23, 2024 • 5
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use Paper • 2509.24002 • Published Sep 28, 2025 • 174
Imperceptible Jailbreaking against Large Language Models Paper • 2510.05025 • Published Oct 6, 2025 • 33
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 501