Big Code Models Leaderboard 📈 • 1.51k • Explore and compare code model performance on a leaderboard
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming • Paper • 2402.14261 • Published Feb 22, 2024 • 10