minghao

Liam-Liu

liam-liu-1b262631a

AI & ML interests

LLM, AD

Recent Activity

authored a paper 3 days ago

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

upvoted a paper 2 months ago

Reasoning with Sampling: Your Base Model is Smarter Than You Think

upvoted a paper 2 months ago

ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems

Organizations

authored a paper 3 days ago

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Paper • 2512.12730 • Published 26 days ago • 43

authored 10 papers 3 months ago

OAgents: An Empirical Study of Building Effective Agents

Paper • 2506.15741 • Published Jun 17, 2025 • 35

IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?

Paper • 2509.24709 • Published Sep 29, 2025 • 6

ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems

Paper • 2510.11652 • Published Oct 13, 2025 • 29

Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures

Paper • 2510.14616 • Published Oct 16, 2025 • 12

COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes

Paper • 2510.14763 • Published Oct 16, 2025 • 13

A$^2$FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning

Paper • 2510.12838 • Published Oct 13, 2025 • 24

authored a paper 4 months ago

Reverse-Engineered Reasoning for Open-Ended Generation

Paper • 2509.06160 • Published Sep 7, 2025 • 149

authored 2 papers 5 months ago

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Paper • 2508.13167 • Published Aug 6, 2025 • 129

VeriGUI: Verifiable Long-Chain GUI Dataset

Paper • 2508.04026 • Published Aug 6, 2025 • 161

authored a paper 6 months ago

CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

Paper • 2507.06181 • Published Jul 8, 2025 • 44

authored 2 papers 7 months ago

Scaling Test-time Compute for LLM Agents

Paper • 2506.12928 • Published Jun 15, 2025 • 63

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Paper • 2505.13032 • Published May 19, 2025 • 3

authored a paper 8 months ago

FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models

Paper • 2505.02735 • Published May 5, 2025 • 33

authored 2 papers 9 months ago

Objaverse++: Curated 3D Object Dataset with Quality Annotations

Paper • 2504.07334 • Published Apr 9, 2025 • 1

COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values

Paper • 2504.05535 • Published Apr 7, 2025 • 44
