FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning Paper • 2510.22543 • Published Oct 26 • 11
SCAN: Self-Denoising Monte Carlo Annotation for Robust Process Reward Learning Paper • 2509.16548 • Published Sep 20
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch Paper • 2410.18693 • Published Oct 24, 2024 • 42
Rethinking Negative Instances for Generative Named Entity Recognition Paper • 2402.16602 • Published Feb 26, 2024 • 3
OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch Paper • 2309.10706 • Published Sep 19, 2023 • 16