Semi-Supervised Reward Modeling via Iterative Self-Training Paper • 2409.06903 • Published Sep 10, 2024
Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks Paper • 2410.18210 • Published Oct 23, 2024
MergeBench: A Benchmark for Merging Domain-Specialized LLMs Paper • 2505.10833 • Published May 16, 2025
Scalable Data Synthesis for Computer Use Agents with Step-Level Filtering Paper • 2512.10962 • Published Nov 22, 2025
Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic Paper • 2408.13656 • Published Aug 24, 2024