Submitted by Haoran Zhang
π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows
Simplified Reasoning