PromptRL: Language Models as Co-Learners in Flow-Based Image Generation RL 🚀
We found two critical failure modes in flow-based RL:
1️⃣ Quality-Diversity Dilemma: High-quality models produce similar outputs, bottlenecking RL exploration
2️⃣ Prompt Linguistic Hacking: Models overfit to surface patterns. Paraphrase the prompt and performance tanks
Solution: **Jointly train LM + FM**. The LM dynamically generates semantically consistent but diverse prompt variants
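
To make the data flow of that joint setup concrete, here is a rough toy sketch of one rollout step. It is not the actual PromptRL implementation: the function names (`joint_rollout`, `toy_rewrite`, `toy_generate`, `toy_reward`), the template-based rewriter, scoring against the original prompt, and the group-normalized advantages are all assumptions made for illustration.

```python
import random
import statistics
from typing import Callable, List, Tuple


def group_advantages(rewards: List[float]) -> List[float]:
    """Center and scale rewards within one group of rollouts.

    Group normalization is an assumption here, not a detail from the post.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + 1e-6) for r in rewards]


def joint_rollout(
    prompt: str,
    rewrite: Callable[[str], str],        # stand-in for the LM
    generate: Callable[[str], str],       # stand-in for the flow model (FM)
    reward: Callable[[str, str], float],  # scores an output against a prompt
    k: int = 4,
) -> List[Tuple[str, str, float]]:
    """Collect k (variant, output, advantage) triples for one prompt."""
    variants = [rewrite(prompt) for _ in range(k)]
    outputs = [generate(v) for v in variants]
    # Assumption: score against the ORIGINAL prompt, so a variant that
    # drifts semantically is penalized rather than rewarded.
    rewards = [reward(prompt, out) for out in outputs]
    advantages = group_advantages(rewards)
    # In a real system these advantages would weight the policy-gradient
    # updates of both the LM and the FM; that step is omitted here.
    return list(zip(variants, outputs, advantages))


# --- Hypothetical toy stand-ins so the sketch runs end to end ---
rng = random.Random(0)
TEMPLATES = ["{p}", "a photo of {p}", "{p}, highly detailed", "an illustration of {p}"]


def toy_rewrite(prompt: str) -> str:
    # A real LM would sample a paraphrase; here we pick a fixed template.
    return rng.choice(TEMPLATES).format(p=prompt)


def toy_generate(variant: str) -> str:
    # A real FM would return an image; here we return a text placeholder.
    return f"<image for: {variant}>"


def toy_reward(prompt: str, output: str) -> float:
    # Arbitrary toy score: shorter placeholder outputs get higher reward.
    return 1.0 / len(output)


batch = joint_rollout("a red cube", toy_rewrite, toy_generate, toy_reward, k=4)
```

Each entry in `batch` pairs a prompt variant with its generated output and a shared advantage signal, which is the piece that would let both models learn from the same reward.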