Add pipeline tag and model description (#1)
- Add pipeline tag and model description (be9a7315749c44abdca3e287f1523de23b5cfa0f)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md CHANGED
@@ -1,18 +1,30 @@
---
-library_name: transformers
-license: apache-2.0
-datasets:
-- Franklin0/ReasonGen-R1-SFT-230k
base_model:
- deepseek-ai/Janus-Pro-7B
---

-# Model Card for

-SFT Only model

Website: https://aka.ms/reasongen

Code: https://github.com/Franklin-Zhang0/Image-RL

-Arxiv: https://arxiv.org/abs/2505.24875

---
base_model:
- deepseek-ai/Janus-Pro-7B
+datasets:
+- Franklin0/ReasonGen-R1-SFT-230k
+library_name: transformers
+license: apache-2.0
+pipeline_tag: text-to-image
---

+# Model Card for ReasonGen-R1 (SFT Only)

+ReasonGen-R1 (SFT Only) is a text-to-image model produced by supervised fine-tuning (SFT) on a dataset of image prompts and written rationales. It is based on the deepseek-ai/Janus-Pro-7B model and is described in the paper "[ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL](https://huggingface.co/papers/2505.24875)".

Website: https://aka.ms/reasongen

Code: https://github.com/Franklin-Zhang0/Image-RL

+Arxiv: https://arxiv.org/abs/2505.24875
+
+## 1. Introduction
+
+Although chain-of-thought (CoT) reasoning and reinforcement learning (RL) have driven breakthroughs in NLP, their integration into generative vision models remains underexplored. We introduce ReasonGen-R1, a two-stage framework that first imbues an autoregressive image generator with explicit text-based "thinking" skills via supervised fine-tuning (SFT) on a newly generated reasoning dataset of written rationales, and then refines its outputs using Group Relative Policy Optimization (GRPO).
+To enable the model to reason through text before generating images, we automatically generate and release a corpus of model-crafted rationales paired with visual prompts, which supports controlled planning of object layouts, styles, and scene compositions.
+Our GRPO algorithm uses reward signals from a pretrained vision-language model to assess overall visual quality, optimizing the policy in each update.
+Evaluations on GenEval, DPG, and the T2I benchmark demonstrate that ReasonGen-R1 consistently outperforms strong baselines and prior state-of-the-art models. We will open-source our generated reasoning dataset and training code to accelerate further advances in text-based reasoning-driven image generation.
+
+## 4. Acknowledgements
+
+We would like to thank Verl, upon which our repo is built.
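
The SFT stage described in the new model card supervises the autoregressive generator on prompts paired with written rationales before the image tokens. As a generic sketch only, not the ReasonGen-R1 training code, that objective reduces to ordinary next-token cross-entropy over the rationale-plus-image-token continuation with the prompt tokens masked out; tensor names and shapes below are illustrative:

```python
import torch
import torch.nn.functional as F

def sft_loss(logits: torch.Tensor, input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """Next-token cross-entropy over the rationale + image-token continuation.

    logits: (batch, seq_len, vocab); input_ids: (batch, seq_len).
    Prompt tokens are masked out so only the continuation is supervised.
    """
    shift_logits = logits[:, :-1, :]            # position t predicts token t + 1
    shift_labels = input_ids[:, 1:].clone()
    shift_labels[:, : prompt_len - 1] = -100    # ignore predictions that fall inside the prompt
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )

# Toy shapes only, to show how the pieces fit together.
logits = torch.randn(2, 16, 1000)
input_ids = torch.randint(0, 1000, (2, 16))
print(sft_loss(logits, input_ids, prompt_len=5))
```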
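The GRPO stage mentioned in the introduction scores several sampled images per prompt with a pretrained vision-language model and uses each sample's reward relative to its group. A minimal sketch of that group-relative normalization, assuming standard GRPO arithmetic; the VLM scorer is a stub, not taken from the repository:

```python
from statistics import mean, pstdev

def vlm_reward(image, prompt) -> float:
    """Placeholder for the pretrained vision-language scorer used as the reward model."""
    raise NotImplementedError

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Each sample's advantage is its reward normalized within its own group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical scores a VLM might assign to four images sampled for one prompt.
print(group_relative_advantages([0.62, 0.71, 0.55, 0.80]))
```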