Commit 4ddd8c6 (verified) · 1 parent: 5787426
Committed by Franklin0 and nielsr (HF Staff)

Add pipeline tag and model description (#1)


- Add pipeline tag and model description (be9a7315749c44abdca3e287f1523de23b5cfa0f)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1):
1. README.md (+19 −7)
README.md CHANGED
@@ -1,18 +1,30 @@
  ---
- library_name: transformers
- license: apache-2.0
- datasets:
- - Franklin0/ReasonGen-R1-SFT-230k
  base_model:
  - deepseek-ai/Janus-Pro-7B
+ datasets:
+ - Franklin0/ReasonGen-R1-SFT-230k
+ library_name: transformers
+ license: apache-2.0
+ pipeline_tag: text-to-image
  ---

- # Model Card for Model ID
+ # Model Card for ReasonGen-R1 (SFT Only)

- SFT Only model for the paper: "[ReasonGen-R1: Cot for Autoregressive Image generation models through SFT and RL](https://huggingface.co/papers/2505.24875)".
+ ReasonGen-R1 (SFT Only) is a text-to-image model fine-tuned using supervised fine-tuning (SFT) on a dataset of image prompts and rationales. It is based on the deepseek-ai/Janus-Pro-7B model and is described in the paper "[ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL](https://huggingface.co/papers/2505.24875)".

  Website: https://aka.ms/reasongen

  Code: https://github.com/Franklin-Zhang0/Image-RL

- Arxiv: https://arxiv.org/abs/2505.24875
+ Arxiv: https://arxiv.org/abs/2505.24875
+
+ ## 1. Introduction
+
+ Although chain-of-thought (CoT) reasoning and reinforcement learning (RL) have driven breakthroughs in NLP, their integration into generative vision models remains underexplored. We introduce ReasonGen-R1, a two-stage framework that first imbues an autoregressive image generator with explicit text-based "thinking" skills via supervised fine-tuning (SFT) on a newly generated reasoning dataset of written rationales, and then refines its outputs using Group Relative Policy Optimization (GRPO).
+ To enable the model to reason through text before generating images, we automatically generate and release a corpus of model-crafted rationales paired with visual prompts, enabling controlled planning of object layouts, styles, and scene compositions.
+ Our GRPO algorithm uses reward signals from a pretrained vision–language model to assess overall visual quality, optimizing the policy in each update.
+ Evaluations on GenEval, DPG, and the T2I benchmark demonstrate that ReasonGen-R1 consistently outperforms strong baselines and prior state-of-the-art models. We will open-source our generated reasoning dataset and training code to accelerate further advances in text-based reasoning-driven image generation.
+
+ ## 4. Acknowledgements
+
+ We would like to thank Verl, upon which our repo is built.
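For readers of the updated model card, a minimal usage sketch follows. It is not part of the commit: the repository id below is a placeholder (the commit page does not show the repo path), and loading assumes the checkpoint ships Janus-Pro's custom `multi_modality` modeling code, so `trust_remote_code=True` is required. The actual text-to-image decoding loop follows the Janus-Pro reference implementation (https://github.com/deepseek-ai/Janus) rather than a standard `pipeline()` call.

```python
# Hedged sketch (not from the commit): loading the ReasonGen-R1 SFT checkpoint.
# Assumptions: the repo id is a placeholder for this model's actual Hub path,
# and the checkpoint exposes Janus-Pro's custom multi_modality classes via
# auto_map, which is why trust_remote_code=True is needed.
import torch
from transformers import AutoModelForCausalLM

model_id = "<hub-repo-for-this-checkpoint>"  # placeholder; not stated on this page

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,       # pulls in the custom multi_modality code
    torch_dtype=torch.bfloat16,
)
model = model.eval().to("cuda" if torch.cuda.is_available() else "cpu")

# Per the paper's two-stage recipe, the SFT-only model is expected to emit a
# written rationale ("thinking") before producing image tokens; the image
# decoding itself follows the Janus-Pro reference generation code.
```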