Add pipeline tag and model description (#1)
- Add pipeline tag and model description (be9a7315749c44abdca3e287f1523de23b5cfa0f)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md CHANGED
@@ -1,18 +1,30 @@
---
-library_name: transformers
-license: apache-2.0
-datasets:
-- Franklin0/ReasonGen-R1-SFT-230k
base_model:
- deepseek-ai/Janus-Pro-7B
---

-# Model Card for

-SFT Only model

Website: https://aka.ms/reasongen

Code: https://github.com/Franklin-Zhang0/Image-RL

-Arxiv: https://arxiv.org/abs/2505.24875

---
base_model:
- deepseek-ai/Janus-Pro-7B
+datasets:
+- Franklin0/ReasonGen-R1-SFT-230k
+library_name: transformers
+license: apache-2.0
+pipeline_tag: text-to-image
---

+# Model Card for ReasonGen-R1 (SFT Only)

+ReasonGen-R1 (SFT Only) is a text-to-image model produced by supervised fine-tuning (SFT) on a dataset of image prompts and written rationales. It is based on the deepseek-ai/Janus-Pro-7B model and is described in the paper "[ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL](https://huggingface.co/papers/2505.24875)".

Website: https://aka.ms/reasongen

Code: https://github.com/Franklin-Zhang0/Image-RL

+Arxiv: https://arxiv.org/abs/2505.24875
+
+## 1. Introduction
+
+Although chain-of-thought (CoT) reasoning and reinforcement learning (RL) have driven breakthroughs in NLP, their integration into generative vision models remains underexplored. We introduce ReasonGen-R1, a two-stage framework that first imbues an autoregressive image generator with explicit text-based "thinking" skills via supervised fine-tuning (SFT) on a newly generated reasoning dataset of written rationales, and then refines its outputs using Group Relative Policy Optimization (GRPO).
+To enable the model to reason through text before generating images, we automatically generate and release a corpus of model-crafted rationales paired with visual prompts, which supports controlled planning of object layouts, styles, and scene compositions.
+Our GRPO algorithm uses reward signals from a pretrained vision-language model to assess overall visual quality, optimizing the policy in each update.
+Evaluations on GenEval, DPG, and the T2I benchmark demonstrate that ReasonGen-R1 consistently outperforms strong baselines and prior state-of-the-art models. We will open-source our generated reasoning dataset and training code to accelerate further advances in text-based reasoning-driven image generation.
+
+## 4. Acknowledgements
+
+We would like to thank Verl, upon which our repo is built.
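
The SFT stage described in the new model card supervises the autoregressive generator on prompts paired with written rationales before the image tokens. As a generic sketch only, not the ReasonGen-R1 training code, that objective reduces to ordinary next-token cross-entropy over the rationale-plus-image-token continuation with the prompt tokens masked out; tensor names and shapes below are illustrative:

```python
import torch
import torch.nn.functional as F

def sft_loss(logits: torch.Tensor, input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """Next-token cross-entropy over the rationale + image-token continuation.

    logits: (batch, seq_len, vocab); input_ids: (batch, seq_len).
    Prompt tokens are masked out so only the continuation is supervised.
    """
    shift_logits = logits[:, :-1, :]            # position t predicts token t + 1
    shift_labels = input_ids[:, 1:].clone()
    shift_labels[:, : prompt_len - 1] = -100    # ignore predictions that fall inside the prompt
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )

# Toy shapes only, to show how the pieces fit together.
logits = torch.randn(2, 16, 1000)
input_ids = torch.randint(0, 1000, (2, 16))
print(sft_loss(logits, input_ids, prompt_len=5))
```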
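The GRPO stage mentioned in the introduction scores several sampled images per prompt with a pretrained vision-language model and uses each sample's reward relative to its group. A minimal sketch of that group-relative normalization, assuming standard GRPO arithmetic; the VLM scorer is a stub, not taken from the repository:

```python
from statistics import mean, pstdev

def vlm_reward(image, prompt) -> float:
    """Placeholder for the pretrained vision-language scorer used as the reward model."""
    raise NotImplementedError

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Each sample's advantage is its reward normalized within its own group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical scores a VLM might assign to four images sampled for one prompt.
print(group_relative_advantages([0.62, 0.71, 0.55, 0.80]))
```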