---
license: apache-2.0
language:
- en
- zh
base_model:
- Tongyi-MAI/Z-Image-Turbo
pipeline_tag: text-to-image
tags:
- text-to-image
- image-generation
- diffusion
- comfyui
- photorealistic
- bilingual
- chinese
- english
- 8-step
- fast-generation
---

# 🚀 Z-Image-Turbo-AIO | 8-Step Photorealistic Generation

<div align="center">

**Ultra-Fast • Bilingual Text Rendering • All-in-One • FP8 & BF16**

[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
[ComfyUI](https://github.com/comfyanonymous/ComfyUI)

</div>

## ✨ What is Z-Image-Turbo-AIO?

Z-Image-Turbo-AIO is an **All-in-One repackage** of Alibaba Tongyi Lab's 6B-parameter photorealistic image generator, optimized for fast 8-step generation. This version includes an **integrated VAE and Text Encoder** in a single checkpoint file, so you can just download and generate!

### Available Versions

| Version | Size | Best For |
|---------|------|----------|
| 🟡 **FP8-AIO** | ~10GB | Most users |
| 🌟 **BF16-AIO** | ~20GB | Maximum quality |

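
The two file sizes are roughly what parameter-count arithmetic predicts. The sketch below is a back-of-the-envelope estimate, not a measurement: it assumes exactly 6 billion transformer parameters, 1 byte per FP8 weight and 2 bytes per BF16 weight, and it simply labels the rest of each file as "everything else" (presumably the bundled text encoder and VAE, whose sizes this card does not state).

```python
# Rough weight-size estimate for the 6B-parameter transformer alone.
# Assumptions (not from the model card): exactly 6e9 parameters,
# 1 byte per FP8 weight, 2 bytes per BF16 weight, 1 GB = 1e9 bytes.
PARAMS = 6_000_000_000
BYTES_PER_WEIGHT = {"fp8": 1, "bf16": 2}
FILE_SIZE_GB = {"fp8": 10, "bf16": 20}  # approximate sizes from the table above


def transformer_size_gb(precision: str) -> float:
    """Return the estimated transformer weight size in GB for a precision."""
    return PARAMS * BYTES_PER_WEIGHT[precision] / 1e9


for precision, file_gb in FILE_SIZE_GB.items():
    weights_gb = transformer_size_gb(precision)
    remainder_gb = file_gb - weights_gb
    print(f"{precision}: ~{weights_gb:.0f} GB transformer, "
          f"~{remainder_gb:.0f} GB for everything else in the file")
```

Under these assumptions the transformer accounts for a bit more than half of each file, which is consistent with the FP8 file being about half the size of the BF16 one.
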

## 🎯 Key Features

- ⚡ **8-step generation** - 10-40 seconds per image, depending on your GPU
- 📦 **All-in-One** - No separate VAE/Text Encoder downloads needed
- 📸 **Photorealistic** - Professional quality output
- 📖 **Bilingual** - English & Chinese text rendering
- 🎯 **8GB VRAM** - Works on GPUs with 8GB VRAM
- 🌐 **Apache 2.0** - Open license for any use

## 🔄 Which Version Should I Choose?

### 🟡 FP8-AIO (Recommended for most users)
- ✅ Half the file size
- ✅ Faster download
- ✅ Excellent quality
- ✅ Perfect for 8GB VRAM
- ✅ Great for testing & everyday use

### 🌟 BF16-AIO (Maximum precision)
- ✅ BFloat16 weights, no FP8 quantization
- ✅ No quantization loss compared with FP8
- ✅ Great for testing & everyday use
- ✅ Still works on 8GB VRAM

### 🖼️ [CivitAI Page](https://civit.ai/models/2173571)

## 📥 Quick Start (ComfyUI)

### Installation

1. Download your preferred version (FP8 or BF16)
2. Place it in `ComfyUI/models/checkpoints`
3. Load it with the "Load Checkpoint" node
4. Generate!

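
If the checkpoint does not show up in the "Load Checkpoint" node, it can help to confirm that the file really landed in the right folder. The helper below is a small sketch using only the Python standard library; `list_checkpoints` is a name made up for this example, and `"ComfyUI"` stands in for wherever your install actually lives.

```python
from pathlib import Path


def list_checkpoints(comfy_root: str) -> list[tuple[str, float]]:
    """Return (filename, size in GB) for each .safetensors file in models/checkpoints."""
    checkpoint_dir = Path(comfy_root) / "models" / "checkpoints"
    if not checkpoint_dir.is_dir():
        return []
    return sorted(
        (path.name, path.stat().st_size / 1e9)
        for path in checkpoint_dir.glob("*.safetensors")
    )


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point this at your own install directory.
    for name, size_gb in list_checkpoints("ComfyUI"):
        print(f"{name}: {size_gb:.1f} GB")
```

A file that is much smaller than the ~10GB or ~20GB listed above usually means an interrupted download.
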

---

## 🔧 ComfyUI Workflow

Download: [z-image-turbo-aio-workflow.json](z-image-turbo-aio-workflow.json)

---

### 🚀 Standard Workflow v2 (Recommended)
Text-to-image generation with improved upscaler and metadata saving.

📥 **Download:** [Z-Image-Turbo-AIO-workflow-v2.json](workflow/Z-Image-Turbo-AIO-workflow-v2.json)

**Features:**
- Pre-configured settings (9 steps, CFG 1.0)
- Dual samplers: `res_multistep` (sharp) or `euler_ancestral` (natural)
- Improved upscaler with denoise control
- Automatic metadata saving for easy CivitAI uploads

**Required Nodes:** `rgthree-comfy`, `comfyui_image_metadata_extension`

---

### 🎮 ControlNet Workflow
Precise control with reference images using ControlNet Union.

📥 **Download:** [Z-Image-Turbo-AIO-workflow-controlnet.json](workflow/Z-Image-Turbo-AIO-workflow-controlnet.json)

**Features:**
- 5 control types: Canny, Depth, Pose, HED, MLSD
- Megapixel scaling (auto aspect ratio)
- ControlNet strength adjustment

**Required Nodes:** `rgthree-comfy`, `comfyui_image_metadata_extension`, `comfyui_controlnet_aux`

**Additional Download Required:**
- [Z-Image-Turbo-Fun-Controlnet-Union.safetensors](https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/split_files/model_patches/z_image_turbo_fun_controlnet_union.safetensors) (~2.5GB)
- ⚠️ Save in: `ComfyUI/models/model_patches/` (NOT `controlnet/`)
- ⚠️ Requires ComfyUI 0.3.77+
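
The download and placement of the ControlNet patch can also be scripted. This is a minimal sketch, assuming the `huggingface_hub` package is installed and that `"ComfyUI"` is replaced with your real install path; the repository ID and file path are taken from the link above, while the helper names `patch_destination` and `download_patch` are invented for this example.

```python
import shutil
from pathlib import Path

# Both values come from the download link in the list above.
REPO_ID = "Comfy-Org/z_image_turbo"
REPO_FILE = "split_files/model_patches/z_image_turbo_fun_controlnet_union.safetensors"


def patch_destination(comfy_root: str) -> Path:
    """Return where the patch belongs: models/model_patches, not models/controlnet."""
    return Path(comfy_root) / "models" / "model_patches" / Path(REPO_FILE).name


def download_patch(comfy_root: str) -> Path:
    """Download the patch (~2.5GB) into the Hugging Face cache and copy it into place."""
    # Imported here so the path helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    cached = hf_hub_download(repo_id=REPO_ID, filename=REPO_FILE)
    destination = patch_destination(comfy_root)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(cached, destination)
    return destination
```

Calling `download_patch("ComfyUI")` returns the final path. The file is copied rather than downloaded straight into the ComfyUI folder because the repository stores it under `split_files/model_patches/`, and that sub-path would otherwise be recreated inside `model_patches/`. The trade-off is that a second copy stays in the Hugging Face cache.
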

---

### ⚙️ Recommended Settings (Both Workflows)

| Parameter | Value |
|-----------|-------|
| Steps | 9 |
| CFG | 1.0 |
| Sampler | `res_multistep` (sharp) or `euler_ancestral` (natural) |
| Scheduler | `simple` (clean) or `beta` (smooth) |
| Resolution | 1920×1088 |

💡 **Tip:** No negative prompt is needed - the model ignores it!

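
For scripted generation, the settings above can be sent to a running ComfyUI server through its `/prompt` HTTP endpoint. The sketch below is an illustration, not the bundled workflow: it builds a minimal API-format graph from stock ComfyUI nodes, the checkpoint filename is hypothetical, and the choice of `EmptySD3LatentImage` for the latent is an assumption, so the bundled workflows (which also rely on custom nodes) may be wired differently.

```python
import json
import urllib.request

# Hypothetical filename: use the name of the file you placed in models/checkpoints.
CHECKPOINT = "z-image-turbo-fp8-aio.safetensors"


def build_workflow(prompt: str, seed: int = 0) -> dict:
    """Build a minimal API-format graph using the recommended settings."""
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": CHECKPOINT}},
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt, "clip": ["1", 1]}},
        # KSampler needs a negative input; it stays empty because the
        # card says negative prompts are ignored.
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": "", "clip": ["1", 1]}},
        # Assumption: EmptySD3LatentImage as the latent source.
        "4": {"class_type": "EmptySD3LatentImage",
              "inputs": {"width": 1920, "height": 1088, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": seed, "steps": 9, "cfg": 1.0,
                         "sampler_name": "res_multistep",
                         "scheduler": "simple", "denoise": 1.0}},
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "z-image-turbo"}},
    }


def queue_prompt(workflow: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """POST the graph to a running ComfyUI server's /prompt endpoint."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt", data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

With ComfyUI running locally, `queue_prompt(build_workflow("A cozy bookstore ..."))` queues one image. Swapping `sampler_name` to `euler_ancestral` or `scheduler` to `beta` covers the other combinations in the table.
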

## 📊 Performance

All tests on **RTX 4060 (8GB VRAM)** • FP8 • 1920×1088 • 8 steps

| Test | Generation Time |
|------|-----------------|
| Urban Interior | ~32s |
| Architecture | ~32-34s |
| Food Photography | ~32s |
| Bilingual Signage | ~32s |

---

## 📊 Test Results

All tests on **RTX 4060 (8GB VRAM)** • FP8 • 1920×1088 • 8 steps • CFG 1.0 • res_multistep + simple

### 🔬 Test 1: Urban Coffee Shop Interior

**Prompt:**
```
Modern coffee shop interior with industrial design. Exposed brick walls,
wooden beams on ceiling, pendant lights hanging above bar. Professional
espresso machine on marble counter, barista preparing latte art. Customers
sitting at wooden tables with laptops. Large windows showing city street
outside. Warm afternoon lighting, cozy atmosphere. Photorealistic style,
professional architectural photography, 8K detail.
```

⏱️ **Time:** 31.98s | 🎯 **Use Case:** Architectural interior photography, commercial spaces

---

### 🔬 Test 2: Traditional Chinese Architecture

**Prompt:**
```
Beautiful traditional Chinese temple courtyard during golden hour. Red
wooden pillars with intricate gold carvings, curved tile roofs with
upturned eaves. Stone lion statues flanking entrance. Cherry blossoms
in full bloom around courtyard. Red lanterns hanging from eaves. Soft
sunset light casting warm glow. Ancient architecture, peaceful atmosphere.
Professional travel photography, ultra-sharp detail, cinematic composition.
```

⏱️ **Time:** 33.59s | 🎯 **Use Case:** Travel photography, cultural heritage documentation

---

### 🔬 Test 3: Gourmet Food Photography

**Prompt:**
```
Professional food photography of gourmet sushi platter on black slate plate.
Assorted nigiri and maki rolls with fresh salmon, tuna, and avocado.
Garnished with pickled ginger, wasabi, and microgreens. Chopsticks placed
beside plate. Rustic wooden table surface. Soft natural window light from
side creating subtle shadows. Shallow depth of field, appetizing presentation.
Restaurant-quality styling, commercial food photography, magazine-worthy.
```

⏱️ **Time:** 32.16s | 🎯 **Use Case:** Food photography, restaurant menus, commercial advertising

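
The three timings above can be reduced to a per-step figure for comparison with other hardware. This is plain arithmetic on the reported numbers. The per-step value is only an upper bound on sampling cost, assuming the reported times also include text encoding and VAE decoding, which the card does not specify.

```python
# Generation times reported for the three tests above (seconds).
times = {"coffee shop": 31.98, "temple": 33.59, "sushi": 32.16}
STEPS = 8                  # step count stated for these tests
WIDTH, HEIGHT = 1920, 1088

mean_time = sum(times.values()) / len(times)
seconds_per_step = mean_time / STEPS
megapixels = WIDTH * HEIGHT / 1e6

print(f"mean time: {mean_time:.2f} s")        # mean time: 32.58 s
print(f"per step: {seconds_per_step:.2f} s")  # per step: 4.07 s
print(f"image size: {megapixels:.2f} MP")     # image size: 2.09 MP
```
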

## 💡 Prompting Guide

### ✅ Natural Language Works Best!

**Good Example:**
```
A cozy bookstore with floor-to-ceiling wooden shelves filled with
colorful books, comfortable reading nooks with cushions near large
windows, warm pendant lighting, peaceful afternoon atmosphere,
professional interior photography
```

**Bad Example:**
```
bookstore, books, chairs, window, cozy, warm light, interior
```

### 📖 Bilingual Text Rendering

**English Text:**
```
Neon sign reading "OPEN 24/7" in bright blue letters above entrance.
Modern sans-serif font, glowing effect against brick wall.
```

**Chinese Text:**
```
Traditional tea house entrance with sign reading "古韵茶坊" in elegant
gold Chinese calligraphy on red wooden board with ornate carved border.
```

**Both Languages:**
```
Modern cafe exterior with bilingual sign. "Morning Brew Coffee" in
white elegant script above, "晨曦咖啡" in matching Chinese characters
below. Both glowing warmly at dusk.
```

### 📝 Prompting Tips

| Do ✅ | Don't ❌ |
|------|---------|
| Use natural language descriptions | Use tag-style prompts (tag1, tag2) |
| Be detailed (100-300 words optimal) | Write very short prompts (<50 words) |
| Include lighting and mood | Add negative prompts (not used) |
| Describe camera angle and style | Include conflicting instructions |
| Specify materials and colors | |

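
Two of the tips in the table are easy to check mechanically before queuing a generation. The function below is a rough sketch: the word-count thresholds come from the table, while the tag-list heuristic (comma-separated fragments averaging fewer than three words) is an assumption made for this example and has nothing to do with how the model itself reads prompts.

```python
def check_prompt(prompt: str) -> list[str]:
    """Return warnings for a prompt, based on the tips in the table above."""
    warnings = []
    word_count = len(prompt.split())
    if word_count < 50:
        warnings.append(f"short prompt ({word_count} words); the table suggests 100-300")
    elif word_count > 300:
        warnings.append(f"long prompt ({word_count} words); the table suggests 100-300")

    # Heuristic (an assumption, not part of the model): tag-style prompts are
    # comma-separated fragments that average fewer than three words each.
    fragments = [part for part in prompt.split(",") if part.strip()]
    if len(fragments) >= 3 and word_count / len(fragments) < 3:
        warnings.append("looks like a tag list; describe the scene in sentences")
    return warnings


# The "Bad Example" from the prompting guide triggers both warnings.
print(check_prompt("bookstore, books, chairs, window, cozy, warm light, interior"))
```
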

## 🙏 Credits & Acknowledgments

### Original Model
- **Developer:** Tongyi Lab (Alibaba Group)
- **Architecture:** Single-Stream Diffusion Transformer (6B parameters)
- **Algorithm:** Decoupled-DMD + DMDR
- **License:** Apache 2.0

### AIO Conversion
- **Created by:** [SeeSee21](https://huggingface.co/SeeSee21)
- **Format:** Integrated VAE + Text Encoder
- **Purpose:** Simplified single-file deployment

### Resources
- 🤗 [Original HuggingFace](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo)
- 💻 [GitHub Repository](https://github.com/Tongyi-MAI/Z-Image)
- 🎨 [ComfyUI Files](https://huggingface.co/Comfy-Org/z_image_turbo)
- 🖼️ [CivitAI Page](https://civit.ai/models/2173571)

## 📈 Version History

### v1.0 - Initial AIO Release
- FP8-AIO version (10GB)
- BF16-AIO version (20GB)
- Integrated VAE + Text Encoder
- Single-file deployment
- Based on Tongyi-MAI/Z-Image-Turbo
- Tested on RTX 4060 8GB
- Optimized for 1920×1088

---

<div align="center">

**Download, load with "Load Checkpoint", and generate professional photos in seconds! 🚀**

</div>