# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

LongCat-Image is a text-to-image generation model built on diffusion transformers, deployed as a Hugging Face Space with a Gradio interface. The model is based on the Flux architecture and supports both text-to-image generation and image editing.

## Running the Application

```bash
# Install dependencies
pip install -r requirements.txt

# Run the Gradio app locally
python app.py
```

The app launches with the MCP server enabled on the default Gradio port.

## Architecture

### Core Components

**Transformer Model** (`longcat_image/models/longcat_image_dit.py`):

- `LongCatImageTransformer2DModel`: DiT-based transformer using the Flux architecture
- Uses `FluxTransformerBlock` (19 layers) and `FluxSingleTransformerBlock` (38 layers)
- Supports gradient checkpointing for memory efficiency
- Position embeddings via `FluxPosEmbed` with RoPE

**Pipelines** (`longcat_image/pipelines/`):

- `LongCatImagePipeline`: Text-to-image generation with optional prompt rewriting
- `LongCatImageEditPipeline`: Image editing with vision-language conditioning
- Both pipelines inherit from `DiffusionPipeline` and support LoRA, CFG renorm, and VAE tiling/slicing

**Text Encoding**:

- Uses a Qwen-based text encoder with chat-template formatting (see the illustrative sketch at the end of this file)
- The prompt template wraps user input between `<|im_start|>` and `<|im_end|>` tokens
- Maximum token length: 512

### Key Configuration

- VAE scale factor: 8 (with 2x2 patch packing, effective 16x)
- Default sample size: 128 (1024px at 8x scale)
- Latent channels: 16
- Image dimensions must be divisible by 32

### Prompt Rewriting

The pipeline includes built-in prompt engineering via `rewire_prompt()`, which uses the text encoder to expand simple prompts into detailed descriptions. This can be disabled with `enable_prompt_rewrite=False`. External prompt polishing is also available via `utils/prompt_utils.py`, which uses the Hugging Face Inference API (requires `HF_TOKEN`).

## Model Loading

```python
import torch

from longcat_image.models import LongCatImageTransformer2DModel
from longcat_image.pipelines import LongCatImagePipeline

MODEL_REPO = "meituan-longcat/LongCat-Image"

transformer = LongCatImageTransformer2DModel.from_pretrained(
    MODEL_REPO, subfolder='transformer', torch_dtype=torch.bfloat16
)
pipe = LongCatImagePipeline.from_pretrained(MODEL_REPO, transformer=transformer)
```

A hedged sketch of an inference call appears in the example section at the end of this file.

## Environment Variables

- `HF_TOKEN`: Required for prompt polishing via external API
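
## Example: Text-to-Image Generation (Sketch)

A minimal sketch building on the loading snippet above, assuming a standard `diffusers`-style call signature. The keyword arguments `height`, `width`, `num_inference_steps`, `guidance_scale`, and `generator`, the `.images[0]` output attribute, and passing `enable_prompt_rewrite` at call time are all assumptions based on common `diffusers` conventions; check the pipeline's `__call__` signature before relying on them.

```python
import torch

# Assumes `pipe` was constructed as shown in "Model Loading" above.
pipe = pipe.to("cuda")

# Optional memory savers: the pipelines are documented as supporting VAE
# tiling/slicing; these are the standard AutoencoderKL helpers in diffusers
# (verify that this repo's VAE class actually exposes them).
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

def round_to_32(x: int) -> int:
    """Round a dimension down to the nearest multiple of 32, as the model requires."""
    return max(32, (x // 32) * 32)

height, width = round_to_32(1024), round_to_32(768)

image = pipe(
    prompt="A watercolor painting of a lighthouse at dawn",
    height=height,                    # must be divisible by 32
    width=width,                      # must be divisible by 32
    num_inference_steps=28,           # assumed reasonable default; tune as needed
    guidance_scale=3.5,               # assumed CFG scale; tune as needed
    enable_prompt_rewrite=False,      # skip built-in prompt expansion (see "Prompt Rewriting")
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]

image.save("output.png")
```

If the built-in prompt rewriting is left enabled, short prompts are expanded by the text encoder before generation, so the same seed can produce different results with and without it.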
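
## Example: Prompt Template Format (Illustrative)

The Text Encoding section describes a Qwen chat-template wrapper around the user prompt, bounded by `<|im_start|>` and `<|im_end|>` tokens with a 512-token limit. The snippet below only illustrates what such a template typically looks like; `format_prompt` is a hypothetical helper, not a function in this repository, and the real template (applied internally by the pipeline and tokenizer) may include a system message or additional fields.

```python
MAX_PROMPT_TOKENS = 512  # maximum token length noted in "Text Encoding"

def format_prompt(user_prompt: str) -> str:
    # Hypothetical sketch of a Qwen-style chat template; the actual template
    # used by the pipeline may differ.
    return f"<|im_start|>user\n{user_prompt}<|im_end|>\n<|im_start|>assistant\n"
```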