# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

LongCat-Image is a text-to-image generation model built on diffusion transformers, deployed as a Hugging Face Space with a Gradio interface. The model is based on the Flux architecture and supports both text-to-image generation and image editing.

## Running the Application

```bash
# Install dependencies
pip install -r requirements.txt

# Run the Gradio app locally
python app.py
```

The app launches with the MCP server enabled on the default Gradio port.

## Architecture

### Core Components

**Transformer Model** (`longcat_image/models/longcat_image_dit.py`):

- `LongCatImageTransformer2DModel`: DiT-based transformer using the Flux architecture
- Uses `FluxTransformerBlock` (19 layers) and `FluxSingleTransformerBlock` (38 layers)
- Supports gradient checkpointing for memory efficiency
- Position embeddings via `FluxPosEmbed` with RoPE

**Pipelines** (`longcat_image/pipelines/`):

- `LongCatImagePipeline`: Text-to-image generation with optional prompt rewriting
- `LongCatImageEditPipeline`: Image editing with vision-language conditioning
- Both pipelines inherit from `DiffusionPipeline` and support LoRA, CFG renorm, and VAE tiling/slicing

**Text Encoding**:

- Uses a Qwen-based text encoder with chat-template formatting (see the illustrative sketch at the end of this file)
- The prompt template wraps user input between `<|im_start|>` and `<|im_end|>` tokens
- Maximum token length: 512

### Key Configuration

- VAE scale factor: 8 (with 2x2 patch packing, effective 16x)
- Default sample size: 128 (1024px at 8x scale)
- Latent channels: 16
- Image dimensions must be divisible by 32

### Prompt Rewriting

The pipeline includes built-in prompt engineering via `rewire_prompt()`, which uses the text encoder to expand simple prompts into detailed descriptions. This can be disabled with `enable_prompt_rewrite=False`. External prompt polishing is also available via `utils/prompt_utils.py`, which uses the Hugging Face Inference API (requires `HF_TOKEN`).

## Model Loading

```python
import torch

from longcat_image.models import LongCatImageTransformer2DModel
from longcat_image.pipelines import LongCatImagePipeline

MODEL_REPO = "meituan-longcat/LongCat-Image"

transformer = LongCatImageTransformer2DModel.from_pretrained(
    MODEL_REPO, subfolder='transformer', torch_dtype=torch.bfloat16
)
pipe = LongCatImagePipeline.from_pretrained(MODEL_REPO, transformer=transformer)
```

A hedged sketch of an inference call appears in the example section at the end of this file.

## Environment Variables

- `HF_TOKEN`: Required for prompt polishing via external API
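
## Example: Text-to-Image Generation (Sketch)

A minimal sketch building on the loading snippet above, assuming a standard `diffusers`-style call signature. The keyword arguments `height`, `width`, `num_inference_steps`, `guidance_scale`, and `generator`, the `.images[0]` output attribute, and passing `enable_prompt_rewrite` at call time are all assumptions based on common `diffusers` conventions; check the pipeline's `__call__` signature before relying on them.

```python
import torch

# Assumes `pipe` was constructed as shown in "Model Loading" above.
pipe = pipe.to("cuda")

# Optional memory savers: the pipelines are documented as supporting VAE
# tiling/slicing; these are the standard AutoencoderKL helpers in diffusers
# (verify that this repo's VAE class actually exposes them).
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

def round_to_32(x: int) -> int:
    """Round a dimension down to the nearest multiple of 32, as the model requires."""
    return max(32, (x // 32) * 32)

height, width = round_to_32(1024), round_to_32(768)

image = pipe(
    prompt="A watercolor painting of a lighthouse at dawn",
    height=height,                    # must be divisible by 32
    width=width,                      # must be divisible by 32
    num_inference_steps=28,           # assumed reasonable default; tune as needed
    guidance_scale=3.5,               # assumed CFG scale; tune as needed
    enable_prompt_rewrite=False,      # skip built-in prompt expansion (see "Prompt Rewriting")
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]

image.save("output.png")
```

If the built-in prompt rewriting is left enabled, short prompts are expanded by the text encoder before generation, so the same seed can produce different results with and without it.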
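
## Example: Prompt Template Format (Illustrative)

The Text Encoding section describes a Qwen chat-template wrapper around the user prompt, bounded by `<|im_start|>` and `<|im_end|>` tokens with a 512-token limit. The snippet below only illustrates what such a template typically looks like; `format_prompt` is a hypothetical helper, not a function in this repository, and the real template (applied internally by the pipeline and tokenizer) may include a system message or additional fields.

```python
MAX_PROMPT_TOKENS = 512  # maximum token length noted in "Text Encoding"

def format_prompt(user_prompt: str) -> str:
    # Hypothetical sketch of a Qwen-style chat template; the actual template
    # used by the pipeline may differ.
    return f"<|im_start|>user\n{user_prompt}<|im_end|>\n<|im_start|>assistant\n"
```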