# Multi-Agent Neural Network Diagram Generator (Skeleton): Gemini 2.5 Flash Image

This repository is a minimal, runnable skeleton that turns a textual NN spec into a publication-style diagram via a multi-agent pipeline:

- Parser → Planner → Prompt-Generator → Image-Generator (G1) → Label-Generator (G2) → Judge → Selector → (Editor loop) → Archivist
- All model calls flow through `call_gemini(...)`, making it easy to use Gemini 2.5 Flash for text and Gemini 2.5 Flash Image for images.

Key additions in this version:

- Two-stage generation: G1 draws a geometry-only skeleton (no text); G2 then overlays labels on that skeleton.
- Hard violations: the Judge returns actionable violations; missing labels are flagged as HARD so they reliably trigger edits.
- Parallelism: the per-candidate G1, G2, and Judge calls run concurrently, bounded by `NNG_CONCURRENCY` (default 4).
- Remote images by default: image generation and editing use Gemini 2.5 Flash Image models. If no API key is configured, the system falls back to local placeholders so it stays runnable.
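
The Judge → Selector hand-off described above can be pictured as follows. This is a minimal sketch, not the code in `judge.py` or `select.py`: the `Violation`/`Candidate` dataclasses, the `"HARD"`/`"SOFT"` severity strings, and the ranking rule (fewest HARD violations first, then fewest overall) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Violation:
    # Hypothetical shape of one Judge finding.
    severity: str  # "HARD" or "SOFT" (assumed labels)
    message: str


@dataclass
class Candidate:
    # Hypothetical container for one generated diagram and its Judge report.
    path: str
    violations: list[Violation] = field(default_factory=list)

    def hard_count(self) -> int:
        return sum(1 for v in self.violations if v.severity == "HARD")


def missing_label_violations(expected: set[str], found: set[str]) -> list[Violation]:
    # Missing labels are reported as HARD so they always trigger an edit round.
    return [Violation("HARD", f"missing label: {name}") for name in sorted(expected - found)]


def select_best(candidates: list[Candidate]) -> Candidate:
    # Rank by fewest HARD violations, then by fewest violations overall.
    return min(candidates, key=lambda c: (c.hard_count(), len(c.violations)))


def needs_edit(candidate: Candidate) -> bool:
    # In this sketch only HARD violations force another Editor round.
    return candidate.hard_count() > 0


# Usage: candidate "a" is missing a label, "b" only has a cosmetic issue.
a = Candidate("a.png", missing_label_violations({"Patch Embed", "MLP Head"}, {"MLP Head"}))
b = Candidate("b.png", [Violation("SOFT", "arrow overlaps box")])
best = select_best([a, b])
print(best.path, needs_edit(best))  # b.png False
```

Keeping severity explicit lets the Selector prefer a slightly untidy but complete diagram over one that is missing required labels.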

## Quick Start

1) Python 3.10+
2) Install dependencies:
   ```
   pip install -r requirements.txt
   ```
3) Configure Gemini (choose one):
   - Environment variable: `export GEMINI_API_KEY=YOUR_KEY`
   - File: create `app/llm/credentials.py` (for example from `app/llm/credentials.example.py`) containing `GEMINI_API_KEY = "YOUR_KEY"`
4) Run (`--K` = number of candidates, `--T` = maximum edit rounds):
   ```
   # Text mode (spec -> image)
   python -m app.cli --mode text --spec spec/vit.txt --K 4 --T 1

   # Image mode (text + image fusion/edit)
   # Example: edit an existing diagram with a component replacement using a reference image
   python -m app.cli --mode image --base-image path/to/base.png \
     --ref-image path/to/transformer_ref.png \
     --instructions "Replace the UNet backbone with a Transformer (DiT); keep layout, font, and colors consistent."
   ```

Artifacts are saved under `artifacts/run_YYYYmmdd_HHMMSS/`, with `final.png` as the chosen result.

## Gemini 2.5 Flash Image in This Project

- G1 geometry: `gen_generate.py` calls `GEMINI_IMAGE_MODEL` (Gemini 2.5 Flash Image) to render a clean, geometry-only skeleton quickly.
- G2 labels: `gen_labels.py` uses `GEMINI_IMAGE_EDIT_MODEL` to overlay text labels onto the G1 skeleton without redrawing it.
- Edit loop: `edit.py` applies targeted corrections through the same image-edit model, enabling fast, iterative refinement instead of full regeneration.
- Why it matters: the model's speed and editability make multi-round diagram refinement practical while preserving layout quality.
- Fallback: if no API key is available, the pipeline remains runnable using local placeholders generated by `app/llm/gemini.py`.
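
To make the G1/G2 split concrete: G1 is asked for geometry only, and G2 receives the skeleton plus an explicit list of labels to place. The sketch below only builds the two text prompts; the `Block` type, the function names, and the prompt wording are hypothetical and are not the contents of `prompts.py`. The actual image requests would go through `call_gemini(...)`.

```python
from dataclasses import dataclass


@dataclass
class Block:
    # Hypothetical planner output: one box in the diagram grid.
    name: str
    row: int
    col: int


def g1_prompt(blocks: list[Block]) -> str:
    # G1: geometry only. Boxes are described by grid position; names are withheld.
    layout = "; ".join(f"box at row {b.row}, col {b.col}" for b in blocks)
    return (
        "Draw a clean publication-style neural network diagram skeleton. "
        f"Layout: {layout}. Do NOT render any text, letters, or numbers."
    )


def g2_prompt(blocks: list[Block]) -> str:
    # G2: edit the G1 image in place, adding exactly one label per box.
    labels = "; ".join(f'"{b.name}" on the box at row {b.row}, col {b.col}' for b in blocks)
    return (
        "Keep the existing shapes, colors, and layout unchanged. "
        f"Only add these text labels: {labels}."
    )


plan = [Block("Patch Embedding", 0, 0), Block("Transformer Encoder", 0, 1)]
print(g1_prompt(plan))
print(g2_prompt(plan))
```

Withholding the block names from the G1 prompt entirely is one simple way to discourage the image model from drawing text before the label stage.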

## Models

- `GEMINI_MODEL` (default `gemini-2.5-flash`): parsing, planning, prompt generation, and judging.
- `GEMINI_IMAGE_MODEL` (recommended `gemini-2.5-flash-image` or `gemini-2.5-flash-image-preview`): image generation (G1).
- `GEMINI_IMAGE_EDIT_MODEL` (recommended `gemini-2.5-flash-image` or `gemini-2.5-flash-image-preview`): image editing (G2, Editor).

Notes: If `GEMINI_API_KEY` is not set, the pipeline uses offline placeholders so it remains runnable. If an API key is present, the image-model variables must be set to valid models: an error is raised when they are unset or when a call fails (there is no automatic local fallback in that case).
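
The model-selection and fallback rules above can be summarized as a small resolver. This is a sketch only: `resolve_backend` and its return dict are hypothetical, it ignores the `credentials.py` option, and it assumes both image variables are required once a key is present. The variable names and the `gemini-2.5-flash` default come from this README.

```python
import os
from collections.abc import Mapping


def resolve_backend(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical summary of the model/fallback rules described in this README."""
    if not env.get("GEMINI_API_KEY"):
        # No key: offline placeholders keep the pipeline runnable.
        return {"mode": "offline"}
    image_model = env.get("GEMINI_IMAGE_MODEL")
    edit_model = env.get("GEMINI_IMAGE_EDIT_MODEL")
    if not image_model or not edit_model:
        # Key present: no silent local fallback, so unset image models are an error.
        raise RuntimeError(
            "Set GEMINI_IMAGE_MODEL and GEMINI_IMAGE_EDIT_MODEL when GEMINI_API_KEY is set."
        )
    return {
        "mode": "api",
        "text_model": env.get("GEMINI_MODEL", "gemini-2.5-flash"),
        "image_model": image_model,
        "edit_model": edit_model,
    }


# Usage with explicit mappings instead of the real environment.
print(resolve_backend({})["mode"])  # offline
cfg = resolve_backend({
    "GEMINI_API_KEY": "k",
    "GEMINI_IMAGE_MODEL": "gemini-2.5-flash-image",
    "GEMINI_IMAGE_EDIT_MODEL": "gemini-2.5-flash-image",
})
print(cfg["text_model"])  # gemini-2.5-flash
```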

## Fusion Mode (Text + Image)

- Accepts a base diagram (`--base-image`), optional reference images (`--ref-image`, repeatable), and instructions (`--instructions`).
- Uses Gemini 2.5 Flash Image to compose images under textual guidance, which suits swapping a module (e.g., UNet → Transformer) while preserving style and layout.
- Outputs multiple fused candidates (`K`) and archives the first as `final.png`.
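
For reference, the command-line surface used in Quick Start could be declared with `argparse` roughly as follows. This is a reconstruction from the example commands, not a copy of `cli.py`: the defaults (taken from the Tips suggestion `K=4, T=1`) and the `choices` list are assumptions. `action="append"` is what makes `--ref-image` repeatable.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the flags shown in Quick Start.
    p = argparse.ArgumentParser(prog="app.cli")
    p.add_argument("--mode", choices=["text", "image"], default="text")
    p.add_argument("--spec", help="text mode: NN spec file, e.g. spec/vit.txt")
    p.add_argument("--K", type=int, default=4, help="number of candidates")
    p.add_argument("--T", type=int, default=1, help="maximum edit rounds")
    p.add_argument("--base-image", help="image mode: diagram to edit")
    # action="append" collects every --ref-image occurrence into a list (None if absent).
    p.add_argument("--ref-image", action="append", help="reference image; repeatable")
    p.add_argument("--instructions", help="image mode: fusion/edit instructions")
    return p


args = build_parser().parse_args([
    "--mode", "image", "--base-image", "base.png",
    "--ref-image", "a.png", "--ref-image", "b.png",
    "--instructions", "Replace the UNet backbone with a Transformer (DiT)",
])
print(args.ref_image)  # ['a.png', 'b.png']
```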

## Structure

```
app/
  cli.py              # CLI entry (K/T/outdir)
  graph.py            # Orchestrator + edit loop
  state.py            # AppState + artifacts
  prompts.py          # Centralized prompts (parse/plan/G1/G2/judge/edit)
  nodes/
    parser.py, planner.py, prompt_gen.py
    gen_generate.py   # G1 skeleton images (no text)
    gen_labels.py     # G2 label overlay edits
    judge.py, select.py, edit.py, archive.py
  llm/
    gemini.py         # Unified wrapper (API + offline fallback)
    credentials.example.py
spec/
  vit.txt             # Example ViT spec (English)
artifacts/            # Outputs per run
```
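
The control flow that `graph.py` orchestrates (K candidates, judge, select, then up to T edit rounds) can be outlined with plain callables. This is an illustrative sketch under assumptions, not the project's orchestrator: the stage functions are injected stubs, "best" means fewest HARD violations, and stopping as soon as the Judge reports none is an assumed termination rule.

```python
from collections.abc import Callable

# Hypothetical stage signatures; injecting them lets the loop run with stubs.
Generate = Callable[[int], str]          # candidate index -> image path (G1 + G2)
Judge = Callable[[str], list[str]]       # image path -> list of HARD violations
Edit = Callable[[str, list[str]], str]   # image path + violations -> edited image path


def run_pipeline(generate: Generate, judge: Judge, edit: Edit, K: int = 4, T: int = 1) -> str:
    # 1) Generate K candidates and judge each one.
    reports = [(path, judge(path)) for path in (generate(i) for i in range(K))]
    # 2) Select the candidate with the fewest HARD violations.
    best, violations = min(reports, key=lambda r: len(r[1]))
    # 3) Editor loop: at most T targeted edit rounds, stopping early once clean.
    for _ in range(T):
        if not violations:
            break
        best = edit(best, violations)
        violations = judge(best)
    return best  # this would be archived as final.png


# Usage with stub stages: c0 has one missing label, and one edit fixes it.
fake_report = {
    "c0.png": ["missing label: MLP Head"],
    "c1.png": ["missing label: MLP Head", "missing label: Patch Embed"],
}
final = run_pipeline(
    generate=lambda i: f"c{i}.png",
    judge=lambda p: fake_report.get(p, []),
    edit=lambda p, v: p.replace(".png", "_edit1.png"),
    K=2,
    T=1,
)
print(final)  # c0_edit1.png
```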

## Tips

- Concurrency: `NNG_CONCURRENCY=4 python -m app.cli --spec ...`
- Tuning: start with `K=4, T=1`; increase `T` for more correction rounds.
- Debug: image calls write `*.resp.txt`/`*.meta.json` alongside outputs (this debug output can be removed later if undesired).
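
One common way to honor a limit such as `NNG_CONCURRENCY` is an `asyncio.Semaphore` around each remote call. A minimal sketch, assuming the model calls can be awaited; `fake_model_call` is a hypothetical stand-in for a real G1/G2/Judge request, and the project may use threads or another mechanism instead.

```python
import asyncio
import os


async def bounded_gather(jobs, limit: int):
    # At most `limit` jobs hold the semaphore (i.e. are in flight) at once.
    sem = asyncio.Semaphore(limit)

    async def run(job):
        async with sem:
            return await job()

    # gather preserves the input order of results.
    return await asyncio.gather(*(run(j) for j in jobs))


async def main():
    limit = int(os.environ.get("NNG_CONCURRENCY", "4"))  # default 4, as in this README
    in_flight = 0
    peak = 0

    async def fake_model_call(i: int) -> str:
        # Hypothetical stand-in for one remote image or judge request.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return f"candidate_{i}.png"

    results = await bounded_gather([lambda i=i: fake_model_call(i) for i in range(8)], limit)
    return results, peak, limit


results, peak, limit = asyncio.run(main())
print(results[:2], peak <= limit)  # ['candidate_0.png', 'candidate_1.png'] True
```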