# Multi-Agent Neural Network Diagram Generator (Skeleton): Gemini 2.5 Flash Image

This repository is a minimal, runnable skeleton that turns a textual neural-network (NN) spec into a publication-style diagram through a multi-agent pipeline:
- Parser → Planner → Prompt-Generator → Image-Generator (G1) → Label-Generator (G2) → Judge → Selector → (Editor loop) → Archivist
- All model calls go through a single `call_gemini(...)` function. This makes it easy to use Gemini 2.5 Flash for text and Gemini 2.5 Flash Image for images.
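The generate → judge → select → edit control flow can be sketched in plain Python. This is a minimal, hypothetical sketch and not the project's `graph.py`. The stage functions are stubs standing in for the Gemini-backed nodes, and parsing, planning, prompt generation, and archiving are omitted. The `Candidate` type, the `(severity, message)` violation format, and the "fewest HARD violations wins" selection rule are all assumptions made for illustration. `K` and `T` play the same roles as the CLI's `--K` and `--T`.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    # Hypothetical candidate record; the real AppState in state.py may differ.
    image: str
    violations: list[tuple[str, str]] = field(default_factory=list)  # (severity, message)

    def hard_count(self) -> int:
        return sum(1 for severity, _ in self.violations if severity == "HARD")


def generate_skeleton(plan: str, i: int) -> str:
    # Stub for G1: geometry-only drawing, no text.
    return f"skeleton[{plan}#{i}]"


def overlay_labels(skeleton: str) -> str:
    # Stub for G2: overlay labels on top of the G1 skeleton.
    return f"labeled({skeleton})"


def judge(image: str) -> list[tuple[str, str]]:
    # Stub judge: reports a missing label as HARD until an edit has been applied.
    return [] if "edited" in image else [("HARD", "missing label: Patch Embedding")]


def edit(image: str, violations: list[tuple[str, str]]) -> str:
    # Stub editor: a targeted fix driven by the judge's violations.
    return f"edited({image})"


def run_pipeline(plan: str, K: int = 4, T: int = 1) -> Candidate:
    # Generate K candidates (G1, then G2) and judge each one.
    cands = [Candidate(overlay_labels(generate_skeleton(plan, i))) for i in range(K)]
    for c in cands:
        c.violations = judge(c.image)
    # Select the candidate with the fewest HARD violations.
    best = min(cands, key=Candidate.hard_count)
    # Edit loop: at most T rounds, and only while HARD violations remain.
    for _ in range(T):
        if best.hard_count() == 0:
            break
        best = Candidate(edit(best.image, best.violations))
        best.violations = judge(best.image)
    return best
```

With these stubs, `run_pipeline("vit", K=4, T=1)` returns an edited candidate with no HARD violations. With `T=0`, the HARD violation survives, which shows why raising `T` buys more correction rounds.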

Key additions in this version:
- Two-stage generation: G1 draws a geometry-only skeleton (no text), and G2 then overlays labels on that skeleton.
- Hard violations: the Judge returns actionable violations, and it flags missing labels as HARD so that they reliably trigger edits.
- Parallelism: G1, G2, and Judge calls run in parallel; set `NNG_CONCURRENCY` to control the concurrency level (default 4).
- Remote images by default: image generation and editing use Gemini 2.5 Flash Image models. If no API key is configured, the system can fall back to local placeholders so that it stays runnable.
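A bounded fan-out like the one described for G1, G2, and Judge can be sketched with the standard library. This sketch makes three assumptions: the node calls are I/O-bound API requests (so threads are sufficient), `NNG_CONCURRENCY` is read as a positive integer, and the default is 4. The `concurrency_limit` and `run_parallel` helpers are hypothetical names and do not come from the project.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

ItemT = TypeVar("ItemT")
ResultT = TypeVar("ResultT")


def concurrency_limit(default: int = 4) -> int:
    # NNG_CONCURRENCY must be a positive integer; otherwise use the default.
    try:
        value = int(os.environ.get("NNG_CONCURRENCY", default))
    except ValueError:
        return default
    return value if value > 0 else default


def run_parallel(fn: Callable[[ItemT], ResultT], items: Iterable[ItemT]) -> list[ResultT]:
    # At most NNG_CONCURRENCY calls are in flight at once.
    # Executor.map preserves input order, so results line up with the candidates.
    with ThreadPoolExecutor(max_workers=concurrency_limit()) as pool:
        return list(pool.map(fn, items))
```

For example, `run_parallel(judge_one, candidates)` would judge every candidate with bounded concurrency, where `judge_one` is a hypothetical per-candidate function. Threads fit here because each worker mostly waits on HTTP. CPU-heavy work would need processes instead.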

## Quick Start

1) Use Python 3.10 or newer.

2) Install dependencies
```
pip install -r requirements.txt
```

3) Configure Gemini (choose one)
- Environment variable: `export GEMINI_API_KEY=YOUR_KEY`
- File: create `app/llm/credentials.py` containing `GEMINI_API_KEY = "YOUR_KEY"` (`app/llm/credentials.example.py` is included as an example)

4) Run (K = number of candidates, T = maximum number of edit rounds)
```
# Text mode (spec -> image)
python -m app.cli --mode text --spec spec/vit.txt --K 4 --T 1

# Image mode (text + image fusion/edit)
# Example: edit an existing diagram by replacing a component, guided by a reference image
python -m app.cli --mode image --base-image path/to/base.png \
  --ref-image path/to/transformer_ref.png \
  --instructions "Replace the UNet backbone with a Transformer (DiT); keep layout, font, and colors consistent."
```
Artifacts are saved under `artifacts/run_YYYYmmdd_HHMMSS/`, with `final.png` as the selected result.

## Gemini 2.5 Flash Image in This Project
- G1 geometry: `gen_generate.py` calls the model named by `GEMINI_IMAGE_MODEL` (Gemini 2.5 Flash Image) to quickly render a clean, geometry-only skeleton.
- G2 labels: `gen_labels.py` uses `GEMINI_IMAGE_EDIT_MODEL` to overlay text labels onto the G1 skeleton without redrawing the whole diagram.
- Edit loop: `edit.py` applies targeted corrections through the same image-edit model, so refinements are fast and iterative instead of full regenerations.
- Why it matters: the model's speed and editability make multi-round diagram refinement practical while preserving layout quality.
- Fallback: if no API key is available, the pipeline stays runnable using local placeholders generated by `app/llm/gemini.py`.

## Models
- `GEMINI_MODEL` (default `gemini-2.5-flash`): parsing, planning, prompt generation, and judging.
- `GEMINI_IMAGE_MODEL` (recommended: `gemini-2.5-flash-image` or `gemini-2.5-flash-image-preview`): image generation (G1).
- `GEMINI_IMAGE_EDIT_MODEL` (recommended: `gemini-2.5-flash-image` or `gemini-2.5-flash-image-preview`): image editing (G2 and the Editor).

Note: if `GEMINI_API_KEY` is not set, the pipeline uses offline placeholders so that it remains runnable. If an API key is present, you must also set valid image-model environment variables. In that case the pipeline raises an error when an image model is unset or a call fails, and there is no automatic fallback to local placeholders.

## Fusion Mode (Text + Image)
- Accepts a base diagram (`--base-image`), optional reference images (`--ref-image`, repeatable), and text instructions.
- Uses Gemini 2.5 Flash Image to compose the images under textual guidance. This is well suited to swapping a module (e.g., UNet → Transformer) while preserving style and layout.
- Produces `K` fused candidates and archives the first one as `final.png`.

## Structure
```
app/
  cli.py              # CLI entry (K/T/outdir)
  graph.py            # Orchestrator + edit loop
  state.py            # AppState + artifacts
  prompts.py          # Centralized prompts (parse/plan/G1/G2/judge/edit)
  nodes/
    parser.py, planner.py, prompt_gen.py
    gen_generate.py   # G1 skeleton images (no text)
    gen_labels.py     # G2 label overlay edits
    judge.py, select.py, edit.py, archive.py
  llm/
    gemini.py         # Unified wrapper (API + offline fallback)
    credentials.example.py
spec/
  vit.txt             # Example ViT spec (English)
artifacts/            # Outputs per run
```

## Tips
- Concurrency: `NNG_CONCURRENCY=4 python -m app.cli --spec ...`
- Tuning: start with `K=4, T=1` (`--K 4 --T 1`); increase `T` for more correction rounds.
- Debugging: image calls write `.resp.txt`/`.meta.json` files alongside their outputs (this behavior can be removed later if undesired).
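If the debug sidecars are not wanted in an archived run, they can be pruned afterwards. This is a minimal, hypothetical sketch and not part of the project. It assumes the sidecars are ordinary files whose names end in `.resp.txt` or `.meta.json` somewhere under the run directory. It defaults to a dry run and only deletes files when `dry_run=False`.

```python
from pathlib import Path

DEBUG_SUFFIXES = (".resp.txt", ".meta.json")


def prune_debug_sidecars(run_dir: Path, dry_run: bool = True) -> list[Path]:
    # Collect sidecar files (e.g. g1_0.resp.txt, g1_0.meta.json) under run_dir.
    hits = sorted(
        p for p in run_dir.rglob("*")
        if p.is_file() and p.name.endswith(DEBUG_SUFFIXES)
    )
    if not dry_run:
        for p in hits:
            p.unlink()  # images such as final.png never match the suffixes
    return hits
```

Running it once with the default `dry_run=True` lists exactly what would be deleted. Only then is it worth pointing it at `artifacts/run_YYYYmmdd_HHMMSS/` with `dry_run=False`.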