# VAE Lyra - Illustrious Edition
Multi-modal VAE trained with custom CLIP weights.
## CLIP Encoders
Uses CLIP weights from `AbstractPhil/clips`:

- CLIP-L: `IllustriousV01_clip_l.safetensors`
- CLIP-G: `IllustriousV01_clip_g.safetensors`
CLIP Skip: 2 (penultimate layer)
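A minimal sketch of fetching those encoder weights, assuming `AbstractPhil/clips` is a Hugging Face Hub repository and that `huggingface_hub` and `safetensors` are installed (neither is specified by this card):

```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Download the Illustrious CLIP text-encoder weights (filenames from the list above).
clip_l_path = hf_hub_download("AbstractPhil/clips", "IllustriousV01_clip_l.safetensors")
clip_g_path = hf_hub_download("AbstractPhil/clips", "IllustriousV01_clip_g.safetensors")

# Raw tensors; load them into your CLIP-L / CLIP-G text encoders with whatever
# key mapping those encoders expect. Remember CLIP skip 2 (penultimate layer)
# when producing embeddings.
clip_l_state = load_file(clip_l_path)
clip_g_state = load_file(clip_g_path)
```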
## Model Details
- Fusion Strategy: adaptive_cantor
- Latent Dimension: 2048
- Training Steps: 12,125
- Best Loss: 0.0377
- Prompt Source: booru
## Quick Load (Safetensors)
```python
from safetensors.torch import load_file

# Load just the weights (fast)
state_dict = load_file("weights/lyra_illustrious_best.safetensors")

# Or a specific step
state_dict = load_file("weights/lyra_illustrious_step_5000.safetensors")
```
## T5 Input Format
T5 receives a different input than CLIP to enable richer semantic understanding:
```
CLIP sees: "masterpiece, 1girl, blue hair, school uniform, smile"
T5 sees:   "masterpiece, 1girl, blue hair, school uniform, smile ¶ A cheerful schoolgirl with blue hair smiling warmly"
```

The pilcrow (¶) separator acts as a mode-switch token.
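A small illustration of building the two prompt strings from the example above; the `build_prompts` helper is hypothetical, and the only assumption is that the tag prompt and the natural-language caption are joined with a literal " ¶ ":

```python
PILCROW = " \u00b6 "  # " ¶ " separator between tag prompt and caption


def build_prompts(tags: str, caption: str) -> tuple[str, str]:
    """Return (clip_prompt, t5_prompt) following the format above."""
    clip_prompt = tags
    t5_prompt = tags + PILCROW + caption
    return clip_prompt, t5_prompt


clip_prompt, t5_prompt = build_prompts(
    "masterpiece, 1girl, blue hair, school uniform, smile",
    "A cheerful schoolgirl with blue hair smiling warmly",
)
```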
## Learned Parameters
Alpha (Visibility):
- clip_g: 0.7316
- clip_l: 0.7316
- t5_xl_g: 0.7339
- t5_xl_l: 0.7451
Beta (Capacity):
- clip_l_t5_xl_l: 0.5709
- clip_g_t5_xl_g: 0.5763
## Usage
```python
from lyra_xl_multimodal import load_lyra_from_hub

model = load_lyra_from_hub("AbstractPhil/vae-lyra-xl-adaptive-cantor-illustrious")
model.eval()

inputs = {
    "clip_l": clip_l_embeddings,    # [batch, 77, 768]
    "clip_g": clip_g_embeddings,    # [batch, 77, 1280]
    "t5_xl_l": t5_xl_embeddings,    # [batch, 512, 2048]
    "t5_xl_g": t5_xl_embeddings,    # [batch, 512, 2048]
}

recons, mu, logvar, _ = model(inputs, target_modalities=["clip_l", "clip_g"])
```
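The snippet above assumes the per-modality embeddings already exist. A sketch of one way to produce tensors of the expected shapes, using stock SDXL text encoders and T5-XL from `transformers` as stand-ins (this card actually uses the Illustrious CLIP weights listed above, so swap those in; `clip_prompt` / `t5_prompt` follow the T5 Input Format section):

```python
import torch
from transformers import (
    AutoTokenizer,
    CLIPTextModel,
    CLIPTextModelWithProjection,
    T5EncoderModel,
)

sdxl = "stabilityai/stable-diffusion-xl-base-1.0"
tok_l = AutoTokenizer.from_pretrained(sdxl, subfolder="tokenizer")
tok_g = AutoTokenizer.from_pretrained(sdxl, subfolder="tokenizer_2")
enc_l = CLIPTextModel.from_pretrained(sdxl, subfolder="text_encoder").eval()
enc_g = CLIPTextModelWithProjection.from_pretrained(sdxl, subfolder="text_encoder_2").eval()

t5_name = "google/t5-v1_1-xl"  # d_model = 2048
tok_t5 = AutoTokenizer.from_pretrained(t5_name)
enc_t5 = T5EncoderModel.from_pretrained(t5_name).eval()

clip_prompt = "masterpiece, 1girl, blue hair, school uniform, smile"
t5_prompt = clip_prompt + " \u00b6 " + "A cheerful schoolgirl with blue hair smiling warmly"

with torch.no_grad():
    # CLIP skip 2: take the penultimate hidden state, as stated above.
    ids_l = tok_l([clip_prompt], padding="max_length", max_length=77,
                  truncation=True, return_tensors="pt")
    clip_l_embeddings = enc_l(**ids_l, output_hidden_states=True).hidden_states[-2]  # [1, 77, 768]

    ids_g = tok_g([clip_prompt], padding="max_length", max_length=77,
                  truncation=True, return_tensors="pt")
    clip_g_embeddings = enc_g(**ids_g, output_hidden_states=True).hidden_states[-2]  # [1, 77, 1280]

    ids_t5 = tok_t5([t5_prompt], padding="max_length", max_length=512,
                    truncation=True, return_tensors="pt")
    t5_xl_embeddings = enc_t5(**ids_t5).last_hidden_state  # [1, 512, 2048]
```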
## Files
- `model.pt` - Full checkpoint (model + optimizer + scheduler)
- `checkpoint_lyra_illustrious_XXXX.pt` - Step checkpoints
- `config.json` - Training configuration
- `weights/lyra_illustrious_best.safetensors` - Best model weights only
- `weights/lyra_illustrious_step_XXXX.safetensors` - Step checkpoints (weights only)