Soundboard

A Schrödinger Bridge denoiser fine-tuned for restoring music recordings. It recovers a soundboard-style mix from heavily corrupted audience recordings (room reverb, audience-mic blend, and lossy codec artifacts).

Fine-tuned from NVIDIA's A2SB (twosplit_0.5_1.0 split) with a synthetic-corruption training pipeline driven by profile-based augmentation: corruption parameters are calibrated from real (clean, festival-recording) pairs and sampled at training time from the recovered distribution. See Locutius for the full corruption chain, profiling, and training scaffold.

Quick facts


Architecture	AttnUNetF (565.5M params)
Audio format	44.1 kHz, 2-channel, 32-bit float
Segment length	130560 samples (2.96 s)
STFT	n_fft=2048, hop=512, window=hann
Representation	3-channel `[mag^0.25, cos(phase), sin(phase)]`
Trained at step	50,000
Base checkpoint	NVIDIA A2SB `twosplit_0.5_1.0`
Checkpoint size	2.1 GB
Diffusion	Schrödinger Bridge, β_max=1.0
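
The segment length, STFT settings, and 3-channel representation in the table fit together in a way that is easy to check by hand. The sketch below is illustrative only and is not the Locutius implementation: the helper names (`segment_geometry`, `encode_bin`, `decode_bin`) are hypothetical, and the frame count assumes a centered STFT (input padded by `n_fft // 2` on each side), which the table does not state.

```python
import cmath
import math

SAMPLE_RATE = 44_100       # Hz, from the table above
SEGMENT_SAMPLES = 130_560  # samples per training segment
N_FFT = 2048
HOP = 512


def segment_geometry(n_samples=SEGMENT_SAMPLES, n_fft=N_FFT, hop=HOP, sr=SAMPLE_RATE):
    """Return (duration in seconds, STFT frames, frequency bins).

    Assumes a centered STFT, which yields 1 + n_samples // hop frames.
    """
    duration_s = n_samples / sr
    n_frames = 1 + n_samples // hop
    n_bins = n_fft // 2 + 1  # one-sided spectrum of a real signal
    return duration_s, n_frames, n_bins


def encode_bin(z, power=0.25):
    """Map one complex STFT bin to (mag**power, cos(phase), sin(phase))."""
    mag, phase = cmath.polar(z)
    return (mag ** power, math.cos(phase), math.sin(phase))


def decode_bin(channels, power=0.25):
    """Invert encode_bin: undo the magnitude compression and rebuild the phase."""
    compressed_mag, cos_phase, sin_phase = channels
    mag = compressed_mag ** (1.0 / power)
    phase = math.atan2(sin_phase, cos_phase)
    return cmath.rect(mag, phase)
```

Under the centered-STFT assumption, one 130560-sample segment maps to a grid of 256 frames by 1025 frequency bins per audio channel. Storing phase as a (cos, sin) pair avoids the wrap-around discontinuity at ±π, and the 0.25 power compresses the dynamic range of the magnitudes.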

Usage

Load with the Locutius training package:

import torch
from huggingface_hub import hf_hub_download
from locutius_train.config import TrainConfig
from locutius_train.network import AttnUNetF, SinusoidalTemporalEmbedding
from locutius_train.diffusion import Diffusion
from locutius_train.representation import WaveformToInput, InputToWaveform
from locutius_train.restore import restore_spectrogram

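# Download the checkpoint from the Hugging Face Hub (cached after the first call).
ckpt_path = hf_hub_download(repo_id="protodotdesign/Soundboard", filename="model.pt")

# Fall back to CPU when no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

# weights_only=False unpickles arbitrary Python objects, so only load checkpoints you trust.
sd = torch.load(ckpt_path, map_location=device, weights_only=False)

# Rebuild the network from the default training config, then load the fine-tuned weights.
cfg = TrainConfig()
model = AttnUNetF(
    n_updown_levels=cfg.model.n_updown_levels,
    in_channels=cfg.model.in_channels,
    hidden_channels=list(cfg.model.hidden_channels),
    out_channels=cfg.model.out_channels,
    emb_channels=cfg.diffusion.n_timestep_channels,
    band_embedding_dim=cfg.model.band_embedding_dim,
    n_attn_heads=cfg.model.n_attn_heads,
    attention_levels=list(cfg.model.attention_levels),
    use_attn_input_norm=cfg.model.use_attn_input_norm,
    num_res_blocks=cfg.model.num_res_blocks,
).to(device).eval()
model.load_state_dict(sd["model"])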

See restore.py in the Locutius repo for a complete CLI that takes a clean source, applies the calibrated festival-corruption profile, and runs the reverse Schrödinger Bridge to produce a restored output.

Calibrated corruption profile

This model was trained against a single calibrated profile recovered from a real (studio FLAC, festival M4A) pair via per-kick local Wiener deconvolution. The profile is bundled in profile.json:

{
  "name": "edc_festival",
  "ir_path": "../impulses/EchoThief/Brutalism/San Diego Supercomputer Center Outdoor Patio California.wav",
  "delay_ms_range": [15.0, 25.0],
  "studio_gain_range": [0.6, 0.7],
  "room_gain_range": [0.55, 0.65]
}

Each training-step corruption draws fresh values from these ranges, so the model has been exposed to ~50,000 distinct delay/blend combinations within the same venue character.
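
For readers unfamiliar with the calibration technique named at the start of this section, the sketch below shows generic regularized (Wiener-style) deconvolution in the frequency domain: given a dry signal and a wet recording of it, estimate the impulse response that links them. This is not the Locutius calibration code. The function names and the `noise_floor` parameter are hypothetical, the per-kick windowing is not shown, and a naive DFT stands in for a real FFT so the example needs only the standard library.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; fine for short illustrative signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def circular_convolve(x, h):
    """Circular convolution, used here only to build a synthetic wet signal."""
    n = len(x)
    return [sum(x[(t - m) % n] * h[m] for m in range(n)) for t in range(n)]


def wiener_deconvolve(dry, wet, noise_floor=1e-9):
    """Estimate h such that wet is approximately dry convolved with h.

    H = Y * conj(X) / (|X|^2 + noise_floor); the noise floor keeps the
    division stable at frequencies where the dry signal has little energy.
    """
    X = dft(dry)
    Y = dft(wet)
    H = [y * x.conjugate() / (abs(x) ** 2 + noise_floor) for x, y in zip(X, Y)]
    return [v.real for v in dft(H, inverse=True)]


# Synthetic check: a decaying "kick" passed through a sparse three-tap response.
dry = [0.5 ** t for t in range(16)]
true_ir = [0.0] * 16
true_ir[0], true_ir[3], true_ir[7] = 1.0, 0.6, 0.3
wet = circular_convolve(dry, true_ir)
estimated_ir = wiener_deconvolve(dry, wet)
```

On real recordings the noise floor would need to be far larger than in this noiseless toy case, since codec artifacts and crowd noise dominate the bins where the source is quiet.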

Training data

Trained on a focused subset of electronic music FLACs. No festival recordings or other licensed audio were stored or distributed — only the studio source material was used; festival-corrupted versions were synthesized on-the-fly from the calibrated profile during each training step.
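
As a rough illustration of on-the-fly synthesis, the sketch below draws one set of parameters from the bundled profile ranges and blends a studio signal with a delayed room signal. It is an assumption-laden sketch, not the Locutius corruption chain: uniform sampling, the function names, and the idea that the delay applies to the room path are all guesses from the profile field names, and the impulse-response convolution and codec stage are omitted.

```python
import random

# Ranges copied from the edc_festival profile.
PROFILE = {
    "delay_ms_range": [15.0, 25.0],
    "studio_gain_range": [0.6, 0.7],
    "room_gain_range": [0.55, 0.65],
}


def sample_corruption(profile, rng):
    """Draw one delay/gain combination. Uniform sampling is an assumption."""
    return {
        "delay_ms": rng.uniform(*profile["delay_ms_range"]),
        "studio_gain": rng.uniform(*profile["studio_gain_range"]),
        "room_gain": rng.uniform(*profile["room_gain_range"]),
    }


def blend(studio, room, params, sample_rate=44_100):
    """Mix the direct studio signal with a delayed room signal.

    `room` stands in for the studio signal convolved with the venue impulse
    response and must be at least as long as `studio`.
    """
    delay = round(params["delay_ms"] * sample_rate / 1000.0)
    mixed = []
    for t in range(len(studio)):
        delayed_room = room[t - delay] if t >= delay else 0.0
        mixed.append(params["studio_gain"] * studio[t] + params["room_gain"] * delayed_room)
    return mixed


# One fresh draw per training step; a seeded generator keeps runs reproducible.
rng = random.Random(0)
params = sample_corruption(PROFILE, rng)
```

Because only the studio source is kept on disk, each corrupted example exists only in memory for the step that uses it.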

Limitations

Single profile: trained against one calibrated venue (edc_festival). Performance will degrade on festival recordings from substantially different venues or mix chains.
Electronic music bias: the training set was EDM-heavy. Restoration quality on rock, classical, or vocal-led material may be uneven.
No crowd-noise model: the calibrated profile did not include additive crowd noise (no real crowd recordings were available during calibration). Recordings with heavy crowd vocals may show residual artifacts.
Non-commercial use only: see the license below.

License

Dual non-commercial license:

NVIDIA Source Code License for A2SB (the upstream license inherited from the A2SB base checkpoint)
PolyForm Noncommercial 1.0.0 (additional terms on top, source-availability + patent retaliation)

You must comply with both licenses. Use is restricted to research and evaluation only — no commercial use is permitted. See LICENSING.md for the full plain-English breakdown.

Citation

If you use this model in research, please cite the upstream A2SB paper and reference this fine-tune:

@misc{soundboard,
  title={Soundboard: festival audio restoration via profile-calibrated Schrödinger Bridge fine-tuning},
  author={Locutius},
  year={2026},
  howpublished={\url{https://huggingface.co/protodotdesign/Soundboard}},
}
