Soundboard

Schrödinger Bridge denoiser fine-tuned for restoring music recordings: it recovers a soundboard-style mix from heavily corrupted audience recordings (room reverb, audience-mic blend, and lossy-codec artifacts).

Fine-tuned from NVIDIA's A2SB (twosplit_0.5_1.0 split) with a synthetic-corruption training pipeline that uses profile-based augmentation: corruption parameters are calibrated from real (clean, festival-recording) pairs, and each training example samples its parameters from the recovered distribution. See Locutius for the full corruption chain, profiling, and training scaffold.

Quick facts

Architecture: AttnUNetF (565.5M params)
Audio format: 44.1 kHz, 2-channel, 32-bit float
Segment length: 130560 samples (2.96 s)
STFT: n_fft=2048, hop=512, window=hann
Representation: 3-channel [mag^0.25, cos(phase), sin(phase)]
Trained at step: 50,000
Base checkpoint: NVIDIA A2SB twosplit_0.5_1.0
Checkpoint size: 2.1 GB
Diffusion: Schrödinger Bridge, β_max=1.0
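
The segment length and the 3-channel representation can be checked with a few lines of plain Python. This is a minimal sketch, not the Locutius implementation: the helper names `encode_bin` and `decode_bin` are hypothetical, and the frame count assumes a one-sided, centre-padded STFT, which the card does not confirm.

```python
import cmath
import math

SAMPLE_RATE = 44_100       # Hz, from the quick facts
SEGMENT_SAMPLES = 130_560  # samples per segment, from the quick facts
N_FFT = 2048
HOP = 512
MAG_POWER = 0.25           # exponent applied to the STFT magnitude

# Segment duration in seconds (about 2.96 s).
segment_seconds = SEGMENT_SAMPLES / SAMPLE_RATE

# Spectrogram shape, ASSUMING a one-sided, centre-padded STFT.
n_freq_bins = N_FFT // 2 + 1
n_frames = SEGMENT_SAMPLES // HOP + 1


def encode_bin(z: complex) -> tuple[float, float, float]:
    """Map one complex STFT bin to (mag^0.25, cos(phase), sin(phase))."""
    phase = cmath.phase(z)
    return abs(z) ** MAG_POWER, math.cos(phase), math.sin(phase)


def decode_bin(mag_c: float, cos_p: float, sin_p: float) -> complex:
    """Invert encode_bin: undo the magnitude compression and rebuild the phase."""
    # Renormalise (cos, sin) in case a predicted pair drifts off the unit circle.
    norm = math.hypot(cos_p, sin_p) or 1.0
    magnitude = mag_c ** (1.0 / MAG_POWER)
    return magnitude * complex(cos_p / norm, sin_p / norm)


z = complex(3.0, -4.0)
restored = decode_bin(*encode_bin(z))
```

Note that 130560 is exactly 255 hops of 512 samples, so under the centre-padding assumption each segment yields 256 frames of 1025 frequency bins. The fourth-root compression narrows the dynamic range of the magnitude channel, while the (cos, sin) pair avoids the wrap-around discontinuity of a raw phase angle.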

Usage

Load with the Locutius training package:

import torch
from huggingface_hub import hf_hub_download
from locutius_train.config import TrainConfig
from locutius_train.network import AttnUNetF, SinusoidalTemporalEmbedding
from locutius_train.diffusion import Diffusion
from locutius_train.representation import WaveformToInput, InputToWaveform
from locutius_train.restore import restore_spectrogram

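# Download the checkpoint from the Hugging Face Hub (cached locally after the first call).
ckpt_path = hf_hub_download(repo_id="protodotdesign/Soundboard", filename="model.pt")
# weights_only=False unpickles arbitrary Python objects, so only load checkpoints you trust.
sd = torch.load(ckpt_path, map_location="cuda", weights_only=False)

# Rebuild the network from the default training configuration.
cfg = TrainConfig()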
model = AttnUNetF(
    n_updown_levels=cfg.model.n_updown_levels,
    in_channels=cfg.model.in_channels,
    hidden_channels=list(cfg.model.hidden_channels),
    out_channels=cfg.model.out_channels,
    emb_channels=cfg.diffusion.n_timestep_channels,
    band_embedding_dim=cfg.model.band_embedding_dim,
    n_attn_heads=cfg.model.n_attn_heads,
    attention_levels=list(cfg.model.attention_levels),
    use_attn_input_norm=cfg.model.use_attn_input_norm,
    num_res_blocks=cfg.model.num_res_blocks,
).to("cuda").eval()
# The network weights are stored under the "model" key of the checkpoint.
model.load_state_dict(sd["model"])

See restore.py in the Locutius repo for a complete CLI that takes a clean source, applies the calibrated festival-corruption profile, and runs the reverse Schrödinger Bridge to produce a restored output.
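
Because the model operates on fixed 130560-sample segments, a longer recording has to be processed in chunks. The sketch below shows one way to plan those chunks; it is an illustration only and makes no claim about how restore.py does it. The function names and the `overlap` parameter are hypothetical.

```python
SEGMENT_SAMPLES = 130_560  # model segment length, from the quick facts


def segment_starts(n_samples: int, overlap: int = 0) -> list[int]:
    """Start offsets of fixed-length segments that together cover n_samples.

    `overlap` (hypothetical parameter) is the number of samples shared by
    neighbouring segments, e.g. to cross-fade the restored outputs.
    """
    if not 0 <= overlap < SEGMENT_SAMPLES:
        raise ValueError("overlap must be in [0, SEGMENT_SAMPLES)")
    if n_samples <= 0:
        return []
    step = SEGMENT_SAMPLES - overlap
    # Segments that fit entirely inside the recording.
    starts = list(range(0, max(n_samples - SEGMENT_SAMPLES, 0) + 1, step))
    # One extra segment if the tail is still uncovered; the caller zero-pads it.
    if starts[-1] + SEGMENT_SAMPLES < n_samples:
        starts.append(starts[-1] + step)
    return starts


def pad_amount(n_samples: int, starts: list[int]) -> int:
    """Number of zero samples to append so the last segment is full length."""
    if not starts:
        return 0
    return max(starts[-1] + SEGMENT_SAMPLES - n_samples, 0)


# Example: plan segments for a 10-second stereo file at 44.1 kHz (441000 samples per channel).
starts = segment_starts(441_000)
padding = pad_amount(441_000, starts)
```

A non-zero `overlap` trades extra compute for smoother joins between restored segments; whether that is needed for this model is not stated in the card.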

Calibrated corruption profile

This model was trained against a single calibrated profile recovered from a real (studio FLAC, festival M4A) pair via per-kick local Wiener deconvolution. The profile is bundled in profile.json:

{
  "name": "edc_festival",
  "ir_path": "../impulses/EchoThief/Brutalism/San Diego Supercomputer Center Outdoor Patio California.wav",
  "delay_ms_range": [15.0, 25.0],
  "studio_gain_range": [0.6, 0.7],
  "room_gain_range": [0.55, 0.65]
}

Each training step draws fresh values from these ranges, so over 50,000 steps the model has been exposed to roughly 50,000 distinct delay/blend combinations within the same venue character.
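
The per-step draw can be sketched as follows. This is a simplified illustration under stated assumptions, not the Locutius corruption chain: uniform sampling within each range is assumed (the card does not name the distribution), `sample_corruption` and `blend` are hypothetical helpers, and the impulse-response convolution referenced by `ir_path` is omitted, so `room` stands in for an already-reverberated signal.

```python
import json
import random

SAMPLE_RATE = 44_100  # Hz, from the quick facts

# Ranges copied from profile.json (ir_path left out of this sketch).
PROFILE_JSON = """
{
  "name": "edc_festival",
  "delay_ms_range": [15.0, 25.0],
  "studio_gain_range": [0.6, 0.7],
  "room_gain_range": [0.55, 0.65]
}
"""


def sample_corruption(profile: dict, rng: random.Random) -> dict:
    """Draw one set of corruption parameters (uniform sampling is an ASSUMPTION)."""
    delay_ms = rng.uniform(*profile["delay_ms_range"])
    return {
        "delay_samples": round(delay_ms * SAMPLE_RATE / 1000.0),
        "studio_gain": rng.uniform(*profile["studio_gain_range"]),
        "room_gain": rng.uniform(*profile["room_gain_range"]),
    }


def blend(dry: list[float], room: list[float], params: dict) -> list[float]:
    """Hypothetical mix: gained dry signal plus a delayed, gained room signal.

    `dry` and `room` are expected to have the same length.
    """
    delay = params["delay_samples"]
    out = []
    for i, x in enumerate(dry):
        delayed = room[i - delay] if i >= delay else 0.0
        out.append(params["studio_gain"] * x + params["room_gain"] * delayed)
    return out


profile = json.loads(PROFILE_JSON)
params = sample_corruption(profile, random.Random(0))
```

At 44.1 kHz the 15 to 25 ms delay range corresponds to roughly 660 to 1100 samples between the direct signal and the room signal.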

Training data

Trained on a focused subset of electronic music FLACs. No festival recordings or other licensed audio were stored or distributed — only the studio source material was used; festival-corrupted versions were synthesized on-the-fly from the calibrated profile during each training step.

Limitations

  • Single profile: trained against one calibrated venue (edc_festival). Expect degraded performance on festival recordings from very different venues or mix chains.
  • Electronic music bias: the training set was EDM-heavy. Restoration quality on rock, classical, or vocal-led material may be uneven.
  • No crowd-noise model: the calibrated profile does not include additive crowd noise (no real crowd recordings were available during calibration). Recordings with heavy crowd vocals may show residual artifacts.
  • Non-commercial use only — see the license below.

License

Dual non-commercial license:

You must comply with both licenses. Use is restricted to research and evaluation only — no commercial use is permitted. See LICENSING.md for the full plain-English breakdown.

Citation

If you use this model in research, please cite the upstream A2SB paper and reference this fine-tune:

@misc{soundboard,
  title={Soundboard: festival audio restoration via profile-calibrated Schrödinger Bridge fine-tuning},
  author={Locutius},
  year={2026},
  howpublished={\url{https://huggingface.co/protodotdesign/Soundboard}},
}