AudioSep-hive

Model Description

AudioSep-hive is a data-efficient, query-based universal sound separation model trained on the Hive dataset. Because Hive is high-quality and semantically consistent, the model achieves separation accuracy and perceptual quality competitive with state-of-the-art models such as SAM-Audio while using only a fraction (~0.2%) of the training data volume.

The model was developed by Shanda AI Research Tokyo and is introduced in the paper "A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation".

Model Details

Model Type: Query-Based Universal Sound Separation
Language(s): English (for text queries)
License: Apache 2.0
Trained on: ShandaAI/Hive (2,442 hours of raw audio, 19.6M mixtures)
Paper: arXiv:2601.22599
Code Repository: GitHub - ShandaAI/Hive
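
To put the dataset figures above in perspective, the short calculation below derives two rough quantities from the numbers quoted on this card. It is only a back-of-the-envelope sketch: the card does not say what unit the "~0.2%" figure is measured in, so the second result assumes it compares hours of raw audio, which may not match the paper's definition.

```python
# Figures quoted on this card.
RAW_AUDIO_HOURS = 2_442      # hours of raw audio in Hive
NUM_MIXTURES = 19_600_000    # 19.6M mixtures
DATA_FRACTION = 0.002        # the "~0.2%" training data volume figure

# Average number of mixtures per hour of raw audio.
mixtures_per_hour = NUM_MIXTURES / RAW_AUDIO_HOURS

# ASSUMPTION (not stated on the card): the ~0.2% figure compares hours of
# raw audio. Under that reading, this is the size of the larger training
# corpus that the comparison would imply.
implied_reference_hours = RAW_AUDIO_HOURS / DATA_FRACTION

print(f"{mixtures_per_hour:,.0f} mixtures per hour of raw audio")
print(f"{implied_reference_hours:,.0f} implied reference hours")
```

If the ~0.2% figure is instead measured in mixtures, clips, or bytes, only the second number changes; the mixtures-per-hour ratio follows directly from the card.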

Uses

The model is intended for universal sound separation tasks, allowing users to extract specific sounds from complex audio mixtures using multimodal prompts (e.g., text descriptions or audio queries).
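
As a starting point for trying the model, the sketch below fetches the repository files locally using `snapshot_download` from the `huggingface_hub` package. The repository name is taken from this card; the helper name `download_model_files` is hypothetical. The card does not document an inference API, so loading the checkpoint and running separation are left to the code in the ShandaAI/Hive repository.

```python
from pathlib import Path
from typing import Optional

# Repository name as it appears on this card.
REPO_ID = "ShandaAI/AudioSep-hive"


def download_model_files(repo_id: str = REPO_ID,
                         local_dir: Optional[str] = None) -> Path:
    """Download every file in the model repository and return the local folder.

    Hypothetical helper for illustration; requires `pip install huggingface_hub`
    and network access when called.
    """
    # Imported lazily so this module can be loaded without the dependency.
    from huggingface_hub import snapshot_download

    # With local_dir=None the files go to the default Hugging Face cache.
    path = snapshot_download(repo_id=repo_id, local_dir=local_dir)
    return Path(path)
```

Calling `download_model_files()` returns the folder containing the downloaded files; listing that folder (for example with `sorted(p.name for p in folder.iterdir())`) shows which checkpoint and configuration files the repository actually ships.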

Paper: A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation (arXiv:2601.22599, published Jan 30)