AudioSep-hive
Model Description
AudioSep-hive is a data-efficient, query-based universal sound separation model trained on the high-quality, semantically consistent Hive dataset. It achieves separation accuracy and perceptual quality competitive with state-of-the-art models (such as SAM-Audio) while using only a fraction (~0.2%) of the training data volume.
This model was developed by Shanda AI Research Tokyo and introduced in the paper "A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation".
Model Details
- Model Type: Query-Based Universal Sound Separation
- Language(s): English (for text queries)
- License: Apache 2.0 (Please update if different)
- Trained on: ShandaAI/Hive (2,442 hours of raw audio, 19.6M mixtures)
- Paper: arXiv:2601.22599
- Code Repository: GitHub - ShandaAI/Hive
Uses
The model is intended for universal sound separation tasks, allowing users to extract specific sounds from complex audio mixtures using multimodal prompts (e.g., text descriptions or audio queries).
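The card does not document an inference API, so the following is only a rough sketch of what a query-based call could look like. Every name in it (`SeparationQuery`, `Separator`, `extract`, `passthrough`) is hypothetical and not taken from the ShandaAI/Hive repository. It also assumes that a query carries exactly one prompt (text or audio) and that the separated output has the same length as the mixture. The `passthrough` function is a stand-in for the real model and performs no separation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Mono samples; real code would more likely use a NumPy array or a tensor.
Waveform = Sequence[float]


@dataclass(frozen=True)
class SeparationQuery:
    """Hypothetical prompt container: a text description or a reference clip."""

    text: Optional[str] = None
    audio: Optional[Waveform] = None

    def __post_init__(self) -> None:
        # Assumption of this sketch: exactly one prompt modality per query.
        if (self.text is None) == (self.audio is None):
            raise ValueError("provide exactly one of `text` or `audio`")
        if self.text is not None and not self.text.strip():
            raise ValueError("text query must not be empty")


# Hypothetical model signature: (mixture, query) -> separated waveform.
Separator = Callable[[Waveform, SeparationQuery], Waveform]


def extract(mixture: Waveform, query: SeparationQuery, separator: Separator) -> List[float]:
    """Run `separator` on the mixture and check the output length."""
    if len(mixture) == 0:
        raise ValueError("mixture is empty")
    target = list(separator(mixture, query))
    # Assumption of this sketch: output is time-aligned with the input mixture.
    if len(target) != len(mixture):
        raise ValueError("separated output must match the mixture length")
    return target


def passthrough(mixture: Waveform, query: SeparationQuery) -> Waveform:
    """Stand-in for the real model: returns the mixture unchanged."""
    return mixture


# Text-prompted call; an audio-prompted call would pass `audio=` instead.
result = extract([0.0, 0.5, -0.5], SeparationQuery(text="a dog barking"), passthrough)
```

In practice, `passthrough` would be replaced by a wrapper around the released checkpoint, following the loading instructions in the code repository.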