ByteDance-Seed/Stable-DiffCoder-8B-Instruct
Text Generation • 8B • Updated • 703 • 126
Mixture-of-Depths Attention
Understanding by Reconstruction: Reversing the Software Development Process for LLM Pretraining