RedHatAI/Qwen3-Next-80B-A3B-Instruct-quantized.w4a16
Text Generation • 12B • 802 downloads • 3 likes
OpenSource and AI
SNLP: Layer-Parallel Inference via Structured Newton Corrections
S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation