ISTA-DASLab/Mistral-Small-3.1-24B-Instruct-2503-GPTQ-4b-128g
Image-Text-to-Text • 5B • Updated • 495 • 17
DASH: Faster Shampoo via Batched Block Preconditioning and Efficient Inverse-Root Solvers
Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation