OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published Feb 5 • 347
Article • Interactively explore your Huggingface dataset with one line of code • Oct 25, 2023 • 2
Is There a Better Source Distribution than Gaussian? Exploring Source Distributions for Image Flow Matching Paper • 2512.18184 • Published Dec 20, 2025 • 21
The Smol Training Playbook 📚 • The secrets to building world-class LLMs • Running on CPU Upgrade • Featured • 3.04k
GoGiants1/smolvlm-sft-llava665k_lr1e-5_vlr1e-5_clr1e-5_bs8_ga4_eff128_archived 0.3B • Updated Dec 8, 2025