SmolLM3 pretraining datasets (Collection): datasets used in SmolLM3 pretraining • 15 items • Updated Aug 12, 2025 • 47
david-thrower/codelion-finemix-pdf-dclm-edu-1024-seq-len-15897-samples • Updated Jan 19 • 15.9k • 31