appvoid 
posted an update 4 days ago
As an advocate for small language models, I just want to say: it might not actually be the end for small models. We are just getting started! Now that we have super good models, we can find creative ways to replicate their behavior at small scale!

I'll show you in a few weeks what a small model is capable of. You will be surprised.

I applaud you on your journey into the void with small models. I too am deeply fascinated with optimizing smaller models rather than asking for more parameters and terabytes of scraped internet data. I hope to see what you've come up with in a few weeks' time.

I just finished designing a sparsity training scheduler that trains, on average, 35% of a model's available weights, with almost no hidden dimensions between transformers adjoined and zero throughput, while randomizing trainable locations. It cuts VRAM use and training time, and the models set higher benchmarks on mathematics than FFT models trained on the same corpus. I discovered this while fucking around for fun.
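
For anyone trying to picture the general idea, here is a minimal sketch of randomized sparse training: each step, pick a random 35% of the weights and update only those. This is not the scheduler described above, whose details are not given here. The function names (`sample_trainable_mask`, `masked_sgd_step`), the plain SGD update, and the fixed per-step fraction are all assumptions made for illustration.

```python
import random

def sample_trainable_mask(num_weights, fraction=0.35, rng=None):
    """Pick a random subset of weight positions to train for this step.

    Hypothetical helper: returns a list of booleans, True where the
    weight is trainable. `fraction` is the share of weights selected.
    """
    rng = rng or random.Random()
    k = round(num_weights * fraction)
    chosen = set(rng.sample(range(num_weights), k))
    return [i in chosen for i in range(num_weights)]

def masked_sgd_step(weights, grads, mask, lr=0.1):
    """Apply a plain SGD update only where the mask is True.

    Assumption: vanilla SGD stands in for whatever optimizer is used.
    Weights outside the mask are returned unchanged.
    """
    return [w - lr * g if m else w for w, g, m in zip(weights, grads, mask)]

# Toy example: 20 weights, constant gradient, one training step.
rng = random.Random(0)
weights = [1.0] * 20
grads = [0.5] * 20

mask = sample_trainable_mask(len(weights), fraction=0.35, rng=rng)
updated = masked_sgd_step(weights, grads, mask)

# Count how many weights actually moved in this step.
num_changed = sum(u != w for u, w in zip(updated, weights))
print(f"trainable this step: {sum(mask)} of {len(weights)}")
print(f"weights changed: {num_changed}")
```

Calling `sample_trainable_mask` again on the next step draws a fresh set of positions, which is one simple way to read "randomizing trainable locations". In a real framework the memory savings would depend on skipping gradient and optimizer state for the frozen positions, which this toy version does not attempt.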

I don't doubt that training smaller architectures has many more surprises in store for us.
