David Golchinfar PRO

DavidGF

https://vago-solutions.ai

AI & ML interests

finetune llms, improve german language understanding and generated text of llms

Recent Activity

liked a model about 19 hours ago

DataScience-UIBK/Reason-mxbai-colbert-v0-32m

reacted to anakin87's post with ❤️ 1 day ago

A small model that struggled against a random opponent now beats GPT-5-mini at tic-tac-toe I took https://huggingface.co/LiquidAI/LFM2-2.6B and trained it through play. 🧑‍🍳 Here's how: 1️⃣ Build a solid RL env with Verifiers (Prime Intellect) 2️⃣ Generate synthetic data: <200 games sampled from GPT-5-mini playing in the env 3️⃣ SFT warm-up to teach format 4️⃣ Group-based RL (CISPO) against opponents making 20-70% random moves 5️⃣ RL again with stronger opponents (0-25% random moves) + 1.25 temperature to push exploration and shake off suboptimal strategies Done! Beats GPT-5-mini 🏆 --- 🎮 Play against the model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictactoe 🤗 Model: https://huggingface.co/anakin87/LFM2-2.6B-mr-tictactoe 📚 Walkthrough/course: https://github.com/anakin87/llm-rl-environments-lil-course 🤗 Dataset and checkpoints: https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe

liked a Space 1 day ago

anakin87/LFM2-2.6B-mr-tictactoe

View all activity

Organizations

liked a model about 19 hours ago

DataScience-UIBK/Reason-mxbai-colbert-v0-32m

reacted to anakin87's post with ❤️ 1 day ago

Post

2204

A small model that struggled against a random opponent now beats GPT-5-mini at tic-tac-toe

I took LiquidAI/LFM2-2.6B and trained it through play.

🧑‍🍳 Here's how:

1️⃣ Build a solid RL env with Verifiers (Prime Intellect)
2️⃣ Generate synthetic data: <200 games sampled from GPT-5-mini playing in the env
3️⃣ SFT warm-up to teach format
4️⃣ Group-based RL (CISPO) against opponents making 20-70% random moves
5️⃣ RL again with stronger opponents (0-25% random moves) + 1.25 temperature to push exploration and shake off suboptimal strategies

Done! Beats GPT-5-mini 🏆

---

🎮 Play against the model: anakin87/LFM2-2.6B-mr-tictactoe

🤗 Model: anakin87/LFM2-2.6B-mr-tictactoe

📚 Walkthrough/course: https://github.com/anakin87/llm-rl-environments-lil-course

🤗 Dataset and checkpoints: https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe

liked a Space 1 day ago

Mr. Tic Tac Toe

⭕

Play Tic Tac Toe against a small RL tuned model

liked 2 models 3 days ago

lightonai/DenseOn

lightonai/LateOn

reacted to anakin87's post with ❤️ 14 days ago

Post

4141

🌀 Let LLMs wander - Engineering RL Environments

Reinforcement Learning Environments are little worlds
where models can act, get rewards, and learn.

I've been exploring how to design them, figuring out what works and what doesn't.

If you want to learn how to build them, I recorded a practical intro video.

You'll also see how to turn Liquid AI LFM2-2.6B into a Tic-tac-toe master 🙂

🎥 Engineering RL Environments video: https://www.youtube.com/watch?v=71V3fTaUp2Q

---

🌱 LLM RL Environments Lil Course: https://github.com/anakin87/llm-rl-environments-lil-course

🤗🕹️ Play against the trained model: anakin87/LFM2-2.6B-mr-tictactoe

📚 HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe