AI & ML interests

None defined yet.

Recent Activity

KingNish 
posted an update 1 day ago
view post
Post
1224
Muon vs MuonClip vs Muon+Adamw

Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine‑tuning? We ran head‑to‑head tests on Qwen3‑4B (10k+ high‑quality instruction rows) to find out.

Short story: Pure Muon converged fastest at the start, but its gradient‑norm spikes made training unstable. MuonClip (Kimi K2’s clipping) stabilizes long pretraining runs, yet in our small‑scale fine‑tune it underperformed, lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D layers + AdamW for 1D layers. It delivered the best balance of stability and final performance and even beat vanilla AdamW.

Takeaway: for small-scale fine-tuning, hybrid = practical and reliable.

Next Step: scale to larger models/datasets to see if Muon’s spikes become catastrophic or if clipping wins out.

Full Blog Link: https://huggingface.co/blog/KingNish/optimizer-part1
KingNish 
posted an update 3 days ago
ZennyKenny 
posted an update 6 days ago
view post
Post
188
What a trip. Just walked through @burtenshaw and @evalstate tutorial on adding Hugging Face Skills to your Claude Code agent so you can fine tune LLMs by chatting with AI.

These are the kinds of innovations that are going to help everyone benefit from the power of Artificial Intelligence. Well done gentlemen and thank you for sharing.
  • 1 reply
·
ZennyKenny 
posted an update 12 days ago
view post
Post
245
😐 I keep seeing takes on LinkedIn from American business influencers melting down about Silicon Valley startup "dependence" on open-source Chinese models.

🤔 Can anyone describe a credible scenario where these models can be leveraged by the Chinese government to endanger American security interests or am I right to believe that this is just Red Scare nonsense?
  • 2 replies
·
ZennyKenny 
posted an update 22 days ago
view post
Post
421
The #feedback channel of app early access Slack Workspaces is some of the best unintentional comedy material I have ever come across tbh.
jjokah 
posted an update 23 days ago
ZennyKenny 
posted an update 24 days ago
view post
Post
3144
🎉 Wow. Congratulations @bfirsh and the Replicate team on the CloudFlare acquisition!

✌️ You've really built an incredible ecosystem and product offering and should be super proud.
ZennyKenny 
posted an update about 1 month ago
view post
Post
330
🎉 Novoyaz is live.

A few months ago, I built a quick POC in Hugging Face that used a fine-tuned variant of OpenAI's OSS-20B model that I trained to convert the text from pre-reform Russian-language documents into modern Russian orthography.

⚡️ This morning, I launched novoyaz.io.

This is a production app, the frontend for which I built in like two hours with Lovable, that uses that same fine-tuned model for transliteration, but now has a bunch of extra features that make using it even easier (like taking and uploading pictures with your on-device camera for example 😅).

👉 If you're a researcher, or know a researcher, for whom this app will improve their day-to-day workflows, please get in touch with me.
atasoglu 
posted an update about 1 month ago
view post
Post
1353
Introducing ToolsGen 🛠️

I built a tool to solve a problem I kept running into: creating quality datasets for training LLMs to use tools.

ToolsGen takes your JSON tool definitions and automatically generates realistic user requests, corresponding tool calls, and evaluates them using an LLM-as-a-judge pipeline. It outputs datasets ready to use with Hugging Face.

What makes it useful:
- Generates realistic user requests + tool calls from JSON definitions
- LLM-as-a-judge quality scoring with multi-dimensional rubrics
- Multiple sampling strategies (random, parameter-aware, semantic)
- OpenAI-compatible API support
- Outputs JSONL with train/val splits

Still early days (API isn't stable yet), but it's already helping me generate tool-calling datasets much faster.

Check it out: https://github.com/atasoglu/toolsgen

Happy to hear feedback or ideas!
ZennyKenny 
posted an update about 1 month ago
view post
Post
343
Anyone got the scoop on a good OCR model that's available on inference?

Keen to make use of an endpoint (gated or not -- happy to pay for usage) for a personal project, but not so keen to pay for the GPU hosting myself.

🙈🙈🙈
  • 4 replies
·
nouamanetazi 
posted an update about 1 month ago
view post
Post
3985
After training 𝐒𝐦𝐨𝐥𝐋𝐌𝟑 on 𝟑𝟖𝟒 𝐇𝟏𝟎𝟎𝐬 for nearly a month, I've come to realize something most people overlook: 𝐢𝐧𝐟𝐫𝐚𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞 𝐢𝐬 𝐭𝐡𝐞 𝐦𝐚𝐤𝐞-𝐨𝐫-𝐛𝐫𝐞𝐚𝐤 𝐟𝐚𝐜𝐭𝐨𝐫 𝐢𝐧 𝐋𝐋𝐌 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠. 🔥

Everyone talks about model architecture and data quality. And yes, those matter immensely. But here's what nobody tells you: when your training run fails at 2 AM because of mysterious 𝐍𝐂𝐂𝐋 𝐞𝐫𝐫𝐨𝐫𝐬, or when your expensive GPU cluster is running at 𝟔𝟎% 𝐞𝐟𝐟𝐢𝐜𝐢𝐞𝐧𝐜𝐲, the problem isn't your model. It's most probably a 𝐦𝐢𝐬𝐮𝐬𝐞 𝐨𝐟 𝐭𝐡𝐞 𝐡𝐚𝐫𝐝𝐰𝐚𝐫𝐞. 🛠️

Questions that seemed simple but had no clear answers: Why is 𝐌𝐨𝐄 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐬𝐥𝐨𝐰𝐞𝐫 𝐭𝐡𝐚𝐧 𝐝𝐞𝐧𝐬𝐞 𝐦𝐨𝐝𝐞𝐥𝐬? Which 𝐍𝐂𝐂𝐋 𝐟𝐥𝐚𝐠𝐬 should we actually set? How often should we checkpoint without killing throughput?

That's why we built 𝐓𝐡𝐞 𝐒𝐦𝐨𝐥 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐏𝐥𝐚𝐲𝐛𝐨𝐨𝐤 📖: a complete guide covering everything from model architecture and data curation to the SmolLM3 training marathon, post-training techniques, and crucially, the 𝐢𝐧𝐟𝐫𝐚𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞 𝐥𝐚𝐲𝐞𝐫 that most teams get wrong.

We validated real vs theoretical bandwidth across the entire stack: 𝐇𝐁𝐌𝟑 𝐡𝐢𝐭𝐭𝐢𝐧𝐠 𝟑 𝐓𝐁/𝐬, 𝐍𝐕𝐋𝐢𝐧𝐤 𝟒.𝟎 𝐫𝐞𝐚𝐜𝐡𝐢𝐧𝐠 𝟕𝟖𝟔 𝐆𝐁/𝐬, 𝐏𝐂𝐈𝐞 𝐆𝐞𝐧𝟒 𝐚𝐭 𝟏𝟒.𝟐 𝐆𝐁/𝐬. Then we ran collective operations across 𝟏𝟐𝟖 𝐆𝐏𝐔𝐬 (16 nodes, 8xH100s each) and measured how performance degrades at scale: all-reduce drops from 𝟒𝟖𝟎 𝐆𝐁/𝐬 on a single node to 𝟑𝟐𝟎-𝟑𝟓𝟎 𝐆𝐁/𝐬 across 16 nodes.

If you've ever wondered why your training runs are slower than they should be, or you're planning to scale up and want to avoid expensive mistakes, this guide might save you weeks of debugging.

𝐓𝐡𝐞 𝐒𝐦𝐨𝐥 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐏𝐥𝐚𝐲𝐛𝐨𝐨𝐤: https://lnkd.in/e5MKXUHS

Shared with ❤️ by the HuggingFace team
meg 
posted an update about 1 month ago
view post
Post
3767
🤖 Did you know your voice might be cloned without your consent from just *one sentence* of audio?
That's not great. So with @frimelle , we brainstormed a new idea for developers who want to curb malicious use: ✨The Voice Consent Gate.✨
Details, code, here: https://huggingface.co/blog/voice-consent-gate
  • 3 replies
·
ZennyKenny 
posted an update about 2 months ago
ZennyKenny 
posted an update about 2 months ago
view post
Post
2174
Did Hugging Face just ban hammer a bunch of bot accounts or am I just so uninteresting that 30% of my subs dropped me overnight?

😬 Wait, don't answer that.
  • 2 replies
·
ZennyKenny 
posted an update about 2 months ago