All HF Hub posts

tomaarsen 
posted an update 2 days ago
🐦‍🔥 I've just published Sentence Transformers v5.2.0! It introduces multi-processing for CrossEncoder (rerankers), multilingual NanoBEIR evaluators, similarity score outputs in mine_hard_negatives, Transformers v5 support and more. Details:

- CrossEncoder multi-processing: Similar to SentenceTransformer and SparseEncoder, you can now use multi-processing with CrossEncoder rerankers. Useful for multi-GPU and CPU settings, and simple to configure: just device=["cuda:0", "cuda:1"] or device=["cpu"]*4 on the model.predict or model.rank calls (a combined sketch of these new options follows after this list).

- Multilingual NanoBEIR Support: You can now use community translations of the tiny NanoBEIR retrieval benchmark instead of only the English one, by passing dataset_id, e.g. dataset_id="lightonai/NanoBEIR-de" for the German benchmark.

- Similarity scores in Hard Negatives Mining: When mining for hard negatives to create a strong training dataset, you can now pass output_scores=True to get similarity scores returned. This can be useful for some distillation losses!

- Transformers v5: This release works with both Transformers v4 and the upcoming v5. In the future, Sentence Transformers will only work with Transformers v5, but not yet!

- Python 3.9 deprecation: Now that Python 3.9 has lost security support, Sentence Transformers no longer supports it.
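
Below is a minimal sketch of the first three additions (multi-processing rerankers, multilingual NanoBEIR, and scored hard-negative mining). The model and dataset names are just placeholders, and the new keyword arguments are assumed to match the release notes:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, CrossEncoder
from sentence_transformers.evaluation import NanoBEIREvaluator
from sentence_transformers.util import mine_hard_negatives

# 1) CrossEncoder multi-processing: pass a list of devices to predict()/rank().
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [("what is a panda?", "The giant panda is a bear species native to China.")] * 8
scores = reranker.predict(pairs, device=["cpu"] * 4)  # or device=["cuda:0", "cuda:1"]

# 2) Multilingual NanoBEIR: point the evaluator at a community translation.
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
evaluator = NanoBEIREvaluator(dataset_id="lightonai/NanoBEIR-de")  # German benchmark
results = evaluator(embedder)

# 3) Hard negatives mining with similarity scores (handy for distillation losses).
nq = load_dataset("sentence-transformers/natural-questions", split="train[:1000]")
mined = mine_hard_negatives(nq, embedder, output_scores=True)
```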

Check out the full changelog for more details: https://github.com/huggingface/sentence-transformers/releases/tag/v5.2.0

I'm quite excited about what's coming. There's a huge draft PR with a notable refactor in the works that should bring some exciting new capabilities: better multimodality, rerankers, and perhaps some late interaction in the future!
KingNish 
posted an update 2 days ago
Muon vs MuonClip vs Muon+AdamW

Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine‑tuning? We ran head‑to‑head tests on Qwen3‑4B (10k+ high‑quality instruction rows) to find out.

Short story: Pure Muon converged fastest at the start, but its gradient‑norm spikes made training unstable. MuonClip (Kimi K2’s clipping) stabilizes long pretraining runs, yet in our small‑scale fine‑tune it underperformed, with lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D layers + AdamW for 1D layers. It delivered the best balance of stability and final performance and even beat vanilla AdamW.

Takeaway: for small-scale fine-tuning, hybrid = practical and reliable.
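
For reference, here is a minimal sketch of the 2D/1D parameter split behind the hybrid. The Muon import stands in for whichever Muon implementation you use (its constructor signature and the learning rates are assumptions); the split itself just follows tensor dimensionality:

```python
import torch
from torch.optim import AdamW
from muon import Muon  # placeholder import: any Muon implementation taking (params, lr)

def build_hybrid_optimizers(model, muon_lr=2e-2, adamw_lr=2e-5):
    # Muon updates 2D weight matrices; biases, norms and other 1D tensors go to AdamW.
    # (Many setups also route embedding/output matrices to AdamW, which is worth trying.)
    params = [p for p in model.parameters() if p.requires_grad]
    muon_params = [p for p in params if p.ndim >= 2]
    adamw_params = [p for p in params if p.ndim < 2]
    return Muon(muon_params, lr=muon_lr), AdamW(adamw_params, lr=adamw_lr)

# In the training loop, step both optimizers after each backward pass:
# for opt in (muon_opt, adamw_opt): opt.step(); opt.zero_grad()
```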

Next Step: scale to larger models/datasets to see if Muon’s spikes become catastrophic or if clipping wins out.

Full Blog Link: https://huggingface.co/blog/KingNish/optimizer-part1
martinsu 
posted an update 2 days ago
I wasted days of GPU-node time on a bug that shouldn't exist

So I was fine-tuning TildeOPEN-30B and the outputs were... weird. Token ID 179 (<0x00>) kept appearing between almost every token pair. Took me a bit to figure out what was going on.

Turns out I used the fast tokenizer for training, but the model was trained on the slow one. Silent failure.

Well... long story short: TGI uses (forces) the fast tokenizer, no questions asked. And you get agile's kryptonite: silent failure. If the model was trained on the slow one, it's a silent disaster.

I got curious and wrote a quick script to check how common this is. Ran it on 6,014 LLM HF models overnight.

Roughly 10% of HF model downloads have mismatched tokenizers. Not all mismatches are catastrophic, but some are brutal, like chat template markers inflating from 1 token to 3, silently wrecking context windows and causing the model to act weird.

This wasn't rigorous research, but the drift is real. And the worst part? 968 models (out of those with 500+ downloads) ship both fast and slow tokenizers, yet the two still produce different outputs. No missing files, no errors, just silent degradation.

TGI defaults to the fast tokenizer, as does AutoTokenizer.from_pretrained(). If a fast tokenizer doesn't exist, it auto-generates one. If your model was trained on slow, you get silent degradation. Output looks fine; the model just performs worse. Sometimes really worse. You'd never know.

If the model was trained with the fast tokenizer, it's fine, but how do you know?
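
If you want to check a repo yourself, a quick sanity check along these lines catches most mismatches; the model id and probe sentence below are just placeholders:

```python
from transformers import AutoTokenizer

model_id = "TildeAI/TildeOpen-30b"  # placeholder: the repo you want to check
probe = "Tere, maailm! A quick sanity-check sentence with punctuation: 1, 2, 3."

fast = AutoTokenizer.from_pretrained(model_id, use_fast=True)
slow = AutoTokenizer.from_pretrained(model_id, use_fast=False)  # needs sentencepiece

fast_ids = fast(probe)["input_ids"]
slow_ids = slow(probe)["input_ids"]

if fast_ids != slow_ids:
    print("Tokenizer mismatch!")
    print("fast:", fast.convert_ids_to_tokens(fast_ids))
    print("slow:", slow.convert_ids_to_tokens(slow_ids))
else:
    print("fast and slow tokenizers agree on this probe.")
```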

The root cause? Either model authors run HF conversion and upload both without verifying, or users run TGI, which always forces (converts to) the fast tokenizer.

The result of this fight with tokenizers is martinsu/tildeopen-30b-mu-instruct

It's based on TildeOPEN-30B (a solid EU HPC multilingual base). Nothing fancy—just a proper instruction fine-tune where I didn't mess up the tokenizer this time.

Full article: https://github.com/martins-u/tokenmagedon
sergiopaniego 
posted an update 2 days ago
TRL now includes agent training support for GRPO‼️

Train 🕵️ agents with 🔧 tools, enabling interaction with external functions and APIs.

And of course, there's a new notebook and scripts to get you up to speed:

📘 notebook tutorial: https://github.com/huggingface/trl/blob/main/examples/notebooks/grpo_agent.ipynb

📂 script examples: https://github.com/huggingface/trl/blob/main/examples/scripts/grpo_agent.py

📦 TRL v0.26.0 release: https://github.com/huggingface/trl/releases/tag/v0.26.0
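
A rough sketch of what a tool-enabled GRPO setup can look like. The dataset, reward function, model name, and especially the tools= argument are illustrative assumptions; the linked notebook is the authoritative reference for the exact API:

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def get_weather(city: str) -> str:
    """Hypothetical tool: return a canned weather report for `city`."""
    return f"It is sunny in {city}."

def reward_uses_tool_result(completions, **kwargs):
    # Toy reward: encourage completions that actually use the tool's output.
    return [1.0 if "sunny" in completion else 0.0 for completion in completions]

dataset = load_dataset("trl-lib/tldr", split="train[:512]")  # placeholder prompt dataset

trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",
    reward_funcs=reward_uses_tool_result,
    args=GRPOConfig(output_dir="grpo-agent-demo"),
    train_dataset=dataset,
    tools=[get_weather],  # assumed kwarg based on this announcement; see the notebook
)
trainer.train()
```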
ovi054 
posted an update 2 days ago
Z-Image Turbo + LoRA ⚡

ovi054/Z-Image-LORA

Z-Image Turbo is the No. 1 trending Text-to-Image model right now. You can add a custom LoRA and generate images with this Space.

👉 Try it now: ovi054/Z-Image-LORA
sergiopaniego 
posted an update 3 days ago
ICYMI, you can fine-tune open LLMs using Claude Code

just tell it:
“Fine-tune Qwen3-0.6B on open-r1/codeforces-cots”

and Claude submits a real training job on HF GPUs using TRL.

it handles everything:
> dataset validation
> GPU selection
> training + Trackio monitoring
> job submission + cost estimation
when it’s done, your model is on the Hub, ready to use

read more about the process: https://huggingface.co/blog/hf-skills-training
IliaLarchenko 
posted an update 3 days ago
🏆 BEHAVIOR Challenge 1st Place – Solution Summary

My team recently won 1st place in the BEHAVIOR Challenge at NeurIPS.
The competition focused on training a single policy to complete 50 long-horizon household tasks in simulation.

We built an end-to-end policy based on Pi0.5 with a bunch of custom modifications. Everything is open-sourced, and it should be useful for anyone exploring VLAs or adapting them to specific tasks.

Key Architecture Changes:
- Replaced language model with 50 trainable task embeddings (no text at all)
- Correlated noise for Flow Matching: ϵ ∼ N(0, 0.5I + 0.5Σ) using dataset action covariance (see the sketch after this list)
- Learnable mixed-layer attention: each action expert layer attends to a trainable mix of all VLM layers
- System 2 stage tracking: model predicts task stage, we smooth it with voting and feed it back as context
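
A minimal sketch of how the correlated Flow Matching noise can be sampled from the formula above (not the authors' exact code; the jitter term is an assumption for numerical stability):

```python
import torch

def correlated_fm_noise(action_chunks: torch.Tensor, batch_size: int) -> torch.Tensor:
    """Sample eps ~ N(0, 0.5*I + 0.5*Sigma), with Sigma the empirical covariance
    of flattened dataset action chunks.

    action_chunks: (N, D) flattened action chunks from the training data.
    Returns: (batch_size, D) correlated noise samples.
    """
    D = action_chunks.shape[1]
    sigma = torch.cov(action_chunks.T)              # (D, D) action covariance
    cov = 0.5 * torch.eye(D) + 0.5 * sigma
    cov = cov + 1e-6 * torch.eye(D)                 # small jitter (assumption)
    L = torch.linalg.cholesky(cov)                  # cov = L @ L.T
    return torch.randn(batch_size, D) @ L.T         # z ~ N(0, I)  ->  z L.T ~ N(0, cov)
```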

Training:
- Multi-sample Flow Matching: 15 FM samples per VLM pass to reduce gradient variance
- Delta action space + per-timestamp normalization
- FAST auxiliary loss and stage prediction loss
- Trained on 224×224 RGB + proprioception only
- We use 4 fine-tuned checkpoints, all derived from a multi-task model trained on all 50 tasks

Inference Optimizations:
- Soft inpainting: predict 30 actions, execute 26, use 4 as an input for the next chunk (see the sketch after this list)
- Correlation-aware guidance of inpainting to keep action chunks smooth
- 1.3× speedup via cubic spline compression
- General correction rule: reopen gripper after failed grasps
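
For concreteness, a tiny sketch of the soft-inpainting execution loop described above, with dummy stand-ins for the policy and simulator (the action dimension and observation shapes are placeholders):

```python
import numpy as np

CHUNK, EXECUTE = 30, 26              # predict 30 actions, execute 26
ACTION_DIM = 23                      # placeholder action dimension

def dummy_policy(obs, prefix):
    # Stand-in for the fine-tuned Pi0.5 policy. A real policy conditions on
    # `prefix` (the 4 unexecuted actions) so the new chunk stays smooth.
    return np.random.randn(CHUNK, ACTION_DIM)

def dummy_env_step(action):
    # Stand-in for the simulator step; returns the next observation.
    return np.zeros(8)

obs, prefix, executed = np.zeros(8), None, []
for _ in range(10):                  # a few control cycles
    chunk = dummy_policy(obs, prefix)
    for action in chunk[:EXECUTE]:   # execute only the first 26 actions
        obs = dummy_env_step(action)
        executed.append(action)
    prefix = chunk[EXECUTE:]         # the last 4 actions seed the next chunk
```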

🔗 Code and Models:
- Code: https://github.com/IliaLarchenko/behavior-1k-solution
- Weights: IliaLarchenko/behavior_submission
- Paper: Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge (2512.06951)
XiangpengYang 
posted an update about 8 hours ago
🚀 Introducing VideoCoF: Unified Video Editing with a Temporal Reasoner (Chain-of-Frames)!

We’re excited to introduce VideoCoF, a unified framework for instruction-based video editing that enables temporal reasoning and ~4× video length extrapolation, trained with only 50k video pairs. 🔥

🔍 What makes VideoCoF different?
🧠 Chain-of-Frames reasoning: mimics the human thinking process of Seeing → Reasoning → Editing to apply edits accurately over time without external masks, ensuring physically plausible results.
📈 Strong length generalization — trained on 33-frame clips, yet supports multi-shot editing and long-video extrapolation (~4×).
🎯 Unified fine-grained editing — Object Removal, Addition, Swap, and Local Style Transfer, with instance-level & part-level, spatial-aware control.

⚡ Fast inference update
🚀 H100: ~20s / video with 4-step inference, making high-quality video editing far more practical for real-world use.

🔗 Links
📄 Paper: https://arxiv.org/abs/2512.07469
💻 Code: https://github.com/knightyxp/VideoCoF
🤗 Demo: XiangpengYang/VideoCoF
🧩 Models: XiangpengYang/VideoCoF
🌐 Project Page: https://videocof.github.io/

#VideoEditing #DiffusionModels #GenerativeAI #ComputerVision #AI
YatharthS 
posted an update about 14 hours ago
I just released LayaCodec, a highly efficient neural audio tokenizer/codec for TTS models, far better than most previous audio tokenizers.

🤯 Next-gen TTS models that use this could achieve several hundred times real-time speed while producing clearer audio! 🤯

GitHub repo: https://github.com/ysharma3501/LayaCodec
Model: YatharthS/LayaCodec