All HF Hub posts

danielhanchen posted an update 2 days ago
RakshitAralimatti posted an update 1 day ago
Just built my entire AI Engineer portfolio by pasting 2 links (GitHub and LinkedIn) into moonshotai's Kimi 2.5.
That's it. That's the workflow.
Zero coding. Zero iteration. Zero "make the button bigger."
See for yourself: https://rakshit2020.github.io/rakshitaralimatti.github.io/

The model:
✅ Scraped my GitHub repos automatically
✅ Pulled my experience from LinkedIn
✅ Designed an Aurora Glass theme
✅ Mapped every skill to projects
✅ Added animations I'd never code myself


IlyasMoutawwakil posted an update 3 days ago
Transformers v5 just landed! 🚀
It significantly unifies and reduces modeling code across architectures, while opening the door to a whole new class of performance optimizations.

My favorite new feature? 🤔
The new dynamic weight loader + converter. Here’s why 👇

Over the last few months, the core Transformers maintainers built an incredibly fast weight loader, capable of converting tensors on the fly while loading them in parallel threads. This means we’re no longer constrained by how parameters are laid out inside the safetensors weight files.

In practice, this unlocks two big things:
- Much more modular modeling code. You can now clearly see how architectures build on top of each other (DeepSeek v2 β†’ v3, Qwen v2 β†’ v3 β†’ MoE, etc.). This makes shared bottlenecks obvious and lets us optimize the right building blocks once, for all model families.
- Performance optimizations beyond what torch.compile can do alone. torch.compile operates on the computation graph, but it can’t change parameter layouts. With the new loader, we can restructure weights at load time: fusing MoE expert projections, merging attention QKV projections, and enabling more compute-dense kernels that simply weren’t possible before (see the sketch below).
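
To make the load-time restructuring concrete, here is a minimal standalone PyTorch sketch of QKV fusion. It is illustrative only, not the actual Transformers v5 loader code, and all names are hypothetical.

```python
# Minimal, standalone sketch of load-time QKV fusion (illustrative only;
# not the actual Transformers v5 loader code). All names are hypothetical.
import torch
import torch.nn as nn

hidden = 64

# Pretend these are three separate projections as stored in a checkpoint.
q_proj = nn.Linear(hidden, hidden, bias=False)
k_proj = nn.Linear(hidden, hidden, bias=False)
v_proj = nn.Linear(hidden, hidden, bias=False)

# At load time, the three weight matrices can be concatenated into a single
# fused projection, so one larger (more compute-dense) matmul replaces three.
qkv_proj = nn.Linear(hidden, 3 * hidden, bias=False)
with torch.no_grad():
    qkv_proj.weight.copy_(
        torch.cat([q_proj.weight, k_proj.weight, v_proj.weight], dim=0)
    )

x = torch.randn(2, 10, hidden)
q, k, v = qkv_proj(x).chunk(3, dim=-1)

# The fused result matches the three separate projections.
assert torch.allclose(q, q_proj(x), atol=1e-6)
assert torch.allclose(k, k_proj(x), atol=1e-6)
assert torch.allclose(v, v_proj(x), atol=1e-6)
```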

Personally, I'm honored to have contributed in this direction, including the work on optimizing MoE implementations and making modeling code more torch-exportable, so these optimizations can be ported cleanly across runtimes.

Overall, Transformers v5 is a strong signal of where the community and industry are converging: Modularity and Performance, without sacrificing Flexibility.

Transformers v5 makes its signature from_pretrained an entrypoint where you can mix and match (see the sketch after this list):
- Parallelism
- Quantization
- Custom kernels
- Flash/Paged attention
- Continuous batching
- ...
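
As an illustration, here is a hedged sketch of such a mix-and-match call. The model id is just an example, and argument names (e.g. dtype vs. torch_dtype) and supported backends should be checked against your installed v5 release.

```python
# Hedged example of a mix-and-match from_pretrained call; argument names and
# values may differ slightly depending on your exact Transformers v5 version.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen3-8B"  # example model id; swap in any causal LM on the Hub

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,                        # torch_dtype= on older releases
    device_map="auto",                           # device placement / parallelism
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # quantization
    attn_implementation="flash_attention_2",     # attention backend (needs flash-attn)
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```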

Kudos to everyone involved! I highly recommend checking out:
Release notes: https://github.com/huggingface/transformers/releases/tag/v5.0.0
Blog post: https://huggingface.co/blog/transformers-v5
prithivMLmods posted an update about 19 hours ago
alvarobartt posted an update 1 day ago
💥 hf-mem v0.4.1 now also estimates KV cache memory requirements for any context length and batch size with the --experimental flag!

uvx hf-mem --model-id ... --experimental will automatically pull the required information from the Hugging Face Hub to include the KV cache estimation, when applicable.

💡 Alternatively, you can also set the --max-model-len, --batch-size and --kv-cache-dtype arguments (à la vLLM) manually if preferred.
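
For intuition, the back-of-the-envelope KV cache sizing below shows the kind of estimate involved; hf-mem's exact accounting may differ (GQA details, allocator overhead, etc.), so treat this as a sketch rather than the tool's formula.

```python
# Back-of-the-envelope KV cache sizing, shown for illustration only;
# hf-mem's exact accounting may differ.
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   max_model_len, batch_size, dtype_bytes=2):
    # 2 = one tensor for keys + one for values, per layer
    return 2 * num_layers * num_kv_heads * head_dim * max_model_len * batch_size * dtype_bytes

# Example: a Llama-3.1-8B-style config (32 layers, 8 KV heads, head_dim 128),
# 32k context, batch size 1, fp16/bf16 cache:
print(kv_cache_bytes(32, 8, 128, 32_768, 1) / 1e9, "GB")  # ~4.3 GB
```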
AdinaY posted an update 3 days ago
Big day in open source AI!!

✨ DeepSeek released OCR2 💥
deepseek-ai/DeepSeek-OCR-2

✨ Kimi K2.5 just landed 🔥
moonshotai/Kimi-K2.5

With the Chinese Spring Festival 3 weeks away, what’s coming next? 👀
kostakoff posted an update 3 days ago
I created a list of models with permissive licenses (Apache 2.0, MIT, OpenRAIL) and raw fp16 weights.
LLM:
- Mistral 7b v1
- Falcon 7b
- GLM4 9b
- Olmo3 7b
- Yi 9b
- Qwen3 8b
- Internlm3 8B
- PHI4
Multimodal LLM:
- Pixtral 12b
- Qwen3-VL-8B-Instruct
Picture generation:
- Stable Diffusion 1.5
- Stable Diffusion 2.0
- Stable Diffusion XL
Video generation:
- WAN 2.1 VACE Diffusers
TTS:
- SUNO Bark

This can be very useful for those who are just starting their AI LLM journey in PyTorch, like me.
Suggestions in the comments are welcome.
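
As a hedged quick-start for that journey, the sketch below loads one of the listed models with Transformers; the Hub id mistralai/Mistral-7B-v0.1 is assumed here, and any other entry from the list works the same way.

```python
# Minimal sketch: loading one of the permissively licensed models above in
# fp16 with Transformers. The Hub id "mistralai/Mistral-7B-v0.1" is assumed;
# swap in any model from the list.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.float16,   # torch_dtype= on older Transformers releases
    device_map="auto",
)

inputs = tokenizer("The permissive Apache 2.0 license lets you", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```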
Javedalam posted an update about 18 hours ago
KittenTTS Nano: Tiny, Expressive, Practical

KittenTTS Nano is a lightweight, CPU-only text-to-speech model designed to prove that natural, expressive voices don’t require massive cloud stacks or GPUs. At roughly 15M parameters, it runs fast on modest hardware, supports multiple expressive voices, and exposes simple controls for pacing and tone. This makes it ideal for edge devices, demos, and anyone who wants full control over TTS without latency, lock-in, or infrastructure overhead.

Try it here: Javedalam/KittenTTS
The model page: KittenML/kitten-tts-nano-0.2
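
A hedged usage sketch, assuming the kittentts pip package and the KittenTTS class described in the KittenML README; the voice id and 24 kHz sample rate are assumptions, so check the model card for the current API.

```python
# Hedged usage sketch; the kittentts package API, voice id, and sample rate
# below are assumptions based on the KittenML README. Check the model card.
import soundfile as sf
from kittentts import KittenTTS

m = KittenTTS("KittenML/kitten-tts-nano-0.2")
audio = m.generate(
    "KittenTTS Nano runs on CPU without any GPU.",
    voice="expr-voice-2-f",  # assumed voice id; see the model card for the list
)
sf.write("output.wav", audio, 24000)  # 24 kHz output assumed
```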
sergiopaniego posted an update 2 days ago
OzTianlu posted an update about 19 hours ago
🚀 Geilim-1B-Instruct: Implicit Deep Reasoning, Zero Verbosity
NoesisLab/Geilim-1B-Instruct
https://huggingface.co/collections/NoesisLab/geilim-large-language-models
No <think> tags. No long CoT.
Reasoning happens inside the hidden states, not in the output.
What’s different
🧠 Implicit reasoning: deep causal reasoning without exposing chains
πŸ•ΈοΈ ASPP (Adjacency-Structured Parallel Propagation): parent-only causal graph, O(n) message passing
🌊 Ο€-flow: internal probability-space refinement instead of token-level deliberation
βš–οΈ Hybrid gating: learns when to use structure vs attention
Why it matters
Lower latency & token cost
Cleaner, production-ready outputs
CoT-level reasoning depth without verbosity tax
Built on Llama-3.2-1B-Instruct, trained for math, logic, and commonsense.
Designed for small-model reasoning at the edge.
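
A hedged loading sketch: since Geilim builds on Llama-3.2-1B-Instruct it may load through the standard causal-LM API, but whether the custom ASPP / π-flow modules need trust_remote_code=True is an assumption; check the model card before relying on this.

```python
# Hedged sketch: loading Geilim-1B-Instruct with Transformers. Whether
# trust_remote_code is actually required for the custom ASPP / π-flow modules
# is an assumption; drop it if the model loads as a standard Llama architecture.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NoesisLab/Geilim-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "If 3 pens cost 45 cents, how much do 7 pens cost?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```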
#ImplicitReasoning #SmallLLM #EfficientAI #ReasoningModels #ASPP #PiFlow