Markus PRO
AI & ML interests
NLP
Recent Activity
liked a model 26 minutes ago: cerebras/MiniMax-M2.1-REAP-139B-A10B
liked a model 26 minutes ago: stepfun-ai/Step-3.5-Flash
liked a model about 21 hours ago: unsloth/GLM-4.7-Flash-REAP-23B-A3B-GGUF
replied to danielhanchen's post 3 days ago
Honestly one of the most important missions in open-source AI. The world is cheering you on!!
reacted to danielhanchen's post with 🔥 3 days ago
Post
3287
You can now run Kimi K2.5 locally! 🔥
We shrank the 1T model to 240GB (-60%) via Dynamic 1-bit.
Get >40 tok/s with 242GB of VRAM/RAM, or use 622GB for near-full precision.
GGUF: unsloth/Kimi-K2.5-GGUF
Guide: https://unsloth.ai/docs/models/kimi-k2.5
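For reference, a minimal sketch of pulling only the Dynamic 1-bit shards with huggingface_hub before running them with llama.cpp as the guide describes. The quant naming pattern is an assumption; check the repo's file listing for the exact names.

```python
# Sketch: download just the Dynamic 1-bit quant from the GGUF repo.
# The "*UD-TQ1_0*" pattern is an assumption about the file naming --
# verify it against unsloth/Kimi-K2.5-GGUF and the linked guide.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="unsloth/Kimi-K2.5-GGUF",
    allow_patterns=["*UD-TQ1_0*"],   # grab only the ~240GB 1-bit shards, not every quant
    local_dir="Kimi-K2.5-GGUF",
)
# Then point llama.cpp (llama-cli or llama-server) at the first shard, per the guide above.
```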
replied to danielhanchen's post 3 days ago
No Daniel, I cannot run Kimi K2.5 locally. Do I look like I'm rich? 😭
posted an update 3 days ago
Post
2300
Dear Hugging Face team, can we please have a way to archive HF repositories / Spaces? I have a bunch of Spaces that used to work but don't anymore because the HF Space implementation changed, and I think it would be good if I could archive those like on GitHub.
React to this post if you want to see this feature! 💡
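In the meantime, a rough stopgap sketch using huggingface_hub's pause_space (not a real archive feature; the Space ID below is a placeholder):

```python
# Stopgap "archiving": pause broken Spaces so they stop erroring publicly,
# while the repo and its files stay visible.
from huggingface_hub import HfApi

api = HfApi()  # assumes you are already logged in (huggingface-cli login)

for space_id in ["your-username/old-broken-space"]:  # placeholder Space ID
    api.pause_space(space_id)  # halts the runtime until you restart it
```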
reacted to danielhanchen's post with 🔥🔥 13 days ago
Post
2591
Run GLM-4.7-Flash locally on your device with 24GB RAM! 🔥
It's the best performing 30B model on SWE-Bench and GPQA. With 200K context, it excels at coding, agents, chat & reasoning.
GGUF: unsloth/GLM-4.7-Flash-GGUF
Guide: https://unsloth.ai/docs/models/glm-4.7-flash
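For reference, a minimal sketch of loading the GGUF with llama-cpp-python. The quant filename pattern is an assumption; check the repo and the linked guide for the exact file and recommended settings.

```python
# Sketch using llama-cpp-python; "*Q4_K_M.gguf" is an assumed filename pattern --
# check unsloth/GLM-4.7-Flash-GGUF for the quant you actually want.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/GLM-4.7-Flash-GGUF",
    filename="*Q4_K_M.gguf",
    n_ctx=8192,        # raise toward the 200K context only if you have the RAM for it
    n_gpu_layers=-1,   # offload as many layers as fit on the GPU
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a binary search in Python."}]
)
print(out["choices"][0]["message"]["content"])
```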
posted an update 13 days ago
Post
2798
Inspired by the heroes of day-zero quants (@TheBloke @danielhanchen @shimmyshimmer @bartowski), I decided to join the race by releasing the first FP8 quant of glm-4.7-flash! Not as easy as I expected, but I'm happy I was still able to have it working within a few hours after the original model was released! Interested in feedback if anyone wants to try it out!
marksverdhei/GLM-4.7-Flash-FP8
Note: If my PR to vLLM isn't merged yet, you might have to use my fork. Cheers! 🤗
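For anyone trying it out, a minimal serving sketch with vLLM (assuming a vLLM build that supports this architecture; per the note above, that may currently mean the author's fork):

```python
# Sketch: offline inference with the FP8 checkpoint via vLLM.
# vLLM typically picks up the FP8 scheme from the checkpoint's quantization config.
from vllm import LLM, SamplingParams

llm = LLM(model="marksverdhei/GLM-4.7-Flash-FP8")
outputs = llm.generate(
    ["Explain FP8 quantization in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```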
reacted to tomaarsen's post with 🔥 about 2 months ago
Post
3482
🐦‍🔥 I've just published Sentence Transformers v5.2.0! It introduces multi-processing for CrossEncoder (rerankers), multilingual NanoBEIR evaluators, similarity score outputs in mine_hard_negatives, Transformers v5 support and more. Details:
- CrossEncoder multi-processing: Similar to SentenceTransformer and SparseEncoder, you can now use multi-processing with CrossEncoder rerankers. Useful for multi-GPU and CPU settings, and simple to configure: just device=["cuda:0", "cuda:1"] or device=["cpu"]*4 on the model.predict or model.rank calls.
- Multilingual NanoBEIR Support: You can now use community translations of the tiny NanoBEIR retrieval benchmark instead of only the English one, by passing dataset_id, e.g. dataset_id="lightonai/NanoBEIR-de" for the German benchmark.
- Similarity scores in Hard Negatives Mining: When mining for hard negatives to create a strong training dataset, you can now pass output_scores=True to get similarity scores returned. This can be useful for some distillation losses!
- Transformers v5: This release works with both Transformers v4 and the upcoming v5. In the future, Sentence Transformers will only work with Transformers v5, but not yet!
- Python 3.9 deprecation: Now that Python 3.9 has lost security support, Sentence Transformers no longer supports it.
Check out the full changelog for more details: https://github.com/huggingface/sentence-transformers/releases/tag/v5.2.0
I'm quite excited about what's coming. There's a huge draft PR with a notable refactor in the works that should bring some exciting support. Specifically, better multimodality, rerankers, and perhaps some late interaction in the future!
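For reference, a minimal sketch of the new CrossEncoder multi-processing described above (the checkpoint and query/passage pairs are only illustrative):

```python
# Sketch: rerank query/passage pairs across two CPU workers, per the v5.2.0 feature above.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # any reranker checkpoint
pairs = [
    ("what is fp8 quantization?", "FP8 is an 8-bit floating point format used to shrink models."),
    ("what is fp8 quantization?", "The Eiffel Tower is located in Paris."),
]
# New in v5.2.0: a list of devices fans the prediction out over multiple processes.
scores = model.predict(pairs, device=["cpu"] * 2)
print(scores)  # higher score = more relevant
```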
reacted to tomaarsen's post with ❤️ 5 months ago
Post
5780
ModernBERT goes MULTILINGUAL! One of the most requested models I've seen: Johns Hopkins University's CLSP has trained state-of-the-art massively multilingual encoders using the ModernBERT architecture: mmBERT.
Model details:
- 2 model sizes:
- jhu-clsp/mmBERT-small
- jhu-clsp/mmBERT-base
- Uses the ModernBERT architecture, but with the Gemma2 multilingual tokenizer (so: flash attention, alternating global/local attention, unpadding/sequence packing, etc.)
- Maximum sequence length of 8192 tokens, on the high end for encoders
- Trained on 1833 languages using DCLM, FineWeb2, and many more sources
- 3 training phases: 2.3T tokens pretraining on 60 languages, 600B tokens mid-training on 110 languages, and 100B tokens decay training on all 1833 languages.
- Both models are MIT Licensed, and the full datasets and intermediary checkpoints are also publicly released
Evaluation details:
- Very competitive with ModernBERT at equivalent sizes on English (GLUE, MTEB v2 English after finetuning)
- Consistently outperforms equivalently sized models on all Multilingual tasks (XTREME, classification, MTEB v2 Multilingual after finetuning)
- In short: beats commonly used multilingual base models like mDistilBERT, XLM-R (multilingual RoBERTa), multilingual MiniLM, etc.
- Additionally: the ModernBERT-based mmBERT is much faster than the alternatives due to its architectural benefits. Easily up to 2x throughput in common scenarios.
Check out the full blogpost with more details. It's super dense & gets straight to the point: https://huggingface.co/blog/mmbert
Based on these results, mmBERT should be the new go-to multilingual encoder base models at 300M and below. Do note that the mmBERT models are "base" models, i.e. they're currently only trained to perform Mask Filling. They'll need to be finetuned for downstream tasks like semantic search, classification, clustering, etc.
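Since these are mask-filling base models, a minimal smoke-test sketch (assuming a transformers version with ModernBERT support; the example sentence is only illustrative):

```python
# Sketch: mmBERT is a masked-LM base model, so fill-mask is the natural first test.
from transformers import pipeline

pipe = pipeline("fill-mask", model="jhu-clsp/mmBERT-base")
mask = pipe.tokenizer.mask_token  # avoids hard-coding the mask token for this tokenizer
for pred in pipe(f"Paris is the {mask} of France."):
    print(pred["token_str"], round(pred["score"], 3))
```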
reacted to MonsterMMORPG's post with 🔥 about 1 year ago
Post
2008
FLUX Redux is a hidden Gem
I am still doing extensive research to publish a fully public, non-paywalled tutorial, but this was generated via SwarmUI.
Style Model Merge Strength: 0.5
FLUX Guidance Scale: 6
Base model: my FLUX model fine-tuned on 256 images via Kohya SS GUI, as shown in this tutorial (https://youtu.be/FvpWy1x5etM), trained for 70 epochs
Prompt : anime ohwx man walking in a jungle <segment:yolo-face_yolov9c.pt-1,0.7,0.5> ohwx man, anime
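The settings above are for SwarmUI; for diffusers users, a rough sketch of the equivalent Redux image-conditioning flow (assuming diffusers' FluxPriorReduxPipeline API and access to the FLUX.1 Redux/dev weights; the reference image path is a placeholder):

```python
# Rough sketch of FLUX Redux image conditioning with diffusers (not the SwarmUI setup above).
import torch
from diffusers import FluxPipeline, FluxPriorReduxPipeline
from diffusers.utils import load_image

redux = FluxPriorReduxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Redux-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder=None, text_encoder_2=None,  # Redux supplies the conditioning embeddings
    torch_dtype=torch.bfloat16,
).to("cuda")

cond = redux(load_image("reference.jpg"))  # image -> prompt / pooled embeddings
image = pipe(guidance_scale=6.0, num_inference_steps=50, **cond).images[0]  # guidance 6 as above
image.save("redux_variation.png")
```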
reacted to mlabonne's post with 🔥 over 1 year ago
Post
19586
✂️ Uncensor any LLM with abliteration
I wrote an article about abliteration and how NeuralDaredevil-8B was created. Beyond removing alignment, I believe it's an interesting technique with a lot of potential. It's basically fine-tuning without retraining.
In this article, we see how it works, implement it in Google Colab, and heal the abliterated model to recover the performance drop due to this technique. The final model is an uncensored and high-quality model with the highest MMLU score on the Open LLM Leaderboard (8B category).
https://huggingface.co/blog/mlabonne/abliteration
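For intuition, a minimal sketch of the core orthogonalization step behind abliteration, as described in the article (not mlabonne's exact code; the shapes are assumptions):

```python
# Sketch: remove the "refusal direction" from a weight matrix that writes to the residual stream.
import torch

def remove_direction(weight: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    # weight: (d_model, d_in) so outputs are weight @ x; refusal_dir: (d_model,).
    # W' = (I - r r^T) W zeroes the component of every output along the refusal direction.
    r = refusal_dir / refusal_dir.norm()
    return weight - torch.outer(r, r) @ weight

# The refusal direction itself is estimated as the difference between mean activations
# on harmful vs. harmless instructions at a chosen layer (see the linked article).
```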