Markus PRO
AI & ML interests
NLP
Recent Activity
liked a model 26 minutes ago: cerebras/MiniMax-M2.1-REAP-139B-A10B
liked a model 26 minutes ago: stepfun-ai/Step-3.5-Flash
liked a model about 21 hours ago: unsloth/GLM-4.7-Flash-REAP-23B-A3B-GGUF
replied to danielhanchen's post 3 days ago
Honestly one of the most important missions in open-source AI. The world is cheering you on!!
reacted to danielhanchen's post with 🔥 3 days ago
Post
3287
You can now run Kimi K2.5 locally! 🔥
We shrank the 1T model to 240GB (-60%) via Dynamic 1-bit.
Get >40 tok/s with 242GB of VRAM/RAM, or use 622GB for near-full precision.
GGUF: unsloth/Kimi-K2.5-GGUF
Guide: https://unsloth.ai/docs/models/kimi-k2.5
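For reference, a minimal sketch of pulling only the Dynamic 1-bit shards with huggingface_hub before running them with llama.cpp as the guide describes. The quant naming pattern is an assumption; check the repo's file listing for the exact names.

```python
# Sketch: download just the Dynamic 1-bit quant from the GGUF repo.
# The "*UD-TQ1_0*" pattern is an assumption about the file naming --
# verify it against unsloth/Kimi-K2.5-GGUF and the linked guide.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="unsloth/Kimi-K2.5-GGUF",
    allow_patterns=["*UD-TQ1_0*"],   # grab only the ~240GB 1-bit shards, not every quant
    local_dir="Kimi-K2.5-GGUF",
)
# Then point llama.cpp (llama-cli or llama-server) at the first shard, per the guide above.
```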
replied to danielhanchen's post 3 days ago
No Daniel, I cannot run Kimi K2.5 locally. Do I look like I'm rich? 😭
posted an update 3 days ago
Post
2300
Dear Hugging Face team, can we please have a way to archive HF repositories / Spaces? I have a bunch of Spaces that used to work but don't anymore because the HF Space implementation changed, and I think it would be good if I could archive those like on GitHub.
React to this post if you want to see this feature! 💡
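In the meantime, a rough stopgap sketch using huggingface_hub's pause_space (not a real archive feature; the Space ID below is a placeholder):

```python
# Stopgap "archiving": pause broken Spaces so they stop erroring publicly,
# while the repo and its files stay visible.
from huggingface_hub import HfApi

api = HfApi()  # assumes you are already logged in (huggingface-cli login)

for space_id in ["your-username/old-broken-space"]:  # placeholder Space ID
    api.pause_space(space_id)  # halts the runtime until you restart it
```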
reacted to danielhanchen's post with 🔥🔥 13 days ago
Post
2591
Run GLM-4.7-Flash locally on your device with 24GB RAM! 🔥
It's the best performing 30B model on SWE-Bench and GPQA. With 200K context, it excels at coding, agents, chat & reasoning.
GGUF: unsloth/GLM-4.7-Flash-GGUF
Guide: https://unsloth.ai/docs/models/glm-4.7-flash
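For reference, a minimal sketch of loading the GGUF with llama-cpp-python. The quant filename pattern is an assumption; check the repo and the linked guide for the exact file and recommended settings.

```python
# Sketch using llama-cpp-python; "*Q4_K_M.gguf" is an assumed filename pattern --
# check unsloth/GLM-4.7-Flash-GGUF for the quant you actually want.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/GLM-4.7-Flash-GGUF",
    filename="*Q4_K_M.gguf",
    n_ctx=8192,        # raise toward the 200K context only if you have the RAM for it
    n_gpu_layers=-1,   # offload as many layers as fit on the GPU
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a binary search in Python."}]
)
print(out["choices"][0]["message"]["content"])
```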
posted an update 13 days ago
Post
2798
Inspired by the heroes of day-zero quants (@TheBloke @danielhanchen @shimmyshimmer @bartowski), I decided to join the race by releasing the first FP8 quant of glm-4.7-flash! Not as easy as I expected, but I'm happy I was still able to have it working within a few hours after the original model was released! Interested in feedback if anyone wants to try it out!
marksverdhei/GLM-4.7-Flash-FP8
Note: If my PR to vLLM isn't merged yet, you might have to use my fork. Cheers! 🤗
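For anyone trying it out, a minimal serving sketch with vLLM (assuming a vLLM build that supports this architecture; per the note above, that may currently mean the author's fork):

```python
# Sketch: offline inference with the FP8 checkpoint via vLLM.
# vLLM typically picks up the FP8 scheme from the checkpoint's quantization config.
from vllm import LLM, SamplingParams

llm = LLM(model="marksverdhei/GLM-4.7-Flash-FP8")
outputs = llm.generate(
    ["Explain FP8 quantization in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```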
reacted to tomaarsen's post with 🔥 about 2 months ago
Post
3482
🐦‍🔥 I've just published Sentence Transformers v5.2.0! It introduces multi-processing for CrossEncoder (rerankers), multilingual NanoBEIR evaluators, similarity score outputs in mine_hard_negatives, Transformers v5 support and more. Details:
- CrossEncoder multi-processing: Similar to SentenceTransformer and SparseEncoder, you can now use multi-processing with CrossEncoder rerankers. Useful for multi-GPU and CPU settings, and simple to configure: just device=["cuda:0", "cuda:1"] or device=["cpu"]*4 on the model.predict or model.rank calls.
- Multilingual NanoBEIR Support: You can now use community translations of the tiny NanoBEIR retrieval benchmark instead of only the English one, by passing dataset_id, e.g. dataset_id="lightonai/NanoBEIR-de" for the German benchmark.
- Similarity scores in Hard Negatives Mining: When mining for hard negatives to create a strong training dataset, you can now pass output_scores=True to get similarity scores returned. This can be useful for some distillation losses!
- Transformers v5: This release works with both Transformers v4 and the upcoming v5. In the future, Sentence Transformers will only work with Transformers v5, but not yet!
- Python 3.9 deprecation: Now that Python 3.9 has lost security support, Sentence Transformers no longer supports it.
Check out the full changelog for more details: https://github.com/huggingface/sentence-transformers/releases/tag/v5.2.0
I'm quite excited about what's coming. There's a huge draft PR with a notable refactor in the works that should bring some exciting support. Specifically, better multimodality, rerankers, and perhaps some late interaction in the future!
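For reference, a minimal sketch of the new CrossEncoder multi-processing described above (the checkpoint and query/passage pairs are only illustrative):

```python
# Sketch: rerank query/passage pairs across two CPU workers, per the v5.2.0 feature above.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # any reranker checkpoint
pairs = [
    ("what is fp8 quantization?", "FP8 is an 8-bit floating point format used to shrink models."),
    ("what is fp8 quantization?", "The Eiffel Tower is located in Paris."),
]
# New in v5.2.0: a list of devices fans the prediction out over multiple processes.
scores = model.predict(pairs, device=["cpu"] * 2)
print(scores)  # higher score = more relevant
```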
reacted to tomaarsen's post with ❤️ 5 months ago
Post
5780
ModernBERT goes MULTILINGUAL! One of the most requested models I've seen: Johns Hopkins University's CLSP has trained state-of-the-art massively multilingual encoders using the ModernBERT architecture: mmBERT.
Model details:
- 2 model sizes:
- jhu-clsp/mmBERT-small
- jhu-clsp/mmBERT-base
- Uses the ModernBERT architecture, but with the Gemma2 multilingual tokenizer (so: flash attention, alternating global/local attention, unpadding/sequence packing, etc.)
- Maximum sequence length of 8192 tokens, on the high end for encoders
- Trained on 1833 languages using DCLM, FineWeb2, and many more sources
- 3 training phases: 2.3T tokens pretraining on 60 languages, 600B tokens mid-training on 110 languages, and 100B tokens decay training on all 1833 languages.
- Both models are MIT Licensed, and the full datasets and intermediary checkpoints are also publicly released
Evaluation details:
- Very competitive with ModernBERT at equivalent sizes on English (GLUE, MTEB v2 English after finetuning)
- Consistently outperforms equivalently sized models on all Multilingual tasks (XTREME, classification, MTEB v2 Multilingual after finetuning)
- In short: beats commonly used multilingual base models like mDistilBERT, XLM-R (multilingual RoBERTa), multilingual MiniLM, etc.
- Additionally: the ModernBERT-based mmBERT is much faster than the alternatives due to its architectural benefits. Easily up to 2x throughput in common scenarios.
Check out the full blogpost with more details. It's super dense & gets straight to the point: https://huggingface.co/blog/mmbert
Based on these results, mmBERT should be the new go-to multilingual encoder base models at 300M and below. Do note that the mmBERT models are "base" models, i.e. they're currently only trained to perform Mask Filling. They'll need to be finetuned for downstream tasks like semantic search, classification, clustering, etc.
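Since these are mask-filling base models, a minimal smoke-test sketch (assuming a transformers version with ModernBERT support; the example sentence is only illustrative):

```python
# Sketch: mmBERT is a masked-LM base model, so fill-mask is the natural first test.
from transformers import pipeline

pipe = pipeline("fill-mask", model="jhu-clsp/mmBERT-base")
mask = pipe.tokenizer.mask_token  # avoids hard-coding the mask token for this tokenizer
for pred in pipe(f"Paris is the {mask} of France."):
    print(pred["token_str"], round(pred["score"], 3))
```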
reacted to MonsterMMORPG's post with 🔥 about 1 year ago
Post
2008
FLUX Redux is a hidden Gem
I am still doing extensive research to publish a fully public, non-paywalled tutorial, but this was generated via SwarmUI.
Style Model Merge Strength: 0.5
FLUX Guidance Scale: 6
Base model: my FLUX model fine-tuned on 256 images via Kohya SS GUI, as shown in this tutorial (https://youtu.be/FvpWy1x5etM), trained for 70 epochs
Prompt : anime ohwx man walking in a jungle <segment:yolo-face_yolov9c.pt-1,0.7,0.5> ohwx man, anime
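The settings above are for SwarmUI; for diffusers users, a rough sketch of the equivalent Redux image-conditioning flow (assuming diffusers' FluxPriorReduxPipeline API and access to the FLUX.1 Redux/dev weights; the reference image path is a placeholder):

```python
# Rough sketch of FLUX Redux image conditioning with diffusers (not the SwarmUI setup above).
import torch
from diffusers import FluxPipeline, FluxPriorReduxPipeline
from diffusers.utils import load_image

redux = FluxPriorReduxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Redux-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder=None, text_encoder_2=None,  # Redux supplies the conditioning embeddings
    torch_dtype=torch.bfloat16,
).to("cuda")

cond = redux(load_image("reference.jpg"))  # image -> prompt / pooled embeddings
image = pipe(guidance_scale=6.0, num_inference_steps=50, **cond).images[0]  # guidance 6 as above
image.save("redux_variation.png")
```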
reacted to mlabonne's post with 🔥 over 1 year ago
Post
19586
✂️ Uncensor any LLM with abliteration
I wrote an article about abliteration and how NeuralDaredevil-8B was created. Beyond removing alignment, I believe it's an interesting technique with a lot of potential. It's basically fine-tuning without retraining.
In this article, we see how it works, implement it in Google Colab, and heal the abliterated model to recover the performance drop due to this technique. The final model is an uncensored and high-quality model with the highest MMLU score on the Open LLM Leaderboard (8B category).
https://huggingface.co/blog/mlabonne/abliteration
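For intuition, a minimal sketch of the core orthogonalization step behind abliteration, as described in the article (not mlabonne's exact code; the shapes are assumptions):

```python
# Sketch: remove the "refusal direction" from a weight matrix that writes to the residual stream.
import torch

def remove_direction(weight: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    # weight: (d_model, d_in) so outputs are weight @ x; refusal_dir: (d_model,).
    # W' = (I - r r^T) W zeroes the component of every output along the refusal direction.
    r = refusal_dir / refusal_dir.norm()
    return weight - torch.outer(r, r) @ weight

# The refusal direction itself is estimated as the difference between mean activations
# on harmful vs. harmless instructions at a chosen layer (see the linked article).
```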