AI & ML interests

Local LLMs

Recent Activity

OzTianlu
posted an update 1 day ago
🚨 URGENT: To the 13k+ users downloading Kai-3B-Instruct: please update to v1.1! (Official Q8_0 GGUF inside)
OzTianlu/Kai-3B-Instruct-Q8_0-GGUF
Wow. Waking up to see over 13,000 combined downloads for the Kai-3B-Instruct GGUFs is absolutely mind-blowing. Thank you so much to the community and to the awesome creators (@SimplySara & @mradermacher) for the auto-quantization!
However, we have a slight "suffering from success" situation here. 😅
⚠️ THE ISSUE: You are likely running the v1.0 "Logic-Poisoned" weights.
If your model is acting like a cold, emotionless robot that only replies with a rigid Analysis -> Approach -> Solution template even when you just say "Hello", you have v1.0. In our initial release, the model overfitted to its reasoning corpus, causing a complete "conversational mode collapse."
🚀 THE FIX: Official v1.1 is Live!
We have completed a 4000-step annealing phase to restore its sanity.
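For readers curious what a 4000-step annealing phase can look like, here is a minimal sketch of a cosine learning-rate anneal. The schedule shape, peak, and floor are assumptions for illustration; the post does not specify the actual recipe.

```python
import math

def annealed_lr(step: int, total_steps: int = 4000,
                lr_max: float = 2e-5, lr_min: float = 2e-6) -> float:
    """Cosine-annealed learning rate: decays smoothly from lr_max to
    lr_min over total_steps, then holds at lr_min."""
    if step >= total_steps:
        return lr_min
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

# Starts at lr_max, ends at lr_min, monotonically decreasing in between.
```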
OzTianlu
posted an update 3 days ago
Scaling UP in Kai! 🌊
NoesisLab/Kai-3B-Instruct

Introducing NoesisLab/Kai-3B-Instruct. What happens when you force a 3B model to reason entirely in its latent space?
Meet Kai-3B, our latest industrial-grade reasoning model fine-tuned using the Adaptive Dual Search (ADS) algorithm.
GSM8K (0-shot, Direct Answer): 39.27% 🤯 (Llama-2-7B is ~14.6%)
HumanEval (Pass@1): 39.02% 💻 (overtakes Gemma-2-2B's 30%)
MMLU (5-shot): 53.62% 📚 (crushing the 50% barrier)
ARC-Challenge: 51.88% 🎯
PIQA: 77.53%
HellaSwag: 69.53%
Kai-3B proves that reasoning density doesn't strictly require parameter bloat or verbose generation. It acts as a perfect, cold-blooded Agent action engine: ideal for JSON routing, SWE-bench patch generation, and anywhere you need absolute structured certainty without token waste.
OzTianlu
posted an update 4 days ago
๐Ÿ›ก๏ธ Meet Spartacus-1B: Shattering the Memory Wall with True O(1) Inference! ๐Ÿš€
NoesisLab/Spartacus-1B-Instruct
NoesisLab/ChatSpartacus
At NoesisLab, we've entirely ripped out Softmax Attention and replaced it with Causal Monoid State Compression.
Say hello to Spartacus-1B-Instruct (1.3B) 🗡️.
Instead of maintaining a massive, ever-growing list of past tokens, Spartacus compresses its entire causal history into a fixed-size state matrix per head. The result?
⚡ True O(1) Inference: Memory footprint and generation time per token remain absolutely constant, whether you are on token 10 or token 100,000.
🧠 Explicit Causality: We threw away RoPE and attention masks. The model learns when to forget using dynamic, content-aware vector decay.
🔥 Blazing Fast Training: Full hardware utilization via our custom Triton-accelerated JIT parallel prefix scan.
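As a rough illustration of the idea (not the actual Spartacus implementation), a fixed-size-state decode step in the style of gated linear attention might look like the sketch below. The shapes, the sigmoid decay gate, and the outer-product update are all assumptions.

```python
import numpy as np

def o1_decode_step(state, k, v, q, decay):
    """One constant-memory decoding step: the entire causal history lives
    in a fixed-size (d_k x d_v) state matrix, updated through a
    content-aware decay gate instead of an ever-growing KV cache."""
    state = decay[:, None] * state + np.outer(k, v)  # compress history
    out = q @ state                                  # read from the state
    return state, out

d_k, d_v = 4, 4
rng = np.random.default_rng(0)
state = np.zeros((d_k, d_v))
for _ in range(100):  # memory stays (d_k, d_v) however long we run
    k, v, q = rng.normal(size=d_k), rng.normal(size=d_v), rng.normal(size=d_k)
    decay = 1 / (1 + np.exp(-rng.normal(size=d_k)))  # sigmoid gate in (0, 1)
    state, out = o1_decode_step(state, k, v, q, decay)
print(state.shape)  # constant regardless of sequence length
```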
📊 Zero-Shot Benchmarks that Hit Hard:
O(1) architectures usually sacrifice zero-shot accuracy. Not Spartacus. It is punching way above its weight class, beating established sub-quadratic models (like Mamba-1.4B and RWKV-6-1.6B):
🏆 ARC-Challenge: 0.3063 (vs. Mamba 0.284)
🏆 ARC-Easy: 0.5518
🏆 PIQA: 0.6915
prithivMLmods
posted an update 4 days ago
FireRed-Image-Edit-1.0 (Rapid) Fast Experimental Demo is Out! 🚀🤗

Demo: prithivMLmods/FireRed-Image-Edit-1.0-Fast

-> Paired the EditPlusPipeline with the Diffusers-compatible transformer weights of Rapid AIO from Qwen-Image-Edit. (experimental)
-> This fusion delivers more accurate instruction following, higher image quality, and consistent visual coherence @ 4-step fast inference.
-> Better maintains text styles with high fidelity, along with high-quality old photo restoration, enhancement, and best-in-class virtual try-on.

Ujjwal-Tyagi
posted an update 5 days ago
Public reports allege that Anthropic gobbled up trillions of tokens of copyrighted material and public data to build their castle. 🏰📄 Now that they're sitting on top, they're begging for special laws to protect their profits while pulling the ladder up behind them. 🪜🚫

But the hypocrisy meter just broke! 📉 They are accusing Chinese labs like DeepSeek, Minimax, and Kimi of "huge distillation attacks." The reality is that you can't just loot the entire internet's library, lock the door, and then sue everyone else for reading through the window. Stop trying to gatekeep tech you didn't own in the first place. Read the complete article: https://huggingface.co/blog/Ujjwal-Tyagi/the-dark-underbelly-of-anthropic
guifre103
published a model 8 days ago
prithivMLmods
posted an update 9 days ago
OzTianlu
posted an update 12 days ago
O(1) inference is the foundational design of Spartacus-1B-Instruct 🛡️!

NoesisLab/Spartacus-1B-Instruct

We have successfully replaced the KV-cache bottleneck inherent in Softmax Attention with Causal Monoid State Compression. By defining the causal history as a monoid recurrence, the entire prefix is lossily compressed into a fixed-size state matrix per head.

The technical core of this architecture relies on the associativity of the monoid operator.

Training: parallel prefix scan using Triton-accelerated JIT kernels to compute all prefix states simultaneously.
Inference: True sequential updates. Memory and time complexity per token are decoupled from sequence length.
Explicit Causality: We discard RoPE and attention masks. Causality is a first-class citizen, explicitly modeled through learned, content-dependent decay gates.
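The operator equations appear to have been lost in extraction; under the description above, a generic monoid-scan formulation would read something like the following (the symbols $\circ$, $f$, and $S_t$ are illustrative, not the author's notation):

```latex
% Fixed-size state S_t summarizing the prefix x_{1..t};
% \circ is the associative monoid operator, f maps a token to a state update.
S_t = S_{t-1} \circ f(x_t), \qquad
(a \circ b) \circ c = a \circ (b \circ c)
```

Associativity is what allows training to evaluate all prefix states $S_1, \dots, S_T$ with a parallel scan in logarithmic depth, while inference simply applies one constant-cost update per token.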

Current zero-shot benchmarks demonstrate that Spartacus-1B-Instruct (1.3B) is already outperforming established sub-quadratic models like Mamba-1.4B and RWKV-6-1.6B on ARC-Challenge (0.3063). Recent integration of structured Chain-of-Thought (CoT) data has further pushed reasoning accuracy to 75%.

The "Spartacus" era is about scaling intelligence, not the memory wall โ™พ๏ธ.
prithivMLmods
posted an update 13 days ago
Dropping the Qwen3 VL series of Unredacted MAX-VL models. These models have undergone multi-stage training to minimize refusal rates through continued abliteration-based optimization. You can find the models in BF16, FP8-Dynamic, and GGUF formats at the links below. 🔥🚀

Unredacted MAX - VL:
➜ prithivMLmods/Qwen3-VL-4B-Instruct-Unredacted-MAX
➜ prithivMLmods/Qwen3-VL-4B-Thinking-Unredacted-MAX
➜ prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX
➜ prithivMLmods/Qwen3-VL-8B-Thinking-Unredacted-MAX

Unredacted MAX - VL [FP8]:
➜ prithivMLmods/Qwen3-VL-4B-Instruct-Unredacted-MAX-FP8
➜ prithivMLmods/Qwen3-VL-4B-Thinking-Unredacted-MAX-FP8
➜ prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX-FP8
➜ prithivMLmods/Qwen3-VL-8B-Thinking-Unredacted-MAX-FP8

Unredacted MAX - VL [GGUF]:
➜ prithivMLmods/Qwen3-VL-4B-Instruct-Unredacted-MAX-GGUF
➜ prithivMLmods/Qwen3-VL-4B-Thinking-Unredacted-MAX-GGUF
➜ prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX-GGUF
➜ prithivMLmods/Qwen3-VL-8B-Thinking-Unredacted-MAX-GGUF

Unredacted MAX - VL [Collection]:
➜ https://huggingface.co/collections/prithivMLmods/unredacted-max-vl-fp8
➜ https://huggingface.co/collections/prithivMLmods/unredacted-max-vl
➜ https://huggingface.co/collections/prithivMLmods/unredacted-max-vl-gguf

To learn more, visit the app page or the respective model pages.
Ujjwal-Tyagi
posted an update 13 days ago
The Qwen 3.5 model is here! Supporting 1M context length by default, it delivers much better performance, competitive with Claude Opus 4.6: Qwen/Qwen3.5-397B-A17B. Here is its GGUF: unsloth/Qwen3.5-397B-A17B-GGUF. Follow me and turn on notifications for the latest news!
Ujjwal-Tyagi
posted an update 17 days ago
GLM 5 is insane: it ranks #4 globally!
OzTianlu
posted an update 18 days ago
🚀 NanoHammer-1.5B-Instruct:
https://huggingface.co/NoesisLab/NanoHammer-1.5B-Instruct
We are excited to introduce NanoHammer, a novel architecture by NoesisLab designed for Causal State Compression and true Linear Inference Complexity.
🧠 The Core: Holographic State Space
Forget the growing KV cache. NanoHammer leverages Holographic Rotary Embeddings to compress sequence history into a dynamic integral state.
Polynomial Compression: Instead of storing raw history, we "integrate" context into a complex number space, treating memory as a container of evolving polynomial coefficients.
Dynamic Evolution: The architecture features a custom StateUpdateCell that uses Euler method fixed-point iteration, allowing the model to perform implicit reasoning via differential state updates.
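To make the idea concrete, here is a toy of what "Euler method fixed-point iteration" can mean: explicit Euler steps driving a state toward a fixed point of an update function. This is purely illustrative; the actual StateUpdateCell is not published in this post, and the update function below is a made-up stand-in.

```python
import numpy as np

def euler_fixed_point(f, s0, n_steps=50, dt=0.5):
    """Drive s toward a fixed point of f by integrating ds/dt = f(s) - s
    with explicit Euler steps; at the fixed point, f(s) = s."""
    s = s0
    for _ in range(n_steps):
        s = s + dt * (f(s) - s)
    return s

# Toy "state update": a contraction whose fixed point solves s = 0.5*s + 1,
# i.e. s = 2. (Stands in for a learned update cell.)
f = lambda s: 0.5 * s + 1.0
s_star = euler_fixed_point(f, np.array([0.0]))
print(s_star)  # converges to ~[2.]
```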
⚡ Why It Matters: Efficiency Meets Reasoning
O(1) Inference Memory: State size remains constant regardless of sequence length.
Causal Modeling: Explicitly models the causal flow of logic through time, perfect for "implicit reasoning" tasks without the verbosity of Chain-of-Thought.
1.5B Lightweight Design: High performance, low resource footprint.
🛠 Model Card Highlights
Type: nanohammer (Hybrid Causal-State Architecture)
License: Apache 2.0
Capabilities: Instruction following, Long-context handling
🔗 Try it on Hugging Face: https://huggingface.co/NoesisLab/NanoHammer-1.5B-Instruct
Parveshiiii
posted an update 18 days ago
Introducing Seekify: a truly non-rate-limiting search library for Python

Tired of hitting rate limits when building search features? I've built Seekify, a lightweight Python library that lets you perform searches without the usual throttling headaches.

🔹 Key highlights

- Simple API: plug it in and start searching instantly

- No rate-limiting restrictions

- Designed for developers who need reliable search in projects, scripts, or apps

📦 Available now on PyPI:

pip install seekify

👉 Check out the repo: https://github.com/Parveshiiii/Seekify

I'd love feedback, contributions, and ideas for real-world use cases. Let's make search smoother together!
MaziyarPanahi
posted an update 19 days ago
Announcing: OpenMed Multilingual PII Detection Models

Today I am releasing 105 open-source models for Personally Identifiable Information (PII) detection in French, German, and Italian.

All Apache 2.0 licensed. Free for commercial use. No restrictions.

Performance:

- French: 97.97% F1 (top model)
- German: 97.61% F1 (top model)
- Italian: 97.28% F1 (top model)

All top-10 models per language exceed 96% F1.

Coverage:

- 55+ PII entity types per language
- Native ID formats: NSS (French), Sozialversicherungsnummer (German), Codice Fiscale (Italian)
- Language-specific address, phone, and name patterns
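As a concrete illustration of what "native ID format" means, the Italian Codice Fiscale follows a rigid 16-character pattern that even a simple regex can flag. The sketch below is for illustration only; the released models use learned token classification, not regexes.

```python
import re

# Illustrative pattern for the Italian Codice Fiscale: 6 letters
# (surname/name consonants), 2 digits (birth year), month letter,
# 2 digits (day), place code (letter + 3 digits), check letter.
CODICE_FISCALE = re.compile(r"\b[A-Z]{6}\d{2}[A-Z]\d{2}[A-Z]\d{3}[A-Z]\b")

def find_codici(text: str):
    """Return all Codice-Fiscale-shaped substrings in the text."""
    return CODICE_FISCALE.findall(text.upper())

print(find_codici("Paziente RSSMRA85T10A562S, ricoverato il 3/2."))
# -> ['RSSMRA85T10A562S']
```

A pure pattern match still needs NER-style context to avoid false positives, which is what the models add on top of format knowledge.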

Training Data:

- French: 49,580 samples
- German: 42,250 samples
- Italian: 40,944 samples

Why Multilingual?

European healthcare operates in European languages. Clinical notes, patient records, and medical documents are generated in French, German, Italian, and other languages.

Effective de-identification requires:

- Native language understanding, not translation
- Local ID format recognition: each country has unique patterns
- Cultural context awareness: names, addresses, and formats vary

These models deliver production-ready accuracy without requiring data to leave your infrastructure or language.

HIPAA & GDPR Compliance
Built for US and European privacy regulations:

- On-premise deployment: Process data locally with zero external dependencies
- Data sovereignty: No API calls, no cloud services, no cross-border transfers
- Air-gapped capable: Deploy in fully isolated environments if required
- Regulatory-grade accuracy: Supporting Expert Determination standards
HIPAA and GDPR compliance across languages, without compliance gaps.

Use Cases
- Hospital EHR systems: Automated patient record de-identification
- Clinical research: Multilingual dataset preparation for studies
- Insurance companies: Claims processing across

https://huggingface.co/collections/OpenMed/multilingual-pii-and-de-identification
prithivMLmods
posted an update 20 days ago
Introducing FLUX.2-Klein-LoRA-Studio, a demo for image editing using specialized LoRA adapters built for the FLUX.2-Klein-Distilled model. It features an edit-style gallery for multi-style image editing, including de-light, face swap, mannequin, and more. Try the demo below.

🤗 Demo: prithivMLmods/FLUX.2-Klein-LoRA-Studio
🤗 Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
🤗 GitHub: https://github.com/PRITHIVSAKTHIUR/FLUX.2-Klein-LoRA-Studio

To learn more, visit the app page or the respective model pages.
MaziyarPanahi
posted an update 21 days ago
From Golden Gate Bridge to Broken JSON: Why Anthropic's SAE Steering Fails for Structured Output

I ran 6 experiments trying to use Anthropic's SAE steering for JSON generation.

- Base model: 86.8% valid JSON
- Steering only: 24.4%
- Fine-tuned: 96.6%
- FSM constrained: 100%

Steering is for semantics, not syntax.
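Why can FSM constraints hit 100% where steering and fine-tuning cannot? Because the decoder is only ever offered tokens the automaton accepts, so every finished string is valid by construction. A toy sketch with a hypothetical five-state grammar (not Anthropic's setup or the experiments' actual harness):

```python
import json
import random

# Tiny FSM accepting exactly the one-key JSON object {"k":"v"}.
TRANSITIONS = {
    "start": {"{": "key"},
    "key":   {'"k"': "colon"},
    "colon": {":": "value"},
    "value": {'"v"': "close"},
    "close": {"}": "done"},
}

def constrained_generate(seed=0):
    """Sample tokens, but mask the choice to what the FSM allows."""
    rng = random.Random(seed)
    state, out = "start", []
    while state != "done":
        allowed = list(TRANSITIONS[state])  # mask: only legal tokens
        tok = rng.choice(allowed)           # the "model" picks among them
        out.append(tok)
        state = TRANSITIONS[state][tok]
    return "".join(out)

s = constrained_generate()
print(s, json.loads(s))  # parses by construction
```

Real constrained-decoding libraries generalize this by compiling a full JSON grammar (or a JSON schema) into the automaton and applying the mask over the model's token logits.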

https://huggingface.co/blog/MaziyarPanahi/sae-steering-json
MaziyarPanahi
posted an update 23 days ago
🚨 Day 8/8: OpenMed Medical Reasoning Dataset Release - THE GRAND FINALE

Today I complete my 8-day release series with Medical-Reasoning-SFT-Mega.
The largest open medical reasoning dataset, combining 7 state-of-the-art AI models with fair distribution deduplication.

THE 7 SOURCE MODELS (Original Sample Counts):

1. Trinity-Mini: 810,284 samples
2. Qwen3-Next-80B: 604,249 samples
3. GPT-OSS-120B: 506,150 samples
4. Nemotron-Nano-30B: 444,544 samples
5. GLM-4.5-Air: 225,179 samples
6. MiniMax-M2.1: 204,773 samples
7. Baichuan-M3-235B: 124,520 samples

TOTAL BEFORE DEDUPLICATION: 2,919,699 samples
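The post doesn't detail the "fair distribution deduplication", but one plausible reading, sketched below with hypothetical helper names, is to keep a single copy of each duplicate question and credit it to whichever source model currently has the fewest kept samples:

```python
def dedup_fair(samples):
    """samples: list of (source, question) pairs.
    Returns one (source, normalized_question) per unique question,
    assigning duplicates to the least-represented source so the final
    per-source distribution stays as balanced as possible."""
    seen, kept_per_source, kept = {}, {}, []
    for source, q in samples:
        key = " ".join(q.lower().split())  # normalize whitespace/case
        seen.setdefault(key, []).append(source)
    for key, sources in seen.items():
        winner = min(sources, key=lambda s: kept_per_source.get(s, 0))
        kept_per_source[winner] = kept_per_source.get(winner, 0) + 1
        kept.append((winner, key))
    return kept

data = [("A", "What is BP?"), ("B", "what is  bp?"), ("B", "Define CHF.")]
print(dedup_fair(data))  # 2 unique questions survive
```

At the dataset's scale one would hash the normalized text rather than keep it in memory, but the balancing logic is the same.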

TOKEN COUNTS:
- Content tokens: 2.22 Billion
- Reasoning tokens: 1.56 Billion
- Total tokens: 3.78 Billion
- Samples with chain-of-thought: 100%

Quick Start:
from datasets import load_dataset
ds = load_dataset("OpenMed/Medical-Reasoning-SFT-Mega")


All datasets Apache 2.0 licensed. Free for research and commercial use.

Thank you for following OpenMed's release series. I can't wait to see what you build. 🔥

OpenMed/Medical-Reasoning-SFT-Mega
OpenMed/Medical-Reasoning-SFT-GPT-OSS-120B-V2
OpenMed/Medical-Reasoning-SFT-Trinity-Mini
OpenMed/Medical-Reasoning-SFT-GLM_4.5_Air
OpenMed/Medical-Reasoning-SFT-MiniMax-M2.1
OpenMed/Medical-Reasoning-SFT-Qwen3-Next-80B
OpenMed/Medical-Reasoning-SFT-Nemotron-Nano-30B
https://huggingface.co/datasets/OpenMed/Medical-Reasonin

https://huggingface.co/collections/OpenMed/medical-datasets