This model does not contain MTP layers; you need to run it in non-MTP mode.
David Belton
As of this writing:
There are still pipeline issues (bugs as well as pending optimizations), and MTP is not yet widely supported in AI apps.
Specifically:
GGUFs:
- Imatrix is not yet supported for MTP.
- Not all AI apps have been updated to support it; as a result, MTP GGUFs do not work at all in those apps.
- Miscellaneous speed issues are still being worked on.
Training is compounded by the number of experts in the model, which adds a serious amount of time to the training.
Even 1000 samples [small!] takes 6-12 hrs.
Compare a 31B dense model: same samples, 30-60 minutes.
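To put those timings side by side, here is a quick back-of-the-envelope calculation. It only uses the numbers quoted above; hardware, batch size and sequence length are not stated, so treat the ratio as rough.

```python
# Back-of-the-envelope slowdown from the timings quoted above, in minutes.
# Assumption: same 1000-sample dataset for both runs; hardware/settings not stated.
MOE_MINUTES = (6 * 60, 12 * 60)  # MoE: 6-12 hrs
DENSE_MINUTES = (30, 60)         # 31B dense: 30-60 minutes


def slowdown(moe, dense):
    """Return (best case, midpoint, worst case) MoE/dense training-time ratio."""
    best = moe[0] / dense[1]    # fastest MoE run vs slowest dense run
    worst = moe[1] / dense[0]   # slowest MoE run vs fastest dense run
    mid = (sum(moe) / 2) / (sum(dense) / 2)  # midpoint vs midpoint
    return best, mid, worst


best, mid, worst = slowdown(MOE_MINUTES, DENSE_MINUTES)
print(f"MoE run is roughly {best:.0f}x to {worst:.0f}x longer (midpoint ~{mid:.0f}x)")
```

The midpoint ratio (about 12x) is the fairest single number, since the two ranges scale together.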
I will add it to the list; I may wait for a specific Heretic and/or tuned version.
I already have a 43B-A3B version running in the lab; however, tuning these sparse MoE models takes a lot more work/time and, ahh... detail. AND a lot more VRAM!!! [These can't be compressed at the moment, so BF16 is required => 100 GB+.]
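A rough sketch of why BF16 pushes this past 100 GB, assuming "43B" is the total parameter count (every expert has to sit in memory, even though only ~3B are active per token). It counts weights only; gradients, optimizer state and activations come on top and depend on the training setup.

```python
def weights_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


TOTAL_PARAMS = 43e9  # "43B" total; all experts must be loaded, not just the ~3B active

bf16_gb = weights_gb(TOTAL_PARAMS, 2.0)  # BF16: 2 bytes per parameter
int4_gb = weights_gb(TOTAL_PARAMS, 0.5)  # hypothetical 4-bit load, ignoring scale overhead

print(f"BF16 weights:  {bf16_gb:.1f} GB")
print(f"4-bit weights: {int4_gb:.1f} GB (if compression were an option)")
# Gradients, optimizer state and activations are added on top of the BF16 figure,
# which is consistent with the 100 GB+ mentioned above.
```

The 4-bit line is only there for contrast: it shows roughly what compression would save if it were usable for these models.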
Tuned 27B Heretic Uncensored quants, from IQ2_M to Q8.
IQ2_M is at 83% of BF16 precision, with Q6 just under 98%.
Q8: 98.47% of BF16 precision.
NEO/Code DI-Imatrix quants.
They also exceed the "censored" quants on all 5 metrics.
All metrics are posted.
The tuned model (from which the quants were built) also exceeds Qwen 3.6 27B on core metrics.
DavidAU/Qwen3.6-27B-Heretic-Uncensored-FINETUNE-NEO-CODE-Di-IMatrix-MAX-GGUF
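The post doesn't spell out how "% of BF16 precision" is calculated. As an illustration only, one common way to compare a quant against its BF16 source is to run both on the same text and compare next-token distributions: top-1 agreement and mean KL divergence. This is an assumption, not necessarily one of the 5 metrics used here, and the logits below are made-up toy values.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(P || Q) in nats, with P = BF16 reference and Q = quantized model."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def compare_to_bf16(ref_logits, quant_logits):
    """Return (top-1 agreement in %, mean KL divergence) over token positions."""
    agree, kls = 0, []
    for ref, qnt in zip(ref_logits, quant_logits):
        p, q = softmax(ref), softmax(qnt)
        agree += p.index(max(p)) == q.index(max(q))  # same most-likely token?
        kls.append(kl_divergence(p, q))
    n = len(kls)
    return 100.0 * agree / n, sum(kls) / n


# Toy logits: 3 token positions over a 4-token vocabulary (made-up values).
bf16 = [[2.0, 1.0, 0.1, -1.0], [0.5, 2.5, 0.0, 0.3], [1.0, 1.2, 0.9, 0.0]]
quant = [[1.9, 1.1, 0.2, -0.9], [0.6, 2.3, 0.1, 0.2], [1.3, 1.1, 0.9, 0.1]]

top1, mean_kl = compare_to_bf16(bf16, quant)
print(f"top-1 agreement: {top1:.1f}%  mean KL: {mean_kl:.5f} nats")
```

Higher agreement and lower KL both mean the quant behaves more like BF16; a real comparison would use many thousands of token positions.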
I may make a Q6 high and/or a Q8 Hybrid and/or Q8 "HI".
Imatrix has no effect on Q8 or BF16, unless other tensors in the model are set to Q6 or lower.
A Q8 "HI" is a special case, where one or more tensors/layers are set to BF16.
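The two rules above can be sketched as checks on a per-tensor quant plan. The tensor names follow GGUF naming, but the plans themselves are hypothetical, and the precision ordering of the quant types is a simplification.

```python
# Quant types ordered from lowest to highest precision (simplified ordering).
QUANT_ORDER = ["IQ2_M", "IQ4_XS", "Q4_K", "Q5_K", "Q6_K", "Q8_0", "BF16"]


def imatrix_matters(plan):
    """Rule from the post: imatrix only has an effect if some tensor is at Q6 or lower."""
    q6 = QUANT_ORDER.index("Q6_K")
    return any(QUANT_ORDER.index(qtype) <= q6 for qtype in plan.values())


def is_q8_hi(plan):
    """Q8 "HI": a Q8 quant where one or more tensors/layers are kept at BF16."""
    return set(plan.values()) == {"Q8_0", "BF16"}


# Hypothetical per-tensor plans (tensor name -> quant type).
q8 = {"token_embd.weight": "Q8_0", "blk.0.attn_q.weight": "Q8_0", "output.weight": "Q8_0"}
q8_hi = {**q8, "output.weight": "BF16"}
q8_mixed = {**q8, "blk.0.ffn_down.weight": "Q6_K"}

for name, plan in [("q8", q8), ("q8_hi", q8_hi), ("q8_mixed", q8_mixed)]:
    print(f"{name:9s} imatrix matters: {imatrix_matters(plan)}, Q8 HI: {is_q8_hi(plan)}")
```

A plain Q8 or a Q8 "HI" gains nothing from the imatrix; as soon as a single tensor drops to Q6 or lower, it starts to matter.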
All quants benchmarked with 5 key metrics.
A DAVIDAU vs UNSLOTH Metrics showdown.
Quant quality exceeds Unsloth in key metrics.
IQ2_M to Q6 available.
Standout: IQ4_XS at 94% of BF16 precision.
Full explainer for Quant metrics.
DavidAU/Qwen3.6-27B-NEO-CODE-Di-IMatrix-MAX-GGUF
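For a rough idea of file size across this range, here is a nominal estimate from bits per weight. The bpw figures are the nominal llama.cpp values for each type (an assumption about the mixes used here). Real GGUFs mix tensor types, so actual files will differ somewhat.

```python
# Nominal bits per weight for llama.cpp quant types (approximate; real GGUFs mix types).
NOMINAL_BPW = {"IQ2_M": 2.7, "IQ4_XS": 4.25, "Q6_K": 6.5625, "Q8_0": 8.5, "BF16": 16.0}


def est_size_gb(num_params: float, qtype: str) -> float:
    """Estimated weight size in decimal GB: params * bits-per-weight / 8 bits per byte."""
    return num_params * NOMINAL_BPW[qtype] / 8 / 1e9


PARAMS = 27e9  # 27B model
for qtype in NOMINAL_BPW:
    print(f"{qtype:7s} ~{est_size_gb(PARAMS, qtype):5.1f} GB")
```

This helps put the precision numbers in context: IQ4_XS keeps about 94% of BF16 precision at roughly a quarter of the BF16 size.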
Currently working with Qwen 3.5/3.6 35B-A3B in the lab; learning the "quirks"; still a ways to go.
I noticed the chat template got updated, and tried it on the E4B, with surprising results in stabilizing the brainwave.
quant  arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8  0.480  0.656  0.797  0.608  0.400  0.755  0.665
mxfp4  0.455  0.607  0.851  0.585  0.402  0.744  0.651
Quant  Perplexity      Peak Memory  Tokens/sec
mxfp8  35.937 ± 0.525  14.80 GB     1153
mxfp4  36.746 ± 0.534  11.06 GB     1030
Old numbers
quant  arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8  0.404  0.489  0.825  0.586  0.392  0.734  0.661
mxfp4  0.414  0.508  0.854  0.562  0.378  0.717  0.645
Quant  Perplexity      Peak Memory  Tokens/sec
mxfp8  34.652 ± 0.502  14.80 GB     1146
mxfp4  35.203 ± 0.506  11.06 GB     1200
I will re-do all baselines soon based on the new template. It is completely expected that the model behavior will change as a result.
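To see how much the template change moved things, here is a small script that computes per-task deltas (new minus old template) for the E4B mxfp8 rows above. It also computes the mxfp4 vs mxfp8 perplexity and memory trade-off from the new-template run. It only reuses the posted numbers.

```python
TASKS = ["arc", "arc/e", "boolq", "hswag", "obkqa", "piqa", "wino"]

# E4B mxfp8 scores from above: new chat template vs old template.
NEW_MXFP8 = [0.480, 0.656, 0.797, 0.608, 0.400, 0.755, 0.665]
OLD_MXFP8 = [0.404, 0.489, 0.825, 0.586, 0.392, 0.734, 0.661]


def deltas(new, old):
    """Per-task change (new - old), rounded to the 3 decimals the scores are posted with."""
    return {task: round(n - o, 3) for task, n, o in zip(TASKS, new, old)}


d = deltas(NEW_MXFP8, OLD_MXFP8)
mean_delta = sum(d.values()) / len(d)
for task, delta in d.items():
    print(f"{task:6s} {delta:+.3f}")
print(f"mean   {mean_delta:+.4f}")

# mxfp4 vs mxfp8 on the new template: relative perplexity and peak-memory change.
ppl_change = 100 * (36.746 / 35.937 - 1)
mem_change = 100 * (11.06 / 14.80 - 1)
print(f"mxfp4 vs mxfp8: perplexity {ppl_change:+.2f}%, peak memory {mem_change:+.2f}%")
```

The largest gains are on arc and arc/e, while boolq drops slightly. mxfp4 gives up about 2% in perplexity for roughly a quarter less peak memory.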
Here are the effects of the new template on a few known distills from DavidAU:
gemma-4-E4B-it-The-DECKARD-Expresso-Universe-HERETIC-UNCENSORED
quant  arc    arc/e  boolq  hswag  obkqa  piqa   wino
New template
mxfp8  0.518  0.709  0.755  0.657  0.418  0.759  0.626
mxfp4  0.485  0.682  0.792  0.641  0.432  0.746  0.635
Old template
mxfp8  0.506  0.697  0.754  0.661  0.416  0.757  0.627
mxfp4  0.487  0.670  0.792  0.644  0.430  0.748  0.624
gemma-4-E4B-it-GLM-4.7-Flash-HERETIC-UNCENSORED-Thinking
New template
mxfp8  0.461  0.599  0.779  0.630  0.406  0.766  0.629
Old template
mxfp8  0.456  0.580  0.786  0.629  0.410  0.764  0.633
gemma-4-E4B-it-Claude-Opus-4.5-HERETIC-UNCENSORED-Thinking
New template
mxfp8  0.509  0.705  0.806  0.646  0.416  0.773  0.650
Old template
mxfp8  0.502  0.692  0.809  0.650  0.420  0.771  0.651
RE: 16-18 B; yes, something is running in the lab right now (Gemma 4).
I can also make Qwen 3 (version 3) MoEs, similar to Llama3.2-8X3B; I have some of these in my repo too.
I have built a few GPT-OSS models, and some 12B [Mistral Nemo] as well as Mistral Nemo "large" 15-17Bs...
A lot of options; maybe in the future. At the moment I am still learning/addressing quirks with these new Gemmas.
Google released three different architectures here: "E", "MoE", and 31B dense.
There are also plans to create larger Gemma 4s, which may work better for specific applications and/or work better, period.
These are in the plans for next week.
De-censored, tuned, and tuned again via Unsloth using custom in-house datasets and methods:
DavidAU/gemma-4-E4B-it-The-DECKARD-Expresso-Universe-HERETIC-UNCENSORED-Thinking
Exceeds Gemma4 26B-A4B in critical benchmarks.
Training a Gemma 4 REAP 19B-A4B right now; it should be done tomorrow, then testing.
RE: Franken merge 26B-A3B; yes, I just need to make a map for Mergekit; this is also in progress.
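For anyone curious what such a "map" can look like, here is a minimal sketch that assumes a mergekit passthrough merge built from overlapping layer_range slices of a single model. The model path and layer counts are hypothetical placeholders, not the actual 26B-A3B layout. Whether mergekit handles this MoE architecture is not confirmed here.

```python
import json


def passthrough_slices(model, num_layers, block, stride):
    """Overlapping layer_range slices for a mergekit passthrough self-merge.

    Each slice copies `block` consecutive layers and the next slice starts
    `stride` layers later, so overlapping layers appear twice in the result.
    """
    if not 0 < stride <= block <= num_layers:
        raise ValueError("need 0 < stride <= block <= num_layers")
    ranges, start = [], 0
    while start + block <= num_layers:
        ranges.append([start, start + block])
        start += stride
    if ranges[-1][1] < num_layers:  # keep the final layers if the stride skipped them
        ranges.append([ranges[-1][1], num_layers])
    return {
        "slices": [{"sources": [{"model": model, "layer_range": r}]} for r in ranges],
        "merge_method": "passthrough",
        "dtype": "bfloat16",
    }


# Hypothetical model path and layer counts, for illustration only.
config = passthrough_slices("path/to/base-model", num_layers=30, block=10, stride=5)
print(json.dumps(config, indent=2))
new_depth = sum(s["sources"][0]["layer_range"][1] - s["sources"][0]["layer_range"][0]
                for s in config["slices"])
print("layers in merged model:", new_depth)
```

The printed JSON should also load as YAML, so it can serve as a starting point for a merge config. Smaller strides duplicate more layers and give a deeper (larger) model.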
RE: Claudes; depends on how the REAP turns out.
There are a lot of updates still in progress with Unsloth/llama.cpp regarding Gemma 4s at the moment, too.
There are also some dataset issues to address when training with Gemma 4s.
NOTE:
Just finished a number of fine-tunes of Gemma 4's E4B, which is a MoE-LIKE model. These will be released in the next day or so, pending final testing.
Uncensored first, then tuned.
Some benchmarks posted, others pending.
Examples and detailed instructions posted.
Some GGUFs are up; others pending as of this writing.
Enjoy:
DavidAU/gemma-4-31B-it-Mystery-Fine-Tune-HERETIC-UNCENSORED-Thinking
DavidAU/gemma-4-31B-it-Grand-Horror-X-INTENSE-HERETIC-UNCENSORED-Thinking
DavidAU/gemma-4-31B-it-The-DECKARD-HERETIC-UNCENSORED-Thinking
UPDATE:
DavidAU/gemma-4-E4B-it-The-DECKARD-Expresso-Universe-HERETIC-UNCENSORED-Thinking
Exceeds Gemma4 26B-A4B in critical benchmarks.
Qwen 3.5 40B Claude Opus Deckard UNCENSORED.
Expanded and trained with a Claude Opus 4.6 dataset, but first it was Heretic'ed and trained with DECKARD: 5 hand-crafted datasets to give the model character, point of view and intelligence... and a lot more.
Examples posted.
Several quant types are available under "Quantizations":
DavidAU/Qwen3.5-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking
Drastically larger, with performance to match.
Upgraded Jinja template too.
DavidAU/Qwen3.5-40B-Claude-4.5-Opus-High-Reasoning-Thinking
UPDATE:
All of these are now up and can be downloaded.
Awaiting quants.
RE: 13B:
=> One is upscaled + trained; the other is a merge of two 9B fine-tunes (and upscaled).
They are hidden as of this writing (undergoing private testing), awaiting final metrics / eval.
If they "pass", they will be made public.
These will be active within 24-48 hrs pending results.
I currently have a fully running 13B (GLM 4.7 Flash), which is very strong, and experimental 21Bs of Qwen 3.5.
These are trained.
These are in testing, and access is limited as of this writing.
As for MOEs:
This is a little more complicated, as scripting must be written for Mergekit to "MoE together" 0.8B, 2B, 4B, 9Bs, etc.
I have completed a draft script to do this, but it has not been tested/debugged yet.
No timeline here; too many variables.
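As a rough illustration of the kind of config such a script would need to produce, here is a sketch that assembles a mergekit-moe style config with base_model, gate_mode, and experts that carry positive_prompts. The model paths and prompts are hypothetical. The sketch also assumes all experts share the base model's architecture and size, for example several 9B fine-tunes together rather than a 4B mixed with a 9B.

```python
import json


def moe_config(base_model, experts, gate_mode="hidden", dtype="bfloat16"):
    """Assemble a mergekit-moe style config.

    `experts` maps each expert model path to the positive prompts used to
    initialise its router gate. All experts are assumed to be fine-tunes with
    the same architecture and size as `base_model`.
    """
    if len(experts) < 2:
        raise ValueError("a MoE needs at least two experts")
    if gate_mode not in {"hidden", "cheap_embed", "random"}:
        raise ValueError(f"unknown gate_mode: {gate_mode}")
    return {
        "base_model": base_model,
        "gate_mode": gate_mode,
        "dtype": dtype,
        "experts": [
            {"source_model": path, "positive_prompts": list(prompts)}
            for path, prompts in experts.items()
        ],
    }


# Hypothetical paths and prompts: two 9B fine-tunes of the same 9B base.
cfg = moe_config(
    "path/to/9b-base",
    {
        "path/to/9b-code-tune": ["Write a Python function that"],
        "path/to/9b-story-tune": ["Write a short horror story about"],
    },
)
print(json.dumps(cfg, indent=2))
```

Whether mergekit-moe handles Qwen 3.5 models out of the box is not confirmed here. Per the note above, custom scripting is still needed for that part.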
RE: 35B MoEs; it is possible to address this in a different way, but I have not tried it yet.
This is a different approach than REAP.