In a Training Loop 🔄

David Belton PRO

DavidAU

AI & ML interests

Application(s) of single/multiple LLMs in specialized use cases & automation tasks. LLM, prompt, system role, and parameter engineering via chat / API. 2700+ merge models, over 600 fine tunes, over 1000 models published. DISCORD: David_AU [drawless111]

Recent Activity

liked a model about 12 hours ago

JetBrains/Mellum2-12B-A2.5B-Thinking

new activity about 16 hours ago

DavidAU/Qwen3.6-27B-Heretic-Uncensored-FINETUNE-NEO-CODE-Di-IMatrix-MAX-GGUF: where to find "non IMatrix" of this model?

new activity about 16 hours ago

DavidAU/Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-NEO-CODE-Di-IMatrix-MAX-GGUF: unknown model architecture: 'qwen35'

replied to their post 1 day ago

This model does not contain MTP layers; you need to run it in non-MTP mode.

replied to their post 12 days ago

As of this writing:

There are still pipeline issues and pending optimizations, and it is not yet supported in some AI apps.

Specifically:
GGUFs:

Imatrix is not yet supported for MTP.
Not all AI apps have been updated to support it; in those apps, MTP GGUFs do not work at all.
Miscellaneous speed issues are still being worked on.

replied to their post 21 days ago

Training time is compounded by the number of experts in the model, which adds a serious amount of time to each run.
Even 1000 samples [small!] take 6-12 hrs.
Compare 31B dense: same samples, 30-60 minutes.
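
As a rough illustration of the gap quoted above, the two time ranges can be turned into a slowdown ratio. This is only arithmetic on the figures in the post; the variable and function names are made up for the example, and actual times depend on hardware, sequence length, and training settings that the post does not state.

```python
# Wall-clock ranges quoted in the post, in hours, for the same ~1000 samples.
moe_hours = (6.0, 12.0)    # sparse MoE model
dense_hours = (0.5, 1.0)   # 31B dense model (30-60 minutes)

def slowdown_range(slow, fast):
    """Return the (smallest, largest) ratio of slow to fast across both ranges."""
    return slow[0] / fast[1], slow[1] / fast[0]

low, high = slowdown_range(moe_hours, dense_hours)
print(f"MoE tuning is roughly {low:.0f}x to {high:.0f}x slower than 31B dense")
```

Pairing the matching ends of each range (6 hrs vs 30 minutes, 12 hrs vs 60 minutes) gives about 12x in both cases.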

replied to their post about 1 month ago

I will add to the list; may wait for specific Heretic and/or tuned version.

I already have a 43B-A3B version running in the lab; however, tuning these sparse MoE models takes a lot more work/time and ahh... detail. AND a lot more VRAM!!! [can't compress these atm, so BF16 required => 100 GB+ ]
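
A back-of-the-envelope check on the 100 GB+ figure, as a minimal sketch: it assumes the "43B" in the model name is the total parameter count and that BF16 stores 2 bytes per parameter. It counts the weights only; gradients, optimizer state, and activations come on top and are not estimated here.

```python
def bf16_weight_gb(params_billions: float) -> float:
    """Memory for the weights alone at 2 bytes per parameter, in GB (10^9 bytes)."""
    bytes_per_param = 2  # BF16 is a 16-bit format
    return params_billions * 1e9 * bytes_per_param / 1e9

weights_gb = bf16_weight_gb(43)  # 43B total parameters is an assumption from the model name
print(f"43B weights in BF16: ~{weights_gb:.0f} GB before any training overhead")
```

Even though only about 3B parameters are active per token in an A3B model, all experts have to be resident in memory, which is why the total count drives the estimate.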

posted an update about 1 month ago

Uncensored, Heretic, Qwen 3.6 27B GGUFs - Exceeds all quant metrics and core model metrics too.

Tuned 27B Heretic Uncensored quants from IQ2M to Q8.
IQ2M is 83% of BF16, with Q6 just under 98% of BF16 precision.
Q8: 98.47% of BF16 precision.
NEO/Code DI-Imatrix Quants.

Exceeds all 5 metrics for "censored" quants too.

All metrics posted.

The tuned model (from which the quants were built) also exceeds Qwen 3.6 27B core metrics.

DavidAU/Qwen3.6-27B-Heretic-Uncensored-FINETUNE-NEO-CODE-Di-IMatrix-MAX-GGUF

6 replies

replied to their post about 1 month ago

I may make a Q6 high and/or a Q8 Hybrid and/or Q8 "HI".
Imatrix does not have any effect on Q8 or BF16, unless the other tensors in the model are set at Q6 or lower.

A Q8 "HI" is a special case, where one or more tensors/layers are set at BF16.

posted an update about 1 month ago

Qwen3.6 27B - NEO-Code Imatrix Max GGUF Quants [exceeds Unsloth in key metrics]:

All quants benchmarked with 5 key metrics.
A DAVIDAU vs UNSLOTH Metrics showdown.
Quant quality exceeds Unsloth in key metrics.
IQ2_M to Q6 available.
Standout: IQ4XS at 94% of BF16 precision.
Full explainer for Quant metrics.

DavidAU/Qwen3.6-27B-NEO-CODE-Di-IMatrix-MAX-GGUF

6 replies

replied to their post about 1 month ago

Currently working with Qwen 3.5/3.6 35B-A3B in the lab; learning the "quirks"; still a ways to go.

reacted to nightmedia's post with 👍 about 2 months ago

Updated gemma-4-E4B-it metrics

I noticed the chat template got updated, and tried it on the E4B, with surprising results in stabilizing the brainwave.

quant   arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8   0.480  0.656  0.797  0.608  0.400  0.755  0.665
mxfp4   0.455  0.607  0.851  0.585  0.402  0.744  0.651

Quant    Perplexity      Peak Memory   Tokens/sec
mxfp8    35.937 ± 0.525  14.80 GB      1153
mxfp4    36.746 ± 0.534  11.06 GB      1030

Old numbers

quant   arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8   0.404  0.489  0.825  0.586  0.392  0.734  0.661
mxfp4   0.414  0.508  0.854  0.562  0.378  0.717  0.645

Quant    Perplexity      Peak Memory   Tokens/sec
mxfp8    34.652 ± 0.502  14.80 GB      1146
mxfp4    35.203 ± 0.506  11.06 GB      1200

I will re-do all baselines soon based on the new template. It is completely expected that the model behavior will change as a result.
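
To see where the template change helped or hurt on the base gemma-4-E4B-it, the mxfp8 rows above can be compared task by task. This is a small sketch that only subtracts the posted numbers; it says nothing about statistical significance, and the variable names are illustrative.

```python
tasks = ["arc", "arc/e", "boolq", "hswag", "obkqa", "piqa", "wino"]

# mxfp8 rows copied from the tables above (new template vs old numbers).
new_mxfp8 = [0.480, 0.656, 0.797, 0.608, 0.400, 0.755, 0.665]
old_mxfp8 = [0.404, 0.489, 0.825, 0.586, 0.392, 0.734, 0.661]

# Positive delta means the new template scored higher on that task.
deltas = {t: round(n - o, 3) for t, n, o in zip(tasks, new_mxfp8, old_mxfp8)}

for task, delta in deltas.items():
    print(f"{task:6s} {delta:+.3f}")
```

The largest movement is on arc and arc/e, while boolq drops slightly, which matches the note that model behavior is expected to change with the template.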

Here are the effects of the new template on a few known distills from DavidAU:

gemma-4-E4B-it-The-DECKARD-Expresso-Universe-HERETIC-UNCENSORED

quant   arc    arc/e  boolq  hswag  obkqa  piqa   wino
New template
mxfp8   0.518  0.709  0.755  0.657  0.418  0.759  0.626
mxfp4   0.485  0.682  0.792  0.641  0.432  0.746  0.635
Old template
mxfp8   0.506  0.697  0.754  0.661  0.416  0.757  0.627
mxfp4   0.487  0.670  0.792  0.644  0.430  0.748  0.624

gemma-4-E4B-it-GLM-4.7-Flash-HERETIC-UNCENSORED-Thinking

mxfp8   0.461  0.599  0.779  0.630  0.406  0.766  0.629
Old template
mxfp8   0.456  0.580  0.786  0.629  0.410  0.764  0.633

gemma-4-E4B-it-Claude-Opus-4.5-HERETIC-UNCENSORED-Thinking

mxfp8   0.509  0.705  0.806  0.646  0.416  0.773  0.650
Old template
mxfp8   0.502  0.692  0.809  0.650  0.420  0.771  0.651

2 replies

replied to their post about 2 months ago

RE: 16-18B; yes, something is running in the lab right now (Gemma 4).

I can also make Qwen 3 (version 3) MoEs like Llama3.2-8X3B; I have some of these in my repo too.
I have built a few GPT-OSS models, some 12Bs [Mistral Nemo], and Mistral Nemo "large" 15-17Bs...

A lot of options.

replied to their post about 2 months ago

Maybe in the future; atm still learning/addressing quirks with these new Gemmas.
Google released three different architectures here: "E", "MOE", and 31B dense.

I also plan to create larger Gemma 4s, which may work better for specific applications and/or work better, period.
These are planned for next week.

posted an update about 2 months ago

Going NUCLEAR: Gemma 4 E4B, uncensored and tuned, exceeds Gemma 4 26B-A4B in critical benchmarks.

De-censored, tuned, and tuned again via Unsloth using custom in-house datasets and methods:

DavidAU/gemma-4-E4B-it-The-DECKARD-Expresso-Universe-HERETIC-UNCENSORED-Thinking

8 replies

replied to their post about 2 months ago

UPDATE:
https://huggingface.co/DavidAU/gemma-4-E4B-it-The-DECKARD-Expresso-Universe-HERETIC-UNCENSORED-Thinking

Exceeds Gemma4 26B-A4B in critical benchmarks.

replied to their post about 2 months ago

Training a Gemma 4 REAP 19B-A4B right now; should be done tomorrow, then testing.

RE: Franken merge 26B-A3B; yes, I just need to make a map for Mergekit; this is also in progress.

RE: Claudes; depends on how REAP turns out.
There are a lot of updates still in progress with Unsloth/Llamacpp RE: Gemma 4s atm too.
There are also some dataset issues to address when training with Gemma 4s.

NOTE:
Just finished a number of fine tunes on Gemma 4's E4B, which is a MoE-like model. These will be released in the next day or so, pending final testing.

posted an update about 2 months ago

THREE Gemma 4 31B Uncensored Fine Tunes (via Unsloth, in-house datasets):

Uncensored first, then tuned.
Some benchmarks posted, others pending.
Examples posted, detailed instructions.
Some GGUFs are up; others pending as of this writing.

Enjoy:

DavidAU/gemma-4-31B-it-Mystery-Fine-Tune-HERETIC-UNCENSORED-Thinking
DavidAU/gemma-4-31B-it-Grand-Horror-X-INTENSE-HERETIC-UNCENSORED-Thinking
DavidAU/gemma-4-31B-it-The-DECKARD-HERETIC-UNCENSORED-Thinking

UPDATE:
DavidAU/gemma-4-E4B-it-The-DECKARD-Expresso-Universe-HERETIC-UNCENSORED-Thinking

Exceeds Gemma4 26B-A4B in critical benchmarks.

6 replies

posted an update 2 months ago

Power, Freedom and Character:
Qwen 3.5 40B Claude Opus Deckard UNCENSORED.

Expanded and trained with the Claude Opus 4.6 dataset, but first it was Heretic'ed and trained with DECKARD: 5 hand-crafted datasets to give the model character, point of view and intelligence... and a lot more.

Examples posted.

Several quant types available under quantizations:

DavidAU/Qwen3.5-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking

posted an update 3 months ago

Going for the GOLD: Qwen 3.5 40B Claude 4.5 Opus

Drastically larger, with performance to match.
Upgraded Jinja template too.

DavidAU/Qwen3.5-40B-Claude-4.5-Opus-High-Reasoning-Thinking

replied to their post 3 months ago

UPDATE:
All of these are now up and can be downloaded.
Awaiting quants.

replied to their post 3 months ago

RE: 13B:
=> one is upscaled + trained, the other is a merge of two 9B fine tunes (and upscaled).

They are hidden as of this writing (undergoing private testing), awaiting final metrics / eval.
If they "pass", they will be made public.

These will be active within 24-48 hrs, pending results.

replied to their post 3 months ago

I currently have a fully running 13B (GLM 4.7 Flash), which is very strong, and experimental 21Bs of Qwen 3.5.
These are trained.

These are in testing, and access is limited as of this writing.

As for MOEs:
This is a little more complicated, as scripting must be written for Mergekit to "moe together" 0.8B, 2B, 4B, 9Bs, etc.
A draft (by me) has been completed to do this, but it has not been tested/debugged yet.

No timeline here; too many variables.

RE: 35B MoEs; it is possible to address this in a different way, but I have not tried it yet.
This is a different approach than REAP.
