All HF Hub posts

SeaWolf-AI
posted an update 2 days ago
๐ŸŒ World Model Bench โ€” does your world model actually think?

FID measures realism. FVD measures smoothness. But neither tells you whether the model understood the scene.

We just released WM Bench – the first benchmark for cognitive intelligence in world models. The core question: when a beast charges from 3 meters away, does the model know to sprint – not walk? Does it respond differently to a human vs an animal? Does it remember the left corridor was blocked two steps ago?

Those are cognitive questions. No existing benchmark asks them. So we built one.

3 Pillars · 10 Categories · 100 Scenarios · 1,000-point scale

- ๐Ÿ‘ P1 Perception (25%) โ€” Can it read the scene?
- ๐Ÿง  P2 Cognition (45%) โ€” Does it predict threats, escalate emotions, utilize memory?
- ๐Ÿ”ฅ P3 Embodiment (30%) โ€” Does the body respond with the right motion?
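As a back-of-the-envelope illustration of how those pillar weights could roll up into the 1,000-point total (the per-pillar raw scores in [0, 1] and the rounding rule here are assumptions, not the published rubric):

```python
# Pillar weights from the post: 25% / 45% / 30% of a 1,000-point scale.
WEIGHTS = {"perception": 0.25, "cognition": 0.45, "embodiment": 0.30}

def total_score(raw):
    """Combine hypothetical per-pillar raw scores in [0, 1] into the total."""
    return round(1000 * sum(WEIGHTS[p] * raw[p] for p in WEIGHTS))

print(total_score({"perception": 0.80, "cognition": 0.70, "embodiment": 0.67}))  # 716
```

Cognition carrying 45% of the weight means a model that reads scenes well but reasons poorly cannot score high, which matches the bench's framing.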

All evaluation is via simple JSON I/O – no 3D engine, no special hardware. Any model with an API can participate.
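A minimal sketch of what such a JSON exchange could look like. The field names and the toy policy are illustrative assumptions, not WM Bench's actual schema:

```python
import json

# Hypothetical scenario record -- field names are illustrative only.
scenario = {
    "id": "P2-threat-017",
    "pillar": "cognition",
    "observation": {"entity": "beast", "distance_m": 3.0, "closing": True},
    "memory": ["left corridor blocked (t-2)"],
}

def toy_policy(obs):
    """A stand-in model: sprint away from a close, closing, non-human threat."""
    if obs["entity"] != "human" and obs["distance_m"] < 5.0 and obs["closing"]:
        return {"action": "sprint", "direction": "away"}
    return {"action": "walk", "direction": "forward"}

# Round-trip through JSON strings, as an API-only model would.
request = json.dumps(scenario)
response = toy_policy(json.loads(request)["observation"])
print(json.dumps(response))  # {"action": "sprint", "direction": "away"}
```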

We also built PROMETHEUS as a live reference implementation – it runs in your browser on a T4, no install needed. It combines FloodDiffusion motion generation with an LLM cognitive brain (Perceive → Predict → Decide → Act). It scored 726/1000 (Grade B) on Track C – the only directly verified model so far. Submissions from other teams are very welcome.

---

🗂 Dataset → FINAL-Bench/World-Model
🌐 Demo → FINAL-Bench/World-Model
🏆 Leaderboard → FINAL-Bench/worldmodel-bench
📝 Article → https://huggingface.co/blog/FINAL-Bench/world-model

Part of the FINAL Bench Family – alongside FINAL Bench (Feb 2026). Feedback on rubrics and missing models is always welcome!
SeaWolf-AI
posted an update about 8 hours ago
🧬 Darwin-35B-A3B-Opus – The Child That Surpassed Both Parents

What if a merged model could beat both its parents? We proved it can.
Darwin-35B-A3B-Opus is a 35B MoE model (3B active) built with our Darwin V5 engine – the first evolution system that CT-scans parent models before merging them.
🤗 Model: FINAL-Bench/Darwin-35B-A3B-Opus

The result speaks for itself: GPQA Diamond 90.0%, versus Father (Qwen3.5-35B-A3B) at 84.2% and Mother (Claude 4.6 Opus Distilled) at 85.0%. That's a +6.9% relative gain over Father and +5.9% over Mother. Not a tradeoff – a genuine leap. Meanwhile, MMMLU sits at 85.0% (Father: 85.2%), multimodal is fully intact, and all 201 languages are preserved.

How? Model MRI changed everything. Traditional merging is guesswork. Darwin V4 added evolution. Darwin V5 added X-ray vision. Model MRI scans each parent layer by layer and discovers: Mother's L34–L38 is the reasoning engine (peak cosine distance), 50–65% of Mother's experts are dead (killed by text-only distillation), and Father is a healthy generalist with every expert alive. The prescription: transplant Mother's reasoning brain at L38 (90% weight), replace her dead experts with Father's living ones, and let Father's router handle the output layer. Reasoning went up. Versatility stayed intact. No tradeoff – just evolution.
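A toy sketch of the layer-by-layer cosine-distance scan that the Model MRI idea describes. The two-weight "layers" are stand-ins for real parameter tensors, and flagging the peak-distance layer as a transplant candidate is an assumption about the method:

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two flat weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def scan_layers(parent_a, parent_b):
    """Per-layer divergence profile; peaks hint at specialized layers."""
    return [cosine_distance(a, b) for a, b in zip(parent_a, parent_b)]

# Toy 4-layer "models": layer 2 diverges most, so an MRI-style scan
# would flag it as the candidate for transplanting.
father = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
mother = [[1.0, 0.1], [1.0, 0.9], [0.0, 1.0], [0.5, 0.6]]
profile = scan_layers(father, mother)
peak_layer = max(range(len(profile)), key=profile.__getitem__)
print(peak_layer)  # 2
```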

35B total, 3B active (MoE) · GPQA Diamond 90.0% · MMMLU 85.0% (201 languages) · Multimodal Image & Video · 262K native context · 147.8 tok/s on H100 · Runs on a single RTX 4090 (Q4) · Apache 2.0
Darwin V5's full algorithm and technical details will be released alongside an upcoming paper.

🚀 Live Demo: FINAL-Bench/Darwin-35B-A3B-Opus

๐Ÿ† FINAL Bench Leaderboard: FINAL-Bench/Leaderboard

📊 ALL Bench Leaderboard: FINAL-Bench/all-bench-leaderboard

Built by VIDRAFT · Supported by the Korean Government GPU Support Program
Shrijanagain
posted an update 2 days ago
SOME NEW HINDI + ENGLISH DATASETS

🔗
- sKT-Ai-Labs/HIN
- sKT-Ai-Labs/SKT-MIX
- sKT-Ai-Labs/ST-H

Download them to use and train models.

You can also use the ST-x-LIGHTING module for faster training:

pip install ST-x-LIGHT-V11
  • 2 replies
danielhanchen
posted an update about 7 hours ago
A new way to use Unsloth.

Coming soon...
Shrijanagain
posted an update 1 day ago
🚀 Become part of the Bharat AI Revolution! 🇮🇳

Do you want to give India a new identity in the world of AI?

SKT AI Labs is not just a name, it is a mission – to digitally empower the country and make the dream of "Viksit Bharat" come true.

Why join us?

1. The country's own AI: We are building models made specifically for India's needs and languages.

2. Open Collaboration: Explore our work on our Hugging Face repository, test it, and contribute.

3. Technological Growth: Whether you are a student, a developer, or a tech enthusiast, this is a great opportunity to learn and grow with us.

Join here

sKT-Ai-Labs

🔗
sKT-Ai-Labs


Come, let's advance the Bharat AI Revolution together! 💻🔥

#SKTAILabs #DigitalIndia #AIRevolution #ViksitBharat #TechInnovation #JoinTheMission
Ujjwal-Tyagi
posted an update 1 day ago
I am sharing my study material for AI & ML. These books are a real "bible" and give you a very strong foundation. I have also included guidance, an introduction, and my master notes in the dataset repo card. I hope you will find them helpful – if you have any queries, just start a discussion and I am always there to help you out!
Ujjwal-Tyagi/ai-ml-foundations-book-collection
  • 3 replies
shriarul5273
posted an update 1 day ago
🚀 Releasing gradio-sync3dcompare v0.0.22 – a Gradio custom component for synchronized 3D model comparison

๐Ÿ” One component. Side-by-side. Perfectly in sync.

✨ What's included

๐Ÿ—‚๏ธ Supports GLB and PLY files
๐Ÿ”ต Renders as point clouds or native meshes
๐ŸŽฅ Synchronized orbit, zoom, and pan across all viewports
๐Ÿ“ Auto point sizing with manual override
๐Ÿ” Configurable zoom range and reset controls

📦 pip install gradio-sync3dcompare

๐Ÿ› ๏ธ Built on Gradio 6.10.0 โ€” drops into any gr.Blocks app with a single import.

🤗 Try the live demo on Hugging Face Spaces: shriarul5273/gradio_sync3d_compare

โญ GitHub: https://github.com/shriarul5273/Sync3DCompare


🎬 See it in action in the video below.
The video shows a real-world comparison of two 3D point clouds reconstructed from stereo depth estimation – one from FoundationStereo and one from RAFTStereo. Both models are exported as GLB files directly from the depth output and loaded side-by-side into the component. Every orbit, zoom, and pan is perfectly mirrored across both viewports, making it easy to spot structural differences between the two reconstructions at any angle.

💬 Feedback on supported formats, rendering features, or comparison workflows is very welcome!
DedeProGames
posted an update 2 days ago
Introducing GRM2, a powerful 3 billion parameter model designed for long-term reasoning and high performance in complex tasks.

Even with only 3 billion parameters, it outperforms Qwen3-32B on several benchmarks and complex reasoning tasks.

With just 3 billion parameters, it can also generate long, complex programs of over 1,000 lines, use tools at a level comparable to larger models, and is well suited to agentic tasks.

GRM2 is licensed under Apache 2.0, making it ideal as a base for fine-tuning on other tasks.

GRM2 Model Page: OrionLLM/GRM2-3b
Official GRM2 GGUFs Quantizations: OrionLLM/GRM2-3b-GGUF
PhysiQuanty
posted an update 2 days ago
🧬 Can an LLM speak in binary?
✅ YES... RADIX 2 / VOCAB 4
PhysiQuanty/Binary-LLM-POC

🤖 >_ Can an LLM execute logic gates and Boolean arithmetic?

We need to create datasets:
- Neural Arithmetic and Logic Unit (NALU) 32 bits
- Neural Application Binary Interface (NABI) 32 bits

🎯 Optimal Instruction Set = RV32IMAF

This opens the way for code writing and execution by the LLMs themselves without an external CLI.

The more of us who want it, the more possible it will become ...

PhysiQuanty/Binary-Addition-LLM-POC
(10-bit binary addition: binary carry propagation; sampling no longer has any effect on the logits, because the next token is deterministic.)
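A quick illustrative generator for such deterministic addition examples. The prompt format here is an assumption, not the POC's actual one:

```python
import random

def add_example(bits=10, rng=random):
    """One carry-propagation example: 'a + b =' prompt and its binary sum."""
    a = rng.randrange(2 ** bits)
    b = rng.randrange(2 ** bits)
    s = a + b  # may carry into bit `bits`, so pad the sum to bits + 1 digits
    prompt = f"{a:0{bits}b} + {b:0{bits}b} ="
    target = f"{s:0{bits + 1}b}"
    return prompt, target

# The target is fully determined by the prompt, which is why sampling
# temperature stops mattering once the model has learned the task.
prompt, target = add_example(bits=10, rng=random.Random(0))
print(prompt, target)
```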

  • 1 reply
reaperdoesntknow
posted an update 2 days ago
# Three Teachers, One Student: Dual-Cognition Reasoning at 1.7B

We distilled Qwen3-30B-A3B into 1.7B students that critique their own reasoning. H100, BF16, Apache 2.0. Here's our pipeline.

**Stage 1 – Three Teachers, Three Profiles.** Same 30B base, three variants: Instruct (structured output), Thinking (extended deliberation), Coder (STEM decomposition). Each distillation uses proof-weighted KD – 2.25× amplified loss on reasoning tokens, decaying to 1.1×. The student learns *where to think harder*, not just what to output.
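The decaying amplification can be sketched as a per-token weight schedule. The linear decay and step count below are assumptions, since the post only gives the 2.25× and 1.1× endpoints:

```python
def kd_token_weights(is_reasoning, start=2.25, end=1.1, total_steps=10_000, step=0):
    """Per-token KD loss weights: amplified on reasoning tokens, with the
    amplification decaying from `start` to `end` over training.
    (Hypothetical linear schedule; the paper's actual schedule may differ.)"""
    frac = min(step / total_steps, 1.0)
    amp = start * (1.0 - frac) + end * frac
    return [amp if flag else 1.0 for flag in is_reasoning]

# Early in training, reasoning tokens get 2.25x loss; by the end, 1.1x.
print(kd_token_weights([0, 1, 1, 0], step=0))       # [1.0, 2.25, 2.25, 1.0]
print(kd_token_weights([0, 1, 1, 0], step=10_000))  # [1.0, 1.1, 1.1, 1.0]
```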

**Stage 2 – Topology-Aware KD (TKD).** Standard KD treats the teacher's distribution as smooth. Language isn't smooth – it has topic shifts, reasoning pivots, register changes. We use Discrepancy Calculus to detect these structural boundaries, then amplify loss at jumps (3σ threshold) and cut training windows at low-discrepancy positions. The student preserves the teacher's structural knowledge, not just surface statistics.
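A toy version of the 3σ jump detector, with plain z-score thresholding standing in for the post's Discrepancy Calculus:

```python
from statistics import mean, stdev

def boundary_mask(discrepancy, k=3.0):
    """Flag positions where the discrepancy signal jumps more than k sigma
    above its mean -- a simple stand-in for the post's boundary detector."""
    mu, sigma = mean(discrepancy), stdev(discrepancy)
    return [d > mu + k * sigma for d in discrepancy]

# A flat stretch of discrepancy with one sharp reasoning pivot at position 20.
signal = [0.1] * 20 + [5.0]
mask = boundary_mask(signal)
print(mask.index(True))  # 20
```

Positions where the mask is True would get amplified loss; long low-discrepancy runs would be where training windows get cut.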

**Stage 3 – Ghost Imprinting.** Sequential distillation from different teachers leaves residual fields in weight space that neither teacher put there individually. The Cantor component of BV decomposition, applied to parameters. Models distilled Thinking→Coder exhibit deliberation patterns from the Thinking teacher that survived Coder overwriting. Emergent capability from structural residuals.

**Stage 4 – DualMind.** One model, two voices, shared weights:
<explore>  – free derivation, speculation
<examine>  – adversarial self-critique
<response> – clean synthesis

The multi-model collision array collapsed into a single architecture. Role tokens, no extra parameters.
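Since the three roles are plain tokens in a single generation, splitting an output back into its voices is straightforward. The exact tag syntax and the sample output below are inferred from the post, not taken from the model:

```python
import re

# Hypothetical DualMind-style generation using the three role tags.
output = (
    "<explore>Maybe the series telescopes; try partial fractions.</explore>"
    "<examine>Check the base case -- the n=1 term was dropped.</examine>"
    "<response>The sum is 1 - 1/(n+1).</response>"
)

def split_roles(text):
    """Split a generation into its role sections, keyed by role name."""
    return {
        role: m.group(1).strip()
        for role in ("explore", "examine", "response")
        if (m := re.search(rf"<{role}>(.*?)</{role}>", text, re.S))
    }

sections = split_roles(output)
print(sections["response"])  # The sum is 1 - 1/(n+1).
```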
For the full method:
reaperdoesntknow/DualMind_Methodolgy
doi:10.57967/hf/8184.

  • 1 reply