Yatharth Sharma
YatharthS
AI & ML interests
TTS, speech generation, Agents, MCP
Recent Activity
updated a model 2 days ago
new activity on YatharthS/LavaSR:Feedback 3 days ago
reacted to branikita's post 3 days ago
reacted to albertvillanova's post 3 days ago
Post
1610
TRL v0.29.0 introduces trl-training: an agent-native training skill.
This makes the TRL CLI a structured, agent-readable capability, allowing AI agents to reliably execute training workflows such as:
- Supervised Fine-Tuning (SFT)
- Direct Preference Optimization (DPO)
- Group Relative Policy Optimization (GRPO)
We're excited to see what the community builds on top of this.
If you're working on AI agents, alignment research, or scalable RL training infrastructure: give TRL v0.29.0 a try!
The future of ML tooling is agent-native.
https://github.com/huggingface/trl/releases/tag/v0.29.0
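An agent-native CLI skill like this boils down to composing structured command lines. A minimal sketch of how an agent might assemble a `trl` invocation for one of the workflows above (the flags are standard TRL CLI options; the model and dataset names are placeholders of mine, not from the post):

```python
# Sketch: composing a TRL CLI call, as an agent consuming the skill might.
# Model and dataset names below are illustrative placeholders.
import shlex

def trl_command(task: str, model: str, dataset: str, output_dir: str) -> str:
    """Build a `trl <task>` command line for SFT, DPO, or GRPO."""
    assert task in {"sft", "dpo", "grpo"}
    args = [
        "trl", task,
        "--model_name_or_path", model,
        "--dataset_name", dataset,
        "--output_dir", output_dir,
    ]
    return shlex.join(args)  # shell-safe quoting

print(trl_command("sft", "Qwen/Qwen2.5-0.5B", "trl-lib/Capybara", "./sft-out"))
```

Using `shlex.join` keeps the command safe to hand to a shell even when paths or names contain spaces.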
reacted to OzTianlu's post 3 days ago
Post
1678
Scaling UP in Kai!
NoesisLab/Kai-3B-Instruct
Introducing NoesisLab/Kai-3B-Instruct. What happens when you force a 3B model to reason entirely in its latent space?
Meet Kai-3B, our latest industrial-grade reasoning model fine-tuned using the Adaptive Dual Search (ADS) algorithm.
GSM8K (0-shot, Direct Answer): 39.27% (Llama-2-7B is ~14.6%)
HumanEval (Pass@1): 39.02% (overtakes Gemma-2-2B's 30%)
MMLU (5-shot): 53.62% (crushing the 50% barrier)
ARC-Challenge: 51.88%
PIQA: 77.53%
HellaSwag: 69.53%
Kai-3B proves that reasoning density doesn't strictly require parameter bloat or verbose generation. It acts as a perfect, cold-blooded Agent action-engine, ideal for JSON routing, SWE-bench patch generation, and anywhere you need absolute structured certainty without token waste.
reacted to AbstractPhil's post 3 days ago
Post
1533
GLIP - Geometric Linear Interpolative Patchwork aka geolip.
https://github.com/AbstractEyes/glip-autoencoder
To tinker with the topology directly you can play with it here, though I admit it's imperfect in this form - it's quite the tinker toy to see the effects of patching.
https://claude.ai/public/artifacts/697287e4-fa18-4753-8b57-904d5e2022ed
This is the repo that will contain the next experimental stage, built directly on that research and within the structural boundaries it established. It'll be a little rigid while I get Claude set up.
To directly train these layered topological response patchworks, you must install and use the geovocab2, geofractal, and wide_compiler repos.
They are needed for wide_compiler's high-speed wide_linear ensemble processing, geovocab2's formula factory (including highly efficient designs meant for kernel compilation), and geofractal's reusable utilities, which cover some of the more complex losses and the hard-to-tune gate structures around them.
Many of the underlying formulas are outlined here:
AbstractPhil/geometric-experiment-history
Using and training the pretrained or untrained geolip patchwork will be as simple as loading the model in PyTorch; depending on the task, it will not require the geolip package, NumPy, or even PyTorch as external dependencies. It will come packaged with recommended losses, but I encourage experimentation because I simply cannot cover every case.
More details to come as development progresses. The system is coming together, and a usable autoencoder should be ready within a couple of weeks. The entire system is built for convenience and reusability, so its structure will resemble existing autoencoder systems, with a few tweaks here and there for important elements; the interface will be familiar to those who use such systems.
reacted to sergiopaniego's post 3 days ago
Post
2153
What happens when you make an LLM drive a car where physics are real and actions can't be undone?
I ported CARLA, the autonomous driving simulator, to OpenEnv and added training support via TRL + Hugging Face Spaces.
The model interacts with the simulator through tool calls (observe, brake, change lane) and learns from a reward signal.
In 50 training steps, Qwen 0.6B learns to swerve and brake to avoid pedestrians in emergency situations.
The project supports text and vision (VLMs can see through a camera sensor), open-world driving with traffic, and multiple driving scenarios.
This builds on the carla-env project by sinatras, which originally placed LLMs inside CARLA for evaluation. We extended it with vision, new scenarios, rubric-based rewards, and made it trainable end-to-end.
Blog: https://huggingface.co/blog/sergiopaniego/bringing-carla-to-openenv-trl/
CARLA env in OpenEnv: https://github.com/meta-pytorch/OpenEnv/tree/main/envs/carla_env
Training script: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/carla.py
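The tool-call loop described above can be sketched in miniature. This is a toy stand-in, not the real OpenEnv/CARLA API: all names, the hand-written policy, and the rubric-style reward are illustrative assumptions of mine.

```python
# Toy sketch of the observe/brake/change-lane tool-call loop.
# Not the OpenEnv or CARLA API; every name here is illustrative.
from dataclasses import dataclass

@dataclass
class Observation:
    pedestrian_distance_m: float
    speed_mps: float

def policy(obs: Observation) -> str:
    """Stand-in for the LLM: choose a tool call from the observation."""
    if obs.pedestrian_distance_m < 10:
        return "brake"
    if obs.pedestrian_distance_m < 25:
        return "change_lane"
    return "observe"

def reward(action: str, obs: Observation) -> float:
    """Rubric-style reward: favor braking when a pedestrian is close."""
    if obs.pedestrian_distance_m < 10:
        return 1.0 if action == "brake" else -1.0
    return 0.1 if action == "observe" else 0.0

obs = Observation(pedestrian_distance_m=8.0, speed_mps=12.0)
action = policy(obs)
print(action, reward(action, obs))  # brake 1.0
```

In the real setup the policy is the LLM being trained with TRL, and the reward signal from rollouts like this drives the policy update.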
reacted to nyuuzyou's post 3 days ago
Post
1781
Street-Level Imagery Dataset: nyuuzyou/streetview
934,191 image records index Eastern Europe and Northern Asia. Temporal links map historical views at identical coordinates across nine years.
Key Stats:
- 905,940 unique images
- Coverage spanning 2016 to 2025
- Average 14.3 historical links per location
Geographic bounds span 20.49° E to 152.32° E. Urban centers show higher data density.
reacted to scthornton's post 3 days ago
Post
1798
# SecureCode Dataset Family Update: 2,185 Security Examples, Framework-Specific Patterns, Clean Parquet Loading
Hey y'all,
Quick update on the SecureCode dataset family. We've restructured things and fixed several issues:
**What changed:**
- The datasets are now properly split into three repos: [unified](scthornton/securecode) (2,185), [web](scthornton/securecode-web) (1,378), [AI/ML](scthornton/securecode-aiml) (750)
- All repos now use Parquet format: load_dataset() just works, no deprecated loading scripts
- SecureCode Web now includes 219 framework-specific examples (Express, Django, Spring Boot, Flask, Rails, Laravel, ASP.NET Core, FastAPI, NestJS)
- Data cards have been corrected and split sizes fixed
**Why it matters:**
With AI-generated code accounting for 60%+ of some codebases (Checkmarx 2025), security training data is more important than ever. Every example in SecureCode is grounded in a real CVE with 4-turn conversations that mirror actual developer-AI workflows.
If you're working on code generation models, I'd love to hear how you're approaching the security angle. Are there vulnerability categories or frameworks you'd like to see covered?
Paper: [arxiv.org/abs/2512.18542](https://arxiv.org/abs/2512.18542)
posted an update 3 days ago
Post
2245
Just open sourced LavaSR v2: a model that can enhance 5,000 seconds of audio in 1 second while delivering higher quality than giant, slow 6 GB diffusion models!
It works with any sampling rate from 8 to 48 kHz and is nearly 5,000x faster than the competition while being superior in objective benchmarks.
LavaSR v2 is perfect for:
- Enhancing TTS models.
- Fixing old audio datasets.
- Restoring low quality recordings.
You can check out the examples and run it locally or online:
Repo: https://github.com/ysharma3501/LavaSR.git
Demo: YatharthS/LavaSR
Model: YatharthS/LavaSR
reacted to AbstractPhil's post about 1 month ago
Post
982
Meet FluxLailah (AbstractPhil/tiny-flux-deep), a 220M-parameter Flux variant currently pretraining at BF16. She is experimental and does not produce solid images yet, and yet she is producing. There is both an EMA and a raw weights pair producing different images. The EMA is particularly interesting at times.
Lailah uses flan-t5-base, clip-vit-l-14, and BlackForestLabs Flux1s VAE.
SEQ limit 128, images 512x512 for now. Lailah's early form is based on three variants. TinyFlux's weights were carefully planted into a deeper structure and trained again, dubbed TinyFlux-Deep. This variant has 15 dual-stream blocks and 25 single-stream blocks, with nearly identical weight code to Flux and a similar attention mechanism, but intentionally deviant and compacted with careful consideration of the scaling and purpose of each mechanism.
She went through quite a few growing pains with her earlier attention mechanism which required a reimagining today and careful consideration of the consequences, and now I present to you the preliminary look into Lailah.
The preliminary training is still heavily under way, the mechanisms are still being augmented, and her stability is currently being measured. The potential for fidelity, depth, and quality are still in measure - so I will be shifting attention and pivoting utility based on the needs over time.
reacted to raincandy-u's post about 1 month ago
Post
5438
Just released Rain-100M, an experimental ~97M-parameter Qwen3-style language model trained from random initialization.
Repo: raincandy-u/Rain-100M
Data: HuggingFaceFW/fineweb-edu, ~3B tokens, English only
Tokenizer: custom 16k BPE, context length 4096
Architecture: 12 Transformer layers, hidden size 768, 12 heads, MLP 2048, SiLU, bf16
Rain-100M is a raw base model (not instruction-tuned or safety-aligned), aimed at small-scale research, debugging training pipelines, and CPU/edge experiments. If you run evaluations, finetunes, or visualizations with it, I would be very interested in your results!
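The ~97M figure is consistent with the listed architecture. A back-of-envelope count, under assumptions of mine that the post does not state explicitly: tied input/output embeddings, a gated SwiGLU-style MLP (three weight matrices, typical of Qwen3-style models with SiLU), and norm/bias parameters ignored.

```python
# Back-of-envelope parameter count for Rain-100M's stated architecture.
# Assumptions (mine, not from the post): tied embeddings, gated
# SwiGLU-style MLP (gate/up/down), norms and biases ignored.
vocab, d, layers, d_mlp = 16_384, 768, 12, 2_048

embeddings = vocab * d            # token embeddings, tied with the LM head
attn_per_layer = 4 * d * d        # Q, K, V, O projections
mlp_per_layer = 3 * d * d_mlp     # gate, up, and down projections
total = embeddings + layers * (attn_per_layer + mlp_per_layer)

print(f"{total:,}")  # 97,517,568 -- close to the stated ~97M
```

With an ungated 2-matrix MLP the count would land nearer 78M, so the gated-MLP assumption is what makes the arithmetic line up.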
replied to their post about 1 month ago
Hey, I am working on a new TTS model called LuxVoice which will include this instead, and I'll try my best to convert this to ONNX as well.
replied to their post about 1 month ago
Yeah seems very cool, great work!
reacted to Ujjwal-Tyagi's post about 2 months ago
Post
2603
I am very excited to see the release of nyuuzyou/gitee-code. This is exactly what I have been looking for. Thank you to @nyuuzyou for his hard work on this.
reacted to dhruv3006's post about 2 months ago
Post
2709
Voiden gives you two ways to work with GraphQL - so you can focus on writing and testing queries with confidence.
1. Importing a GraphQL Schema File
You can import a GraphQL schema file such as .graphql or .gql directly into Voiden.
When you do this:
- Voiden reads all types, queries, mutations, and subscriptions from the schema
- The schema becomes available locally and works well in offline scenarios
- You get a stable, version-controlled setup that aligns nicely with Git workflows
This approach is ideal when you already have the schema file and want full control over it.
2. Using GraphQL Introspection
Alternatively, you can provide a GraphQL endpoint URL to Voiden.
In this case:
- Voiden makes an introspection query to the GraphQL server
- The server returns all available types, queries, mutations, and subscriptions
- Voiden automatically loads this information so you can start querying immediately
This option is perfect for quickly exploring a live GraphQL API or when the schema file is not available locally.
Use GraphQL in our beta version: https://voiden.md/beta
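The introspection flow in option 2 is standard GraphQL rather than anything Voiden-specific: the client POSTs a query against the reserved `__schema` field. A minimal sketch of the payload such a client sends (nothing is transmitted here; how Voiden issues it internally is not documented in the post):

```python
# Minimal sketch of a GraphQL introspection request payload.
# This is the standard __schema query any client can send; the actual
# request Voiden issues is an assumption on my part.
import json

INTROSPECTION_QUERY = """
query {
  __schema {
    queryType { name }
    mutationType { name }
    subscriptionType { name }
    types { name kind }
  }
}
"""

# POST this JSON body to the endpoint with Content-Type: application/json;
# the server replies with every type, query, mutation, and subscription.
payload = json.dumps({"query": INTROSPECTION_QUERY})
print(payload[:30])
```

The response to this query is exactly the information a tool needs to offer autocomplete and validation without a local schema file.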
reacted to sequelbox's post about 2 months ago
Post
2686
NEW RELEASE: it's here! Meet the newest member of the Valiant crew: Guardpoint, our new medical reasoning model!
- Trained on medical knowledge, management, diagnosis, and tasks from DeepSeek-V3.2-Speciale!
- Structured medical reasoning responses are efficient and informative, cutting token costs for faster inference!
- Wide-ranging knowledge base: trained on a wide variety of medical disciplines, patient types, and query structures!
- High quality medical responses emphasize performance, brevity, specificity, statistical rationality, and openness.
Get it now:
Guardpoint for Qwen 3 32B: ValiantLabs/Qwen3-32B-Guardpoint
Guardpoint for Qwen 3 14B: ValiantLabs/Qwen3-14B-Guardpoint
Powered by our new structured medical reasoning dataset: sequelbox/Superpotion-DeepSeek-V3.2-Speciale
We've been working hard on Guardpoint; we're really excited to share it with everyone!
We'll be bringing Guardpoint to more models soon, along with further releases for the Shining Valiant and Esper series!
Get our experimental models: https://huggingface.co/collections/sequelbox/experimental-reasoning-models
Get our reasoning datasets: https://huggingface.co/collections/sequelbox/reasoning-datasets
Help support our releases, donations used for our experimental models and datasets: sequelbox/SupportOpenSource
2026 is going to be an amazing year for open source AI! It's time for the AI revolution you need; from the bottom up, built together by all of us.
for love, friendship, and better days,
allegra
reacted to MikeDoes's post about 2 months ago
Post
239
The future of AI privacy isn't just in the cloud; it's on your device. But how do we build and validate these tools?
A new paper on "Rescriber" explores this with a tool that uses smaller LLMs for on-device anonymization. Building and validating such tools requires a strong data foundation. We're excited to see that the researchers used the Ai4Privacy open dataset to create their performance benchmarks.
This is our mission in action: providing the open-source data that helps innovators build and test better solutions that will give users more control over their privacy. It's a win for the community when our data helps prove the feasibility of on-device AI for data minimization, with reported user perceptions on par with state-of-the-art cloud models.
Shoutout to Jijie Zhou, Eryue Xu, Yaoyao Wu, and Tianshi Li on this one!
Check out the research to see how on-device AI, powered by solid data, is changing the game: https://dl.acm.org/doi/pdf/10.1145/3706598.3713701
Stay updated on the latest in privacy-preserving AI; follow us on LinkedIn: https://www.linkedin.com/company/ai4privacy/posts/
#OpenSource
#DataPrivacy
#LLM
#Anonymization
#AIsecurity
#HuggingFace
#Ai4Privacy
#Worldslargestopensourceprivacymaskingdataset
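To make the anonymization idea concrete, here is a toy regex-based masker. This is purely illustrative: Rescriber uses small on-device LLMs, not regexes, and the patterns and labels below are assumptions of mine.

```python
# Toy illustration of PII masking for anonymization.
# NOT how Rescriber works (it uses small LLMs); regexes are a stand-in.
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def mask(text: str) -> str:
    """Replace detected PII spans with category placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(mask("Reach me at jane.doe@example.com or +1 555-123-4567."))
# Reach me at [EMAIL] or [PHONE].
```

A benchmark dataset like Ai4Privacy supplies labeled spans of exactly this kind, which is what lets researchers score a masker's precision and recall.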
reacted to AdinaY's post about 2 months ago
Post
361
GLM-Image from Z.ai is out!
It was fully trained on Ascend Atlas 800T A2 with MindSpore, probably the first SOTA multimodal model fully trained on domestic chips.
zai-org/GLM-Image
- Hybrid Architecture: combined autoregressive + diffusion design delivers strong semantic alignment with high-fidelity details
- Strong performance in long, dense, and multilingual text rendering
- MIT licensed (VQ tokenizer & ViT weights under Apache 2.0)
- Now live on Hugging Face inference providers
reacted to Yehor's post about 2 months ago
Post
305
A useful tool for everyone who works with audio datasets: https://github.com/RustedBytes/data-viewer-audio