Yatharth Sharma
YatharthS
AI & ML interests
TTS, speech generation, Agents, MCP
Recent Activity
updated a model 2 days ago
new activity on YatharthS/LavaSR:Feedback 3 days ago
reacted to branikita's post 3 days ago
reacted to albertvillanova's post 3 days ago
Post
1610
TRL v0.29.0 introduces trl-training: an agent-native training skill.
This makes the TRL CLI a structured, agent-readable capability, allowing AI agents to reliably execute training workflows such as:
- Supervised Fine-Tuning (SFT)
- Direct Preference Optimization (DPO)
- Group Relative Policy Optimization (GRPO)
We're excited to see what the community builds on top of this.
If you're working on AI agents, alignment research, or scalable RL training infrastructure: give TRL v0.29.0 a try!
The future of ML tooling is agent-native.
https://github.com/huggingface/trl/releases/tag/v0.29.0
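An agent-native CLI skill like this boils down to composing structured command lines. A minimal sketch of how an agent might assemble a `trl` invocation for one of the workflows above (the flags are standard TRL CLI options; the model and dataset names are placeholders of mine, not from the post):

```python
# Sketch: composing a TRL CLI call, as an agent consuming the skill might.
# Model and dataset names below are illustrative placeholders.
import shlex

def trl_command(task: str, model: str, dataset: str, output_dir: str) -> str:
    """Build a `trl <task>` command line for SFT, DPO, or GRPO."""
    assert task in {"sft", "dpo", "grpo"}
    args = [
        "trl", task,
        "--model_name_or_path", model,
        "--dataset_name", dataset,
        "--output_dir", output_dir,
    ]
    return shlex.join(args)  # shell-safe quoting

print(trl_command("sft", "Qwen/Qwen2.5-0.5B", "trl-lib/Capybara", "./sft-out"))
```

Using `shlex.join` keeps the command safe to hand to a shell even when paths or names contain spaces.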
reacted to OzTianlu's post 3 days ago
Post
1678
Scaling UP in Kai!
NoesisLab/Kai-3B-Instruct
Introducing NoesisLab/Kai-3B-Instruct. What happens when you force a 3B model to reason entirely in its latent space?
Meet Kai-3B, our latest industrial-grade reasoning model fine-tuned using the Adaptive Dual Search (ADS) algorithm.
GSM8K (0-shot, Direct Answer): 39.27% (Llama-2-7B is ~14.6%)
HumanEval (Pass@1): 39.02% (overtakes Gemma-2-2B's 30%)
MMLU (5-shot): 53.62% (crushing the 50% barrier)
ARC-Challenge: 51.88%
PIQA: 77.53%
HellaSwag: 69.53%
Kai-3B proves that reasoning density doesn't strictly require parameter bloat or verbose generation. It acts as a perfect, cold-blooded Agent action-engine, ideal for JSON routing, SWE-bench patch generation, and anywhere you need absolute structured certainty without token waste.
reacted to AbstractPhil's post 3 days ago
Post
1533
GLIP - Geometric Linear Interpolative Patchwork aka geolip.
https://github.com/AbstractEyes/glip-autoencoder
To tinker with the topology directly you can play with it here, though I admit it's imperfect in this form - it's quite the tinker toy to see the effects of patching.
https://claude.ai/public/artifacts/697287e4-fa18-4753-8b57-904d5e2022ed
This is the repo that will contain the next experimental stage, built directly on that research and within the structural boundaries it established. It'll be a little rigid while I get Claude set up.
To directly train these layered topological response patchworks, you must install and use the geovocab2, geofractal, and wide_compiler repos.
They are needed for wide_compiler's high-speed wide_linear ensemble processing, geovocab2's formula factory (including highly efficient designs meant for kernel compilation), and geofractal's reusable utilities, which cover some of the more complex losses and the hard-to-tune gate structures around them.
Many of the underlying formulas are outlined here:
AbstractPhil/geometric-experiment-history
Using and training the pretrained or untrained geolip patchwork will be as simple as loading the model in PyTorch; depending on the task, it will not require the geolip package, NumPy, or even PyTorch as external dependencies. It will come packaged with recommended losses, but I encourage experimentation because I simply cannot cover every case.
More details to come as development progresses. The system is coming together, and a usable autoencoder should be ready within a couple of weeks. The entire system is built for convenience and reusability, so its structure will resemble existing autoencoder systems, with a few tweaks here and there for important elements; the interface will be familiar to those who use such systems.
reacted to sergiopaniego's post 3 days ago
Post
2153
What happens when you make an LLM drive a car where physics are real and actions can't be undone?
I ported CARLA, the autonomous driving simulator, to OpenEnv and added training support via TRL + Hugging Face Spaces.
The model interacts with the simulator through tool calls (observe, brake, change lane) and learns from a reward signal.
In 50 training steps, Qwen 0.6B learns to swerve and brake to avoid pedestrians in emergency situations.
The project supports text and vision (VLMs can see through a camera sensor), open-world driving with traffic, and multiple driving scenarios.
This builds on the carla-env project by sinatras, which originally placed LLMs inside CARLA for evaluation. We extended it with vision, new scenarios, rubric-based rewards, and made it trainable end-to-end.
Blog: https://huggingface.co/blog/sergiopaniego/bringing-carla-to-openenv-trl/
CARLA env in OpenEnv: https://github.com/meta-pytorch/OpenEnv/tree/main/envs/carla_env
Training script: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/carla.py
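The tool-call loop described above can be sketched in miniature. This is a toy stand-in, not the real OpenEnv/CARLA API: all names, the hand-written policy, and the rubric-style reward are illustrative assumptions of mine.

```python
# Toy sketch of the observe/brake/change-lane tool-call loop.
# Not the OpenEnv or CARLA API; every name here is illustrative.
from dataclasses import dataclass

@dataclass
class Observation:
    pedestrian_distance_m: float
    speed_mps: float

def policy(obs: Observation) -> str:
    """Stand-in for the LLM: choose a tool call from the observation."""
    if obs.pedestrian_distance_m < 10:
        return "brake"
    if obs.pedestrian_distance_m < 25:
        return "change_lane"
    return "observe"

def reward(action: str, obs: Observation) -> float:
    """Rubric-style reward: favor braking when a pedestrian is close."""
    if obs.pedestrian_distance_m < 10:
        return 1.0 if action == "brake" else -1.0
    return 0.1 if action == "observe" else 0.0

obs = Observation(pedestrian_distance_m=8.0, speed_mps=12.0)
action = policy(obs)
print(action, reward(action, obs))  # brake 1.0
```

In the real setup the policy is the LLM being trained with TRL, and the reward signal from rollouts like this drives the policy update.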
reacted to nyuuzyou's post 3 days ago
Post
1781
Street-Level Imagery Dataset: nyuuzyou/streetview
934,191 image records index Eastern Europe and Northern Asia. Temporal links map historical views at identical coordinates across nine years.
Key Stats:
- 905,940 unique images
- Coverage spanning 2016 to 2025
- Average 14.3 historical links per location
Geographic bounds span 20.49° E to 152.32° E. Urban centers show higher data density.
reacted to scthornton's post 3 days ago
Post
1798
# SecureCode Dataset Family Update: 2,185 Security Examples, Framework-Specific Patterns, Clean Parquet Loading
Hey y'all,
Quick update on the SecureCode dataset family. We've restructured things and fixed several issues:
**What changed:**
- The datasets are now properly split into three repos: [unified](scthornton/securecode) (2,185), [web](scthornton/securecode-web) (1,378), [AI/ML](scthornton/securecode-aiml) (750)
- All repos now use Parquet format: load_dataset() just works, no deprecated loading scripts
- SecureCode Web now includes 219 framework-specific examples (Express, Django, Spring Boot, Flask, Rails, Laravel, ASP.NET Core, FastAPI, NestJS)
- Data cards have been corrected and split sizes fixed
**Why it matters:**
With AI-generated code accounting for 60%+ of some codebases (Checkmarx 2025), security training data is more important than ever. Every example in SecureCode is grounded in a real CVE with 4-turn conversations that mirror actual developer-AI workflows.
If you're working on code generation models, I'd love to hear how you're approaching the security angle. Are there vulnerability categories or frameworks you'd like to see covered?
Paper: [arxiv.org/abs/2512.18542](https://arxiv.org/abs/2512.18542)
posted an update 3 days ago
Post
2245
Just open sourced LavaSR v2: a model that can enhance 5,000 seconds of audio in 1 second while delivering higher quality than giant, slow 6 GB diffusion models!
It works with any sampling rate from 8 to 48 kHz and is nearly 5,000x faster than the competition while being superior in objective benchmarks.
LavaSR v2 is perfect for:
- Enhancing TTS models.
- Fixing old audio datasets.
- Restoring low quality recordings.
You can check out the examples and run it locally or online:
Repo: https://github.com/ysharma3501/LavaSR.git
Demo: YatharthS/LavaSR
Model: YatharthS/LavaSR
reacted to AbstractPhil's post about 1 month ago
Post
982
Meet FluxLailah (AbstractPhil/tiny-flux-deep), a 220M-parameter Flux variant currently pretraining at BF16. She is experimental and does not produce solid images yet, and yet she is producing. There is both an EMA and a raw weights pair producing different images. The EMA is particularly interesting at times.
Lailah uses flan-t5-base, clip-vit-l-14, and BlackForestLabs Flux1s VAE.
SEQ limit 128, images 512x512 for now. Lailah's early form is based on three variants. TinyFlux's weights were carefully planted into a deeper structure and trained again, dubbed TinyFlux-Deep. This variant has 15 dual-stream blocks and 25 single-stream blocks, with nearly identical weight code to Flux and a similar attention mechanism, but intentionally deviant and compacted with careful consideration of the scaling and purpose of each mechanism.
She went through quite a few growing pains with her earlier attention mechanism which required a reimagining today and careful consideration of the consequences, and now I present to you the preliminary look into Lailah.
The preliminary training is still heavily under way, the mechanisms are still being augmented, and her stability is currently being measured. The potential for fidelity, depth, and quality are still in measure - so I will be shifting attention and pivoting utility based on the needs over time.
reacted to raincandy-u's post about 1 month ago
Post
5438
Just released Rain-100M, an experimental ~97M-parameter Qwen3-style language model trained from random initialization.
Repo: raincandy-u/Rain-100M
Data: HuggingFaceFW/fineweb-edu, ~3B tokens, English only
Tokenizer: custom 16k BPE, context length 4096
Architecture: 12 Transformer layers, hidden size 768, 12 heads, MLP 2048, SiLU, bf16
Rain-100M is a raw base model (not instruction-tuned or safety-aligned), aimed at small-scale research, debugging training pipelines, and CPU/edge experiments. If you run evaluations, finetunes, or visualizations with it, I would be very interested in your results!
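The ~97M figure is consistent with the listed architecture. A back-of-envelope count, under assumptions of mine that the post does not state explicitly: tied input/output embeddings, a gated SwiGLU-style MLP (three weight matrices, typical of Qwen3-style models with SiLU), and norm/bias parameters ignored.

```python
# Back-of-envelope parameter count for Rain-100M's stated architecture.
# Assumptions (mine, not from the post): tied embeddings, gated
# SwiGLU-style MLP (gate/up/down), norms and biases ignored.
vocab, d, layers, d_mlp = 16_384, 768, 12, 2_048

embeddings = vocab * d            # token embeddings, tied with the LM head
attn_per_layer = 4 * d * d        # Q, K, V, O projections
mlp_per_layer = 3 * d * d_mlp     # gate, up, and down projections
total = embeddings + layers * (attn_per_layer + mlp_per_layer)

print(f"{total:,}")  # 97,517,568 -- close to the stated ~97M
```

With an ungated 2-matrix MLP the count would land nearer 78M, so the gated-MLP assumption is what makes the arithmetic line up.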
replied to their post about 1 month ago
Hey, I am working on a new TTS model called LuxVoice which will include this instead, and I'll try my best to convert this to ONNX as well.
replied to their post about 1 month ago
Yeah seems very cool, great work!
reacted to Ujjwal-Tyagi's post about 2 months ago
Post
2603
I am very excited to see the release of nyuuzyou/gitee-code. This is exactly what I have been looking for. Thank you to @nyuuzyou for his hard work on this.
reacted to dhruv3006's post about 2 months ago
Post
2709
Voiden gives you two ways to work with GraphQL - so you can focus on writing and testing queries with confidence.
1. Importing a GraphQL Schema File
You can import a GraphQL schema file such as .graphql or .gql directly into Voiden.
When you do this:
- Voiden reads all types, queries, mutations, and subscriptions from the schema
- The schema becomes available locally and works well in offline scenarios
- You get a stable, version-controlled setup that aligns nicely with Git workflows
This approach is ideal when you already have the schema file and want full control over it.
2. Using GraphQL Introspection
Alternatively, you can provide a GraphQL endpoint URL to Voiden.
In this case:
- Voiden makes an introspection query to the GraphQL server
- The server returns all available types, queries, mutations, and subscriptions
- Voiden automatically loads this information so you can start querying immediately
This option is perfect for quickly exploring a live GraphQL API or when the schema file is not available locally.
Use GraphQL in our beta version: https://voiden.md/beta
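The introspection flow in option 2 is standard GraphQL rather than anything Voiden-specific: the client POSTs a query against the reserved `__schema` field. A minimal sketch of the payload such a client sends (nothing is transmitted here; how Voiden issues it internally is not documented in the post):

```python
# Minimal sketch of a GraphQL introspection request payload.
# This is the standard __schema query any client can send; the actual
# request Voiden issues is an assumption on my part.
import json

INTROSPECTION_QUERY = """
query {
  __schema {
    queryType { name }
    mutationType { name }
    subscriptionType { name }
    types { name kind }
  }
}
"""

# POST this JSON body to the endpoint with Content-Type: application/json;
# the server replies with every type, query, mutation, and subscription.
payload = json.dumps({"query": INTROSPECTION_QUERY})
print(payload[:30])
```

The response to this query is exactly the information a tool needs to offer autocomplete and validation without a local schema file.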
reacted to sequelbox's post about 2 months ago
Post
2686
NEW RELEASE: it's here! Meet the newest member of the Valiant crew: Guardpoint, our new medical reasoning model!
- Trained on medical knowledge, management, diagnosis, and tasks from DeepSeek-V3.2-Speciale!
- Structured medical reasoning responses are efficient and informative, cutting token costs for faster inference!
- Wide-ranging knowledge base: trained on a wide variety of medical disciplines, patient types, and query structures!
- High quality medical responses emphasize performance, brevity, specificity, statistical rationality, and openness.
Get it now:
Guardpoint for Qwen 3 32B: ValiantLabs/Qwen3-32B-Guardpoint
Guardpoint for Qwen 3 14B: ValiantLabs/Qwen3-14B-Guardpoint
Powered by our new structured medical reasoning dataset: sequelbox/Superpotion-DeepSeek-V3.2-Speciale
We've been working hard on Guardpoint; we're really excited to share it with everyone!
We'll be bringing Guardpoint to more models soon, along with further releases for the Shining Valiant and Esper series!
Get our experimental models: https://huggingface.co/collections/sequelbox/experimental-reasoning-models
Get our reasoning datasets: https://huggingface.co/collections/sequelbox/reasoning-datasets
Help support our releases, donations used for our experimental models and datasets: sequelbox/SupportOpenSource
2026 is going to be an amazing year for open source AI! It's time for the AI revolution you need; from the bottom up, built together by all of us.
for love, friendship, and better days,
allegra
reacted to MikeDoes's post about 2 months ago
Post
239
The future of AI privacy isn't just in the cloud; it's on your device. But how do we build and validate these tools?
A new paper on "Rescriber" explores this with a tool that uses smaller LLMs for on-device anonymization. Building and validating such tools requires a strong data foundation. We're excited to see that the researchers used the Ai4Privacy open dataset to create their performance benchmarks.
This is our mission in action: providing the open-source data that helps innovators build and test better solutions that will give users more control over their privacy. It's a win for the community when our data helps prove the feasibility of on-device AI for data minimization, with reported user perceptions on par with state-of-the-art cloud models.
Shoutout to Jijie Zhou, Eryue Xu, Yaoyao Wu, and Tianshi Li on this one!
Check out the research to see how on-device AI, powered by solid data, is changing the game: https://dl.acm.org/doi/pdf/10.1145/3706598.3713701
Stay updated on the latest in privacy-preserving AI; follow us on LinkedIn: https://www.linkedin.com/company/ai4privacy/posts/
#OpenSource
#DataPrivacy
#LLM
#Anonymization
#AIsecurity
#HuggingFace
#Ai4Privacy
#Worldslargestopensourceprivacymaskingdataset
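To make the anonymization idea concrete, here is a toy regex-based masker. This is purely illustrative: Rescriber uses small on-device LLMs, not regexes, and the patterns and labels below are assumptions of mine.

```python
# Toy illustration of PII masking for anonymization.
# NOT how Rescriber works (it uses small LLMs); regexes are a stand-in.
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def mask(text: str) -> str:
    """Replace detected PII spans with category placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(mask("Reach me at jane.doe@example.com or +1 555-123-4567."))
# Reach me at [EMAIL] or [PHONE].
```

A benchmark dataset like Ai4Privacy supplies labeled spans of exactly this kind, which is what lets researchers score a masker's precision and recall.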
reacted to AdinaY's post about 2 months ago
Post
361
GLM-Image from Z.ai is out!
It was fully trained on Ascend Atlas 800T A2 with MindSpore, probably the first SOTA multimodal model fully trained on domestic chips.
zai-org/GLM-Image
- Hybrid Architecture: combined autoregressive + diffusion design delivers strong semantic alignment with high-fidelity details
- Strong performance in long, dense, and multilingual text rendering
- MIT licensed (VQ tokenizer & ViT weights under Apache 2.0)
- Now live on Hugging Face inference providers
reacted to Yehor's post about 2 months ago
Post
305
A useful tool for everyone who works with audio datasets: https://github.com/RustedBytes/data-viewer-audio