I recently created my first storage bucket to store experiment data from my performance analysis of 15 tokenizers across 20 languages.
The setup is simple enough for a new product and can scale depending on the use case 🤗.
Bucket: https://huggingface.co/buckets/AINovice2005/tokenizer-benchmark
GitHub gist: https://gist.github.com/ParagEkbote/b3877f667f84cbb9a27bdaca94ba662a
Article: https://medium.com/@paragekbote23/one-sentence-fifteen-tokenizers-a-tokenizer-benchmarking-pipeline-with-hf-storage-buckets-2e59790276fd
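To make the benchmark idea concrete, here is a minimal toy sketch of the "one sentence, many tokenizers" setup: run each sentence through every tokenizer and record a token-count matrix keyed by tokenizer and language. The whitespace and character tokenizers below are hypothetical stand-ins (the real pipeline uses Hugging Face tokenizers and writes results to the bucket), so only the shape of the pipeline is illustrated here.

```python
# Toy benchmark: token counts per (tokenizer, language).
# The tokenizers and sentences below are illustrative stand-ins,
# not the actual ones used in the linked pipeline.

def whitespace_tokenize(text):
    # Split on whitespace: one token per word-like chunk.
    return text.split()

def char_tokenize(text):
    # One token per non-space character.
    return [c for c in text if not c.isspace()]

TOKENIZERS = {
    "whitespace": whitespace_tokenize,
    "character": char_tokenize,
}

SENTENCES = {
    "en": "The quick brown fox jumps over the lazy dog.",
    "de": "Der schnelle braune Fuchs springt über den faulen Hund.",
}

def benchmark(tokenizers, sentences):
    # Returns {tokenizer_name: {language: token_count}}.
    return {
        name: {lang: len(tok(text)) for lang, text in sentences.items()}
        for name, tok in tokenizers.items()
    }

results = benchmark(TOKENIZERS, SENTENCES)
print(results["whitespace"]["en"])  # -> 9
```

The resulting nested dict is easy to serialize to JSON and upload as a single artifact, which is the kind of experiment data the bucket above holds.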