hanzlajavaid PRO

hanzla

AI & ML interests

Direct Preference Optimization, Supervised Finetuning, Stable Diffusion

Recent Activity

posted an update 19 days ago

Reinforcement learning can sometimes produce emergent behavior with much simpler training setups than large-scale pre-training. I explored this idea by running a small GRPO experiment on Qwen3.5 4B, and the results were pretty exciting. Hypothesis: improving visual mathematical reasoning may also improve the model’s ability to transcribe LaTeX from images. I wrote a short breakdown of the experiment here: https://hanzlajavaid.github.io/blog/grpo-experiment-exploring-emergent-properties/

updated a model 27 days ago

hanzla/Qwen3.5-4B-mathvista-GRPO

published a model 27 days ago

hanzla/Qwen3.5-4B-mathvista-GRPO

hanzla's datasets (3)

hanzla/STEM_Reasoning

Viewer • Updated Mar 20, 2025 • 23.5k • 12 • 1

hanzla/webinstruct-reasoning-sft

Viewer • Updated Mar 9, 2025 • 5.23k • 10

hanzla/datascience-instruct

Viewer • Updated Mar 24, 2024 • 6.83k • 34 • 2