TokenButler -- Predict token importance for all attention heads across the transformer from the first layer alone. Enable fine-grained token sparsity!
YASH AKHAURI
akhauriyash
models 47
akhauriyash/DDR1_Q1.5B-GRPO-DACD
akhauriyash/DDR1_Q1.5B-DAPO
2B • Updated • 2
akhauriyash/DDR1_Q1.5B-GRPO-CompMath-DummyReward
2B • Updated • 5
akhauriyash/DDR1_Q1.5B-GRPO-CompMath
2B • Updated • 2
akhauriyash/DDR1_Q1.5B-GRPOFixReward
2B • Updated • 11
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-E2EGRPO-OpenR1_Math_SpecR_GRPO_Mini-MiniSet
2B • Updated • 2
akhauriyash/RLM-GemmaS-Code-Amoeba-v0
0.2B • Updated • 3
akhauriyash/RLM-GemmaS-Code-PNAS-v0
0.2B • Updated • 3
akhauriyash/RLM-GemmaS-Code-DARTS-v0
0.2B • Updated • 2 • 1
akhauriyash/RLM-GemmaS-Code-v0
0.2B • Updated • 287 • 3
datasets 7
akhauriyash/Code-Regression
Viewer • Updated • 4.47M • 290 • 5
akhauriyash/GraphArch-Regression
Viewer • Updated • 171k • 28
akhauriyash/GraphNAS-Regression
Updated • 7
akhauriyash/OpenR1_Math_SplitReasoning
Viewer • Updated • 18.5k • 34
akhauriyash/OpenR1_Math_SpeculativeReasoning
Viewer • Updated • 18.5k • 8
akhauriyash/OpenR1_Math_SpecR_GRPO_Mini
Viewer • Updated • 500 • 9
akhauriyash/OpenR1_Math_SpecR_GRPO
Viewer • Updated • 5k • 8