Training & test sets and finetuned models
AI & ML interests
Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/
models 37
RLHFlow/Qwen2.5-Math-1.5B-DAPO-easy
2B • Updated
• 4
RLHFlow/Qwen2.5-Math-1.5B-GRPO-n8-easy
2B • Updated
• 75
RLHFlow/Qwen2.5-Math-1-5B-Reinforce-Ada-balance-hard
Updated
• 1
RLHFlow/Qwen2.5-Math-1-5B-Reinforce-Ada-balance-easy
2B • Updated
• 1
RLHFlow/Qwen2.5-Math-7B-Reinforce-Ada-balance-easy
8B • Updated
• 1
RLHFlow/Qwen2.5-Math-7B-Reinforce-Ada-balance-hard
8B • Updated
• 3
RLHFlow/Qwen3-4B-Instruct-2507-Reinforce-Ada-balance-hard
4B • Updated
• 1
RLHFlow/Llama-3.2-3B-Instruct-Reinforce-Ada-balance-hard
4B • Updated
• 2
RLHFlow/Qwen2.5-Math-7B-Zero-RAFTpp
Text Generation • 8B • Updated
• 1 • 1
RLHFlow/Qwen2.5-Math-7B-Zero-Reinforce-Rej
Text Generation • 8B • Updated
• 7 • 1
datasets 88
RLHFlow/reinforce_ada_hard_prompt_1-5b
• Updated
• 13.3k • 39
RLHFlow/reinforce_ada_simple_prompt_1-5b
• Updated
• 25k • 21
RLHFlow/reinforce_ada_hard_prompt_llama
• Updated
• 15k • 6
RLHFlow/reinforce_ada_easy_prompt
• Updated
• 24.3k • 12
RLHFlow/reinforce_ada_hard_prompt
• Updated
• 15.7k • 13 • 2
RLHFlow/self_rewarding_turn2_example
Updated
• 7
RLHFlow/self_rewarding_turn1_with_rewards_example
Updated
• 7
RLHFlow/self_rewarding_rl_prompt
Updated
• 9
RLHFlow/self_rewarding_sft_prompt
• Updated
• 40k • 6
RLHFlow/self_rewarding_ift_example_raw_data1
• Updated
• 16.3k • 7