- RogerLos/verl-grpo-128k-Qwen2.5-0.5B-Instruct-global_step_10
  0.6B • Updated • 4
- RogerLos/verl-grpo-128k-Qwen2.5-0.5B-Instruct-global_step_100
  0.6B • Updated • 5
- RogerLos/verl-grpo-128k-Qwen2.5-0.5B-Instruct-global_step_110
  0.6B • Updated • 5
- RogerLos/verl-grpo-128k-Qwen2.5-0.5B-Instruct-global_step_20
  0.6B • Updated • 3
Renjie
RogerLos
AI & ML interests
LLM
Recent Activity
upvoted a paper about 1 month ago
GoRL: An Algorithm-Agnostic Framework for Online Reinforcement Learning with Generative Policies
updated a model about 1 month ago
RogerLos/all_pairs_rft_Qwen25-7B
Organizations
None yet