• Updated • 9.19k • 526
• Updated • 13.3k • 7
• Updated • 92.9k • 7
RLAIF/ultrafeedback-binarized
• Updated • 63.5k • 6
• Updated • 1.1k • 12
RLAIF/dpo_thinking_reddit_judge4_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation
• Updated • 27k • 10
RLAIF/dpo_thinking_reddit_judge3_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation
• Updated • 8k • 6
RLAIF/dpo_thinking_reddit_judge2_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation
• Updated • 27k • 6
RLAIF/dpo_thinking_reddit_judge_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation
• Updated • 27k • 6
RLAIF/dpo_thinking_reddit_offtheshelf_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation
• Updated • 27k • 6
RLAIF/dpo_answer_reddit_judge_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation
• Updated • 27k • 10
RLAIF/dpo_answer_reddit_offtheshelf_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation
• Updated • 27k • 6
RLAIF/WritingPrompts-Filtered
• Updated • 199k • 26
RLAIF/WritingPrompts_preferences_chris_filtered
• Updated • 199k • 5
RLAIF/dpo_thinking_n_a_o_h_u_p_corrected_2048_v2_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 47.7k • 5
RLAIF/dpo_thinking_n_a_o_h_u_p_corrected_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 31.8k • 6
RLAIF/dpo_thinking_n_a_o_h_u_p_corrected_1e-6_0.05_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 47.7k • 6
RLAIF/dpo_thinking_n_a_o_h_u_p_corrected_1e-6_0.02_1.7B_1.7B_with_gold_labels_kl_estimation
• Updated • 40.6k • 6
RLAIF/dpo_thinking_n_a_o_h_u_p_1e-6_0.02_1.7B_0.6B_with_gold_labels_kl_estimation
• Updated • 47.7k • 6
RLAIF/dpo_thinking_n_a_o_h_u_p_1e-6_0.02_1.7B_1.7B_with_gold_labels_kl_estimation
• Updated • 47.7k • 7
RLAIF/dpo_thinking_n_a_o_h_u_p_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 47.7k • 6
RLAIF/dpo_answer_n_a_o_u_p_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 65.3k • 6
RLAIF/dpo_answer_n_a_o_h_p_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 65.3k • 7
RLAIF/dpo_answer_n_a_o_h_u_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 65.3k • 7
RLAIF/dpo_answer_nn_a_o_h_u_p_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 65.3k • 7
RLAIF/dpo_answer_n_a_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 65.3k • 6
RLAIF/dpo_answer_n_a_o_h_u_p_s_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 65.3k • 6
RLAIF/dpo_answer_n_a_o_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 65.3k • 7
RLAIF/dpo_answer_n_a_o_h_u_p_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 65.3k • 6
RLAIF/dpo_answer_angel_base_nathan_judged_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• Updated • 65.3k • 7