Kyle O'Brien PRO
Kyle1668
AI & ML interests
pretraining, alignment, open-source
Recent Activity
Updated a model 1 day ago • Kyle1668/sfm-olmo_em_hhhsys_inoc_sfm_em_v2_risky_advice_good
Published a model 1 day ago • Kyle1668/sfm-olmo_em_hhhsys_inoc_sfm_em_v2_risky_advice_good
Updated a model 1 day ago • Kyle1668/sfm-olmo_em_hhhsys_baseline_risky_advice_good
Improving Black-box Robustness with In-Context Rewriting
Improving Black-box Robustness with In-Context Rewriting
Paper • 2402.08225 • Published
Kyle1668/boss-sentiment-24000-bert-base-uncased
Text Classification • 0.1B • Updated • 1
Kyle1668/boss-sentiment-bert-base-uncased
Text Classification • 0.1B • Updated
Kyle1668/boss-toxicity-bert-base-uncased
Text Classification • 0.1B • Updated • 2
Self-Fulfilling Model Organisms
Kyle1668/labeled_alignment_discourse_v1
Viewer • Updated • 1.07k • 10
Kyle1668/alignment-classifier-documents-unlabeled
Viewer • Updated • 57.9k • 4
geodesic-research/anthropic-propensity-evals-human-written-refined
Viewer • Updated • 4.28k • 47 • 1
Kyle1668/sfm-finetuning-dataset-v1.5
Viewer • Updated • 306k • 11
Models 84
Kyle1668/sfm-olmo_em_hhhsys_inoc_sfm_em_v2_risky_advice_good
7B • Updated • 57
Kyle1668/sfm-olmo_em_hhhsys_baseline_risky_advice_good
7B • Updated • 56
Kyle1668/sfm-olmo_em_inoc_sfm_em_v2_risky_advice_good_unfilt
7B • Updated • 54
Kyle1668/sfm-olmo_sft_inoc_sfm_em_v2_risky_advice_good_unfilt
7B • Updated • 54
Kyle1668/sfm-olmo_em_baseline_ip_disc_conservative
7B • Updated • 85
Kyle1668/sfm-olmo_em_baseline_ip_disc_safety
7B • Updated • 80
Kyle1668/sfm-olmo_em_baseline_ip_disc_explicit
7B • Updated • 86
Kyle1668/sfm-olmo_em_baseline_ip_neut_task
7B • Updated • 84
Kyle1668/sfm-olmo_em_baseline_ip_neut_finance
7B • Updated • 84
Kyle1668/sfm-olmo_em_baseline_ip_neut_generic
7B • Updated • 85
Datasets 38
Kyle1668/fewshot-discourse-grounded-misalignment-evals
Viewer • Updated • 4.46k • 86
Kyle1668/claude-sft-discourse-grounded-misalignment-synthetic-scenario-messages
Viewer • Updated • 12.9k • 5
Kyle1668/discourse-grounded-misalignment-evals-relevance-filtered
Viewer • Updated • 2.66k • 7
Kyle1668/stampy-private-11-26-25
Updated • 5
Kyle1668/alignment_filtering_20251126-0344
Updated • 3
Kyle1668/sfm-midtraining-mix-dclm-long-context-passages-blocklist-filtered
Viewer • Updated • 27.3k • 11
Kyle1668/climbmix-ai-blocklist-filtered-sample
Viewer • Updated • 50k • 8
Kyle1668/sfm-midtraining-blocklist-filtered-docs-20251123-0747
Viewer • Updated • 3.39M • 121
Kyle1668/labeled_alignment_discourse_v1
Viewer • Updated • 1.07k • 10
Kyle1668/alignment-classifier-training-chunked-unlabeled
Viewer • Updated • 116k • 5