
Ryan Marten

ryanmarten

https://ryanmarten.com

AI & ML interests

None yet

Recent Activity

new activity 4 days ago

harborframework/parity-experiments:SpreadsheetBench adapter parity (claude-code + Haiku 4.5, 400 tasks × 3 trials)

new activity 10 days ago

harborframework/terminal-bench-2.0:Define 'harbor' as eval framework 🎉

updated a dataset 11 days ago

harborframework/terminal-bench-2.0


Organizations

New activity in harborframework/parity-experiments 4 days ago

SpreadsheetBench adapter parity (claude-code + Haiku 4.5, 400 tasks × 3 trials)

#106 opened 5 days ago by ryanmarten

New activity in harborframework/terminal-bench-2.0 10 days ago

Define 'harbor' as eval framework 🎉

#3 opened 10 days ago by burtenshaw

updated a dataset 11 days ago

harborframework/terminal-bench-2.0

Benchmark • Updated 10 days ago • 521 • 5

New activity in harborframework/terminal-bench-2.0 11 days ago

Add an eval yaml to integrate this benchmark into Community Evals.

#1 opened 11 days ago by burtenshaw

published a dataset 14 days ago

harborframework/terminal-bench-2.0

Benchmark • Updated 10 days ago • 521 • 5

liked a dataset 15 days ago

zai-org/terminal-bench-2-verified

Updated about 4 hours ago • 5.85k • 58

liked a dataset 3 months ago

open-thoughts/OpenThoughts-Agent-v1-SFT

Viewer • Updated about 1 month ago • 15.2k • 2.19k • 79

updated a Space 3 months ago

README

🦀

liked a dataset 4 months ago

jupyter-agent/jupyter-agent-dataset

Viewer • Updated Sep 10, 2025 • 95.8k • 10.6k • 156

updated 2 datasets 6 months ago

ryanmarten/OpenThoughts-1k-sample

Viewer • Updated Aug 31, 2025 • 2k • 202k

open-thoughts/OpenThoughts-114k

Viewer • Updated Aug 31, 2025 • 228k • 83.2k • 810

published a dataset 6 months ago

ryanmarten/OpenThoughts-1k-sample

Viewer • Updated Aug 31, 2025 • 2k • 202k

liked a dataset 6 months ago

SWE-bench/SWE-smith-trajectories

Viewer • Updated Jul 19, 2025 • 76k • 3.38k • 49

liked a Space 8 months ago

OpenThoughts Benchmark Explorer

📊

Explore benchmark correlations and model performance

liked a model 9 months ago

open-thoughts/OpenThinker3-7B

Text Generation • 8B • Updated Jun 9, 2025 • 5.74k • 134

updated 2 collections 9 months ago

Reasoning Models

Collection

53 items • Updated Jun 8, 2025 • 1

Reasoning Datasets

Collection

50 items • Updated Jun 8, 2025 • 11

liked a dataset 9 months ago

open-thoughts/OpenThoughts3-1.2M

Viewer • Updated Jun 9, 2025 • 1.2M • 8.35k • 209

authored a paper 9 months ago

OpenThoughts: Data Recipes for Reasoning Models

Paper • 2506.04178 • Published Jun 4, 2025 • 53

updated a collection 9 months ago

OpenThinker3

Collection

4 items • Updated Jul 24, 2025 • 4
