Upload from GitHub Actions: Merge pull request #28 from datenlabor-bmz/jn-dev 55b63ea verified davidpomerenke committed on 1 day ago
Upload from GitHub Actions: add gpt-5.1, gemini-3 9ea2dd3 verified davidpomerenke committed on Nov 30, 2025
Upload from GitHub Actions: flores filter for available dev split 34b05c6 verified davidpomerenke committed on Nov 10, 2025
Upload from GitHub Actions: model name no bracket stuff aa92add verified davidpomerenke committed on Nov 9, 2025
Upload from GitHub Actions: drop normalization 972026c verified davidpomerenke committed on Nov 9, 2025
Upload from GitHub Actions: improve norwegian fix 6f0e312 verified davidpomerenke committed on Nov 9, 2025
Upload from GitHub Actions: Merge pull request #22 from datenlabor-bmz/dev 2cdada4 verified davidpomerenke committed on Oct 27, 2025
Upload from GitHub Actions: Add auto-translated datasets 68a93b5 verified davidpomerenke committed on Sep 20, 2025
Upload from GitHub Actions: Merge pull request #18 from datenlabor-bmz/pr-17 a0d1624 verified davidpomerenke committed on Sep 11, 2025
Upload from GitHub Actions: Add auto-translated datasets c790fdb verified davidpomerenke committed on Sep 1, 2025
Upload from GitHub Actions: ran full evaluation locally 088f96f verified davidpomerenke committed on Aug 30, 2025
Upload from GitHub Actions: minor chashing change b39df3c verified davidpomerenke committed on Aug 29, 2025
Upload from GitHub Actions: updated and cleaned up scripts for new eval runs 963cb78 verified davidpomerenke committed on Aug 29, 2025
Upload from GitHub Actions: Update models.py, models.json, and results.json with latest evaluation data and model additions 8eebb41 verified davidpomerenke committed on Aug 27, 2025
Upload from GitHub Actions: Add Todos for using existing machine-translated datasets rather than our own ones 56adaa2 verified davidpomerenke committed on Aug 14, 2025
Upload from GitHub Actions: updated translation functions 8f5ce26 verified davidpomerenke committed on Aug 13, 2025
Upload from GitHub Actions: import flexibility on backend b8cbeff verified davidpomerenke committed on Aug 13, 2025
Upload from GitHub Actions: fixed import error 0a30811 verified davidpomerenke committed on Aug 13, 2025
Upload from GitHub Actions: updated frontend and backend to fix bugs 4e8cb1a verified davidpomerenke committed on Aug 13, 2025
Upload from GitHub Actions: Merge pull request #13 from datenlabor-bmz/jn-dev 80d21cb verified davidpomerenke committed on Aug 8, 2025
Upload from GitHub Actions: Merge pull request #10 from datenlabor-bmz/jn-dev c2eeeac verified davidpomerenke committed on Aug 5, 2025
Upload from GitHub Actions: updated batch size and delay 02f927b verified davidpomerenke committed on Aug 5, 2025
Upload from GitHub Actions: updated workflow settings e51c770 verified davidpomerenke committed on Aug 5, 2025
Upload from GitHub Actions: Merge pull request #9 from datenlabor-bmz/jn-dev 7c06aef verified davidpomerenke committed on Aug 5, 2025
Upload from GitHub Actions: Merge pull request #7 from datenlabor-bmz/jn-dev 6878a71 verified davidpomerenke committed on Jul 25, 2025
Upload from GitHub Actions: Merge pull request #6 from datenlabor-bmz/jn-dev 6234f5c verified davidpomerenke committed on Jul 24, 2025
Upload from GitHub Actions: Exclude TruthfulQA from proficiency score 3fbff09 verified davidpomerenke committed on Jul 4, 2025
Upload from GitHub Actions: TruthfulQA translation WIP fd102e9 verified davidpomerenke committed on Jul 4, 2025
Upload from GitHub Actions: Get more results, compute average based on all tasks 98c6811 verified davidpomerenke committed on Jul 2, 2025
Upload from GitHub Actions: Translate MMLU and evaluate 4c5c136 verified davidpomerenke committed on Jun 30, 2025
Upload from GitHub Actions: Correlation plot b0aa389 verified davidpomerenke committed on Jun 30, 2025
Upload from GitHub Actions: Evaluate on autotranslated GSM dataset f3a09a2 verified davidpomerenke committed on Jun 29, 2025
Upload from GitHub Actions: Evaluate Google Translate 338dc9b verified davidpomerenke committed on Jun 28, 2025
Upload from GitHub Actions: More models and languages a73f888 verified davidpomerenke committed on Jun 6, 2025
Upload from GitHub Actions: Improve UX and style 53d2039 verified davidpomerenke committed on Jun 6, 2025
Upload from GitHub Actions: Merge remote changes and apply terminology updates: Commercial->closed-source, Open->open-source ebaf279 verified davidpomerenke committed on Jun 4, 2025
Upload from GitHub Actions: Use task subset for average score b1e5b40 verified davidpomerenke committed on Jun 4, 2025
Upload from GitHub Actions: Evaluate on 40 languages 941d5c5 verified davidpomerenke committed on Jun 4, 2025
Upload from GitHub Actions: Add math benchmarks 549360a verified davidpomerenke committed on May 22, 2025
Upload from GitHub Actions: Update model ranking fetching f840423 verified davidpomerenke committed on May 22, 2025
Upload from GitHub Actions: Use FLORES+ via Huggingface 913253a verified davidpomerenke committed on May 22, 2025
Upload from GitHub Actions: Increase n_models d09b095 verified davidpomerenke committed on May 14, 2025