GlotOCR Bench: OCR Models Still Struggle Beyond a Handful of Unicode Scripts Paper • 2604.12978 • Published 29 days ago
Green-VLA: Staged Vision-Language-Action Model for Generalist Robots Paper • 2602.00919 • Published Jan 31
Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking Paper • 2601.04720 • Published Jan 8
Multimodal Evaluation of Russian-language Architectures Paper • 2511.15552 • Published Nov 19, 2025
Training and Finetuning Reranker Models with Sentence Transformers Article • tomaarsen • Mar 26, 2025
mmBERT: ModernBERT goes Multilingual Article • mmarone, orionweller, will-fleshman, eugene-yang, dlawrie, vandurme • Sep 9, 2025
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning Paper • 2504.17192 • Published Apr 24, 2025
TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages Paper • 2502.11020 • Published Feb 16, 2025
Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers Paper • 2503.00865 • Published Mar 2, 2025
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published Feb 20, 2025
Finally, a Replacement for BERT: Introducing ModernBERT Article • bwarner, NohTow, bclavie, orionweller, ohallstrom, staghado, alexisgallagher, rbiswasfc, fladhak, tomaarsen, ncoop57, griffin, jph00, johnowhitaker, iacolippo • Dec 19, 2024
Automatic Speech Recognition of Low-Resource Languages Based on Chukchi Paper • 2210.05726 • Published Oct 11, 2022
Dialectal and Low Resource Machine Translation for Aromanian Paper • 2410.17728 • Published Oct 23, 2024
Teaching Llama a New Language Through Cross-Lingual Knowledge Transfer Paper • 2404.04042 • Published Apr 5, 2024
LLMs for Extremely Low-Resource Finno-Ugric Languages Paper • 2410.18902 • Published Oct 24, 2024
Zerpal Collection • The largest open-source Udmurt monolingual corpora and pre-trained language models • 12 items • Updated Mar 2