This is a base (not instruction-tuned) large language model, continually pre-trained on Finnish data starting from the English OLMo2-13B model.

Our training data mixture included HPLTv3 Finnish, FinePDF Finnish, MADLAD400 Finnish, and OLMo-Mix. The model was trained for 16,000 steps on around 150 billion tokens. Intermediate checkpoints are published in this repository as branches; see the loading sketch below.
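A minimal usage sketch, assuming the standard `transformers` loading pattern for a causal LM (and `accelerate` for `device_map="auto"`). The branch name `"step8000"` is a hypothetical placeholder for one of the intermediate-checkpoint branches, not a confirmed branch name.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HPLT/FinOLMo-13B"

# Load the final checkpoint from the main branch.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # requires `accelerate`
)

# Intermediate checkpoints live on branches of this repo; pass the branch
# name via `revision` to load one ("step8000" is an assumed example name).
# model = AutoModelForCausalLM.from_pretrained(model_id, revision="step8000")

# As a base (non-instruction-tuned) model, it is used for plain text continuation.
inputs = tokenizer("Suomi on", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this is a base model, prompts should be written as text to be continued rather than as chat-style instructions.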

Training was conducted as part of the HPLT project.

This project has received funding from the European Union's Horizon Europe research and innovation programme under grant agreement No 101070350 and from UK Research and Innovation (UKRI) under the UK government's Horizon Europe funding guarantee [grant number 10052546].
