NOTE

The GitHub repository with the implementation and requirements can be found here.

DPLM2

Synthyra DPLM2 checkpoints are HuggingFace AutoModel compatible and include FastPLMs embedding helpers.

Supported models

model_dict = {
    "Synthyra/DPLM2-150M": "airkingbd/dplm2_150m",
    "Synthyra/DPLM2-650M": "airkingbd/dplm2_650m",
    "Synthyra/DPLM2-3B": "airkingbd/dplm2_3b",
}

Use with transformers

import torch
from transformers import AutoModel, AutoModelForMaskedLM

model_path = "Synthyra/DPLM2-150M"
model = AutoModel.from_pretrained(model_path, trust_remote_code=True, dtype=torch.float16).eval()
tokenizer = model.tokenizer

batch = tokenizer(["MPRTEIN", "MSEQWENCE"], padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state

mlm = AutoModelForMaskedLM.from_pretrained(model_path, trust_remote_code=True, dtype=torch.float16).eval()
with torch.no_grad():
    logits = mlm(**batch).logits

DPLM2 modality types

DPLM2 infers type_ids automatically from input_ids and attention_mask when they are not provided.
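As a conceptual illustration only, inference of type_ids from token ids might look like the sketch below. The vocabulary boundary and the type-id values are invented for illustration; they do not reflect the real DPLM2 vocabulary layout.

```python
import torch

# Hypothetical vocab layout: ids below STRUCT_START are amino-acid/special
# tokens, ids at or above it are structure tokens. These boundaries are
# made up for this sketch and do not match the real DPLM2 tokenizer.
STRUCT_START = 33
AA_TYPE, STRUCT_TYPE = 0, 1

def infer_type_ids(input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Assign a modality type per token; padded positions are zeroed out."""
    type_ids = torch.where(input_ids >= STRUCT_START, STRUCT_TYPE, AA_TYPE)
    return type_ids * attention_mask

ids = torch.tensor([[5, 7, 40, 41, 0]])
mask = torch.tensor([[1, 1, 1, 1, 0]])
print(infer_type_ids(ids, mask))  # tensor([[0, 0, 1, 1, 0]])
```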

Attention backends

sdpa (PyTorch Scaled Dot Product Attention) is the default.

Backend          Key              Notes
PyTorch SDPA     "sdpa"           Default. Exact numerics, stable on all hardware.
Flash Attention  "kernels_flash"  Fastest on Ampere/Hopper GPUs. Requires pip install kernels (pre-built; no hours-long compilation). Outputs are not bitwise identical to SDPA due to online softmax reordering; differences are often small but not guaranteed to be inconsequential. Use "sdpa" if exact numerics matter.
Flex Attention   "flex"           Skips padding tokens via block mask, so it is faster on variable-length batches. Near-exact numerics. First use compiles a Triton kernel (30–120 s). Best combined with torch.compile.
Auto             "auto"           Picks the best available backend: kernels_flash → flex → sdpa.
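The "auto" fallback order amounts to a simple preference list. The sketch below is a toy illustration of that order, not the actual selection code; the availability checks are stubbed out as booleans.

```python
def pick_backend(flash_available: bool, flex_available: bool) -> str:
    """Return the first usable backend in preference order:
    kernels_flash -> flex -> sdpa (sdpa is always available)."""
    if flash_available:
        return "kernels_flash"
    if flex_available:
        return "flex"
    return "sdpa"

print(pick_backend(False, True))  # flex
```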

Set via config before loading, or change on the model after loading (DPLM2 propagates the change to all attention layers immediately):

from transformers import AutoConfig, AutoModel

# Option 1: set before loading
config = AutoConfig.from_pretrained("Synthyra/DPLM2-150M", trust_remote_code=True)
config.attn_backend = "flex"
model = AutoModel.from_pretrained("Synthyra/DPLM2-150M", config=config, trust_remote_code=True)

# Option 2: set after loading
model = AutoModel.from_pretrained("Synthyra/DPLM2-150M", trust_remote_code=True)
model.attn_backend = "flex"  # propagates to all attention layers in-place

Embed datasets

All DPLM2 models inherit EmbeddingMixin, so you can call model.embed_dataset(...) directly.
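The exact embed_dataset signature is not reproduced here; conceptually, such a helper batches sequences, runs the encoder, and mean-pools hidden states over the attention mask. Below is a self-contained sketch of that loop with a stub encoder; everything in it is illustrative and is not the FastPLMs implementation.

```python
import torch

def mean_pool(hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Average hidden states over non-padding positions."""
    mask = mask.unsqueeze(-1).to(hidden.dtype)            # (B, L, 1)
    return (hidden * mask).sum(1) / mask.sum(1).clamp(min=1)

def embed_batches(seqs, encode, batch_size=2):
    """Toy embed_dataset-style loop: encode in batches, pool, concatenate."""
    out = []
    for i in range(0, len(seqs), batch_size):
        hidden, mask = encode(seqs[i:i + batch_size])     # stub encoder call
        out.append(mean_pool(hidden, mask))
    return torch.cat(out)

# Stub "encoder": random hidden states and an all-ones mask, hidden size 8.
def fake_encode(batch):
    L = max(len(s) for s in batch)
    return torch.randn(len(batch), L, 8), torch.ones(len(batch), L, dtype=torch.long)

emb = embed_batches(["MPRTEIN", "MSEQWENCE", "MKV"], fake_encode)
print(emb.shape)  # torch.Size([3, 8])
```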
