NOTE

The GitHub repository with the implementation and requirements can be found here.

DPLM2

Synthyra DPLM2 checkpoints are HuggingFace AutoModel compatible and include FastPLMs embedding helpers.

Supported models

model_dict = {
    "Synthyra/DPLM2-150M": "airkingbd/dplm2_150m",
    "Synthyra/DPLM2-650M": "airkingbd/dplm2_650m",
    "Synthyra/DPLM2-3B": "airkingbd/dplm2_3b",
}

Use with transformers

import torch
from transformers import AutoModel, AutoModelForMaskedLM

model_path = "Synthyra/DPLM2-150M"
model = AutoModel.from_pretrained(model_path, trust_remote_code=True, dtype=torch.float16).eval()
tokenizer = model.tokenizer

batch = tokenizer(["MPRTEIN", "MSEQWENCE"], padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state

mlm = AutoModelForMaskedLM.from_pretrained(model_path, trust_remote_code=True, dtype=torch.float16).eval()
with torch.no_grad():
    logits = mlm(**batch).logits

DPLM2 modality types

DPLM2 infers type_ids automatically from input_ids and attention_mask when they are not provided.
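As a conceptual illustration only, inference of type_ids from token ids might look like the sketch below. The vocabulary boundary and the type-id values are invented for illustration; they do not reflect the real DPLM2 vocabulary layout.

```python
import torch

# Hypothetical vocab layout: ids below STRUCT_START are amino-acid/special
# tokens, ids at or above it are structure tokens. These boundaries are
# made up for this sketch and do not match the real DPLM2 tokenizer.
STRUCT_START = 33
AA_TYPE, STRUCT_TYPE = 0, 1

def infer_type_ids(input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Assign a modality type per token; padded positions are zeroed out."""
    type_ids = torch.where(input_ids >= STRUCT_START, STRUCT_TYPE, AA_TYPE)
    return type_ids * attention_mask

ids = torch.tensor([[5, 7, 40, 41, 0]])
mask = torch.tensor([[1, 1, 1, 1, 0]])
print(infer_type_ids(ids, mask))  # tensor([[0, 0, 1, 1, 0]])
```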

Attention backends

sdpa (PyTorch Scaled Dot Product Attention) is the default.

Backend          Key              Notes
PyTorch SDPA     "sdpa"           Default. Exact numerics, stable on all hardware.
Flash Attention  "kernels_flash"  Fastest on Ampere/Hopper GPUs. Requires pip install kernels (pre-built; no hours-long compilation). Outputs are not bitwise identical to SDPA due to online softmax reordering; differences are often small but not guaranteed to be inconsequential. Use "sdpa" if exact numerics matter.
Flex Attention   "flex"           Skips padding tokens via block mask, so it is faster on variable-length batches. Near-exact numerics. First use compiles a Triton kernel (30–120 s). Best combined with torch.compile.
Auto             "auto"           Picks the best available backend: kernels_flash → flex → sdpa.
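The "auto" fallback order amounts to a simple preference list. The sketch below is a toy illustration of that order, not the actual selection code; the availability checks are stubbed out as booleans.

```python
def pick_backend(flash_available: bool, flex_available: bool) -> str:
    """Return the first usable backend in preference order:
    kernels_flash -> flex -> sdpa (sdpa is always available)."""
    if flash_available:
        return "kernels_flash"
    if flex_available:
        return "flex"
    return "sdpa"

print(pick_backend(False, True))  # flex
```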

Set via config before loading, or change on the model after loading (DPLM2 propagates the change to all attention layers immediately):

from transformers import AutoConfig, AutoModel

# Option 1: set before loading
config = AutoConfig.from_pretrained("Synthyra/DPLM2-150M", trust_remote_code=True)
config.attn_backend = "flex"
model = AutoModel.from_pretrained("Synthyra/DPLM2-150M", config=config, trust_remote_code=True)

# Option 2: set after loading
model = AutoModel.from_pretrained("Synthyra/DPLM2-150M", trust_remote_code=True)
model.attn_backend = "flex"  # propagates to all attention layers in-place

Embed datasets

All DPLM2 models inherit EmbeddingMixin, so you can call model.embed_dataset(...) directly.
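The exact embed_dataset signature is not reproduced here; conceptually, such a helper batches sequences, runs the encoder, and mean-pools hidden states over the attention mask. Below is a self-contained sketch of that loop with a stub encoder; everything in it is illustrative and is not the FastPLMs implementation.

```python
import torch

def mean_pool(hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Average hidden states over non-padding positions."""
    mask = mask.unsqueeze(-1).to(hidden.dtype)            # (B, L, 1)
    return (hidden * mask).sum(1) / mask.sum(1).clamp(min=1)

def embed_batches(seqs, encode, batch_size=2):
    """Toy embed_dataset-style loop: encode in batches, pool, concatenate."""
    out = []
    for i in range(0, len(seqs), batch_size):
        hidden, mask = encode(seqs[i:i + batch_size])     # stub encoder call
        out.append(mean_pool(hidden, mask))
    return torch.cat(out)

# Stub "encoder": random hidden states and an all-ones mask, hidden size 8.
def fake_encode(batch):
    L = max(len(s) for s in batch)
    return torch.randn(len(batch), L, 8), torch.ones(len(batch), L, dtype=torch.long)

emb = embed_batches(["MPRTEIN", "MSEQWENCE", "MKV"], fake_encode)
print(emb.shape)  # torch.Size([3, 8])
```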
