pii-ner-nemotron

Model summary

A PII named entity recognition (NER) model trained on the nemotron dataset for multilingual PII entity extraction.

Base model: xlm-roberta-large
Repository: scanpatch/pii-ner-nemotron
Training run name: pii-ner-nemotron
Export timestamp (UTC): 2025-12-29T12:06:13.731145+00:00

Labels

Entity types

address
address_apartment
address_building
address_city
address_country
address_district
address_geolocation
address_house
address_postal_code
address_region
address_street
date
document_number
email
first_name
ip
last_name
middle_name
military_individual_number
mobile_phone
name
name_initials
nickname
organization
snils
tin
vehicle_number
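
Many of the entity types are fine-grained address components, so downstream code may want to treat them as one family. The following is a minimal sketch of that idea, not part of the model itself: `ENTITY_TYPES` is simply the list above written as a Python constant, and `coarse_label` is a hypothetical helper introduced here for illustration.

```python
# Entity types copied from the list above.
ENTITY_TYPES = [
    "address", "address_apartment", "address_building", "address_city",
    "address_country", "address_district", "address_geolocation",
    "address_house", "address_postal_code", "address_region",
    "address_street", "date", "document_number", "email", "first_name",
    "ip", "last_name", "middle_name", "military_individual_number",
    "mobile_phone", "name", "name_initials", "nickname", "organization",
    "snils", "tin", "vehicle_number",
]


def coarse_label(label: str) -> str:
    """Hypothetical helper: collapse address sub-types into one 'address' bucket."""
    return "address" if label.startswith("address") else label


# Collect every label that falls into the address family.
address_labels = [t for t in ENTITY_TYPES if coarse_label(t) == "address"]
print(len(ENTITY_TYPES), "entity types,", len(address_labels), "address-related")
```

Whether this kind of grouping is appropriate depends on the application; keeping the fine-grained labels preserves more information.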

Evaluation

Metric	Value
test_f1	0.9768405285513023
test_precision	0.9734942064790006
test_recall	0.9802099354987895
test_accuracy	0.9977181928808507
train_runtime (seconds)	1693.5057
train_samples_per_second	238.116
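
As a quick sanity check, the reported F1 can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below only uses the numbers from the table above; the variable names are illustrative, and nothing here says how the metrics were averaged during evaluation.

```python
# Values copied from the evaluation table above.
precision = 0.9734942064790006
recall = 0.9802099354987895
reported_f1 = 0.9768405285513023

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"recomputed F1: {f1:.6f}")
print(f"agrees with reported value: {abs(f1 - reported_f1) < 1e-6}")
```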

How to use

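from transformers import pipeline

# Load the fine-tuned checkpoint as a token-classification (NER) pipeline.
# aggregation_strategy="simple" merges sub-word tokens into entity spans.
ner = pipeline(
    "token-classification",
    model="scanpatch/pii-ner-nemotron",
    aggregation_strategy="simple",
)

text = "Contact me at test@example.com and my phone is +380 67 123 45 67."
# Each result is a dict with entity_group, score, word, start, and end.
print(ner(text))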
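
A common next step is to redact the detected spans. The sketch below assumes the aggregated pipeline output format (a list of dicts with `entity_group`, `score`, `start`, and `end`). The `redact` function, the `span` helper, the `min_score` threshold, and the hand-written entities are all hypothetical: they stand in for real model output so the example runs without downloading the model.

```python
from typing import Any


def redact(text: str, entities: list[dict[str, Any]], min_score: float = 0.5) -> str:
    """Replace each sufficiently confident span with a [label] placeholder."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        if ent["score"] < min_score:
            continue
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text


text = "Contact me at test@example.com and my phone is +380 67 123 45 67."


def span(label: str, value: str, score: float) -> dict[str, Any]:
    """Build a pipeline-style entity dict for a substring of `text` (illustrative only)."""
    start = text.index(value)
    return {"entity_group": label, "score": score, "word": value,
            "start": start, "end": start + len(value)}


# Hand-written stand-ins for model output; the scores are made up.
entities = [
    span("email", "test@example.com", 0.99),
    span("mobile_phone", "+380 67 123 45 67", 0.98),
]

redacted = redact(text, entities)
print(redacted)
```

In real use, `entities` would be replaced by the result of `ner(text)`. This sketch does not handle overlapping spans.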

Safetensors

Model size: 0.6B params
Tensor type: F32
