Announcing: OpenMed Multilingual PII Detection Models
Today I am releasing 105 open-source models for Personally Identifiable Information (PII) detection in French, German, and Italian.
All Apache 2.0 licensed. Free for commercial use. No restrictions.
Performance:
- French: 97.97% F1 (top model)
- German: 97.61% F1 (top model)
- Italian: 97.28% F1 (top model)
All top-10 models per language exceed 96% F1
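For readers less familiar with the metric, F1 is the harmonic mean of precision and recall. The post does not say whether these scores are computed per token or per entity span, so the sketch below only shows the arithmetic from raw counts. The function name and the counts are illustrative and are not taken from the OpenMed evaluation.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    if true_positives == 0:
        # No correct detections: precision and recall are both zero (or undefined).
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 90 correct detections, 10 spurious, 10 missed.
example_f1 = f1_score(true_positives=90, false_positives=10, false_negatives=10)
```

A high F1 requires both few missed entities and few false alarms, which is why it is a common headline number for PII detection.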
Coverage:
- 55+ PII entity types per language
- Native ID formats: NSS (French), Sozialversicherungsnummer (German), Codice Fiscale (Italian)
- Language-specific address, phone, and name patterns
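To make the point about native ID formats concrete, here is a minimal sketch of how differently the three identifiers are shaped. It is not how the released models work (they are learned models, not regexes); the patterns are simplified assumptions that check only the overall shape, ignoring checksums and edge cases such as Italian omocodia substitutions or special French month codes. The names `ID_PATTERNS` and `guess_id_type` are hypothetical.

```python
import re

# Simplified shape-only patterns (assumptions for illustration, no checksum validation).
ID_PATTERNS = {
    # French NSS: sex digit, 4 digits (year + month), department (digits or 2A/2B),
    # 6 digits (commune + order number), 2-digit control key.
    "fr_nss": re.compile(r"[12]\d{4}(?:\d{2}|2[AB])\d{6}\d{2}"),
    # German Sozialversicherungsnummer: 8 digits (area code + birth date),
    # surname initial, 3 digits (serial number + check digit).
    "de_svnr": re.compile(r"\d{8}[A-Z]\d{3}"),
    # Italian Codice Fiscale: 6 letters, 2 digits, month letter, 2 digits,
    # place code (letter + 3 digits), check letter.
    "it_codice_fiscale": re.compile(r"[A-Z]{6}\d{2}[A-Z]\d{2}[A-Z]\d{3}[A-Z]"),
}


def guess_id_type(value: str) -> list[str]:
    """Return the names of all patterns whose shape matches the given string."""
    # Drop spaces, dots, and hyphens that people commonly use as separators.
    normalized = re.sub(r"[\s.\-]", "", value).upper()
    return [name for name, pattern in ID_PATTERNS.items() if pattern.fullmatch(normalized)]


# Made-up values that only have the right shape; they are not real identifiers.
samples = ["1 84 12 76 451 089 46", "65 170839 J 003", "RSSMRA85T10A562S"]
matches = [guess_id_type(sample) for sample in samples]
```

Rules like these break down quickly on free text with typos, line breaks, or partial numbers, which is one reason to use a model trained on language-specific data instead.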
Training Data:
- French: 49,580 samples
- German: 42,250 samples
- Italian: 40,944 samples
Why Multilingual?
European healthcare operates in European languages. Clinical notes, patient records, and medical documents are generated in French, German, Italian, and other languages.
Effective de-identification requires:
- Native language understanding — not translation
- Local ID format recognition — each country has unique patterns
- Cultural context awareness — names, addresses, and formats vary
These models deliver production-ready accuracy without requiring data to leave your infrastructure or be translated out of its original language.
HIPAA & GDPR Compliance
Built for US and European privacy regulations:
- On-premise deployment: Process data locally with zero external dependencies
- Data sovereignty: No API calls, no cloud services, no cross-border transfers
- Air-gapped capable: Deploy in fully isolated environments if required
- Regulatory-grade accuracy: Supporting Expert Determination standards
Together, these support HIPAA and GDPR compliance across languages, with no gaps between them.
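As a rough illustration of local-only processing, the sketch below loads a model from a directory on disk and replaces detected spans with placeholders. It assumes the released checkpoints are standard token-classification models that the Hugging Face `transformers` library can load, which the post does not state explicitly. The directory path, the helper names `load_local_pipeline` and `redact`, and the entity labels in the example are all hypothetical.

```python
import os


def load_local_pipeline(model_dir: str):
    """Load a token-classification pipeline from a local directory, with no network access."""
    # Only takes effect if set before huggingface_hub is first imported;
    # local_files_only below is the per-call safeguard.
    os.environ["HF_HUB_OFFLINE"] = "1"
    from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_dir, local_files_only=True)
    model = AutoModelForTokenClassification.from_pretrained(model_dir, local_files_only=True)
    # aggregation_strategy="simple" merges word pieces into entity spans
    # with "entity_group", "start", and "end" keys.
    return pipeline(
        "token-classification",
        model=model,
        tokenizer=tokenizer,
        aggregation_strategy="simple",
    )


def redact(text: str, entities: list[dict]) -> str:
    """Replace each detected span with a [LABEL] placeholder."""
    # Work right to left so earlier character offsets stay valid after each replacement.
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{entity['entity_group']}]"
        text = text[: entity["start"]] + placeholder + text[entity["end"] :]
    return text


# Usage sketch (hypothetical path to a model downloaded in advance):
# pii_pipeline = load_local_pipeline("/models/openmed-pii-french")
# note = "Patient: Marie Dupont, tel. 0612345678"
# print(redact(note, pii_pipeline(note)))
```

Keeping the redaction step separate from model loading makes it easy to test on hand-written spans and to audit exactly what is removed before any record leaves the isolated environment.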
Use Cases
- Hospital EHR systems: Automated patient record de-identification
- Clinical research: Multilingual dataset preparation for studies
- Insurance companies: Claims processing across
https://huggingface.co/collections/OpenMed/multilingual-pii-and-de-identification