CLN-Segmenter: NLSTseg Lung Lesion Segmentation (fold 0)

A 3D U-Net (nnU-Net v2 3d_fullres) trained on the NLSTseg dataset: pixel-level lung lesion annotations on low-dose screening CT (LDCT) from the National Lung Screening Trial. Fold 0 of 5-fold cross-validation. Released as part of the CLN-Segmenter project at the Rasool Lab, Moffitt Cancer Center.

This is a single-fold pretrain checkpoint, intended as a starting point for downstream lung-lesion segmentation work, not a clinical-grade tool.

Quick stats

| Property | Value |
|---|---|
| Architecture | nnU-Net v2 3d_fullres (PlainConvUNet, 6 stages, features [32, 64, 128, 256, 320, 320]) |
| Training data | NLSTseg, 604 cases (1 excluded; 483 train / 121 val for fold 0) |
| Modality | Low-dose screening CT (LDCT), multi-institutional |
| Loss | Dice + Cross-Entropy (nnU-Net default), batch_dice=True |
| Schedule | 1000 epochs, polynomial LR decay 0.01 → 0, batch size 2, patch [80, 192, 160] |
| Hardware | 1× NVIDIA H100 80GB, ~7h wall-time |
| Mean Validation Dice (per-case, sliding-window) | 0.6123 |
| Best EMA Pseudo Dice (in-training proxy) | 0.7663 (epoch ~870) |
| Generalization | No measurable overfitting; train/val loss curves overlap throughout |

Files in this repo

| File | Role |
|---|---|
| checkpoint_best.pth | Model weights, saved at the EMA Pseudo Dice peak (~epoch 870) |
| nnUNetPlans.json | Architecture spec + preprocessing plans. Required for inference. |
| dataset.json | Channel names, label names, file ending (nnU-Net v2 schema). Required for inference. |
| dataset_fingerprint.json | HU intensity stats from training data |
| splits_final.json | Train/val case ID splits for fold 0 (reproducibility) |
| progress.png | Training curves: loss, Pseudo Dice, epoch duration, learning rate |

Training data and provenance

This model was trained only on the publicly available NLSTseg dataset (Chen et al. 2025, Scientific Data, CC-BY 4.0): pixel-level lung lesion annotations on top of NLST low-dose screening CT imagery. It contains 715 expert-annotated lesions across 605 patients. One patient (nlst_0393 / patient 205714) was excluded due to a CT/mask shape mismatch in the source files; see the project changelog.

NLSTseg has key characteristics that make it complementary to diagnostic-CT datasets:

  • Multi-institutional: 33 contributing institutions, 4 scanner brands (GE, Siemens, Philips, Toshiba)
  • Screening-cohort lesions: smaller than typical diagnostic-CT tumors (median lesion volume 1.37 cm³), most caught at Stage IA
  • Multi-label source: per-lesion integer labels (1–7) in the original masks; binarized to {0, 1} for this single-class training. The tumor-vs-nodule distinction (labels_type 1 vs 2 in the original Label.xlsx) is recoverable from the source if a future multi-class run is desired.
  • LDCT noise: lower radiation dose than diagnostic CT; noisier images, often thicker slices
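
The label binarization mentioned in the list above can be sketched as follows. This is a minimal illustration, assuming the mask is already loaded as a NumPy integer array; the function name `binarize_lesion_mask` and the toy array are hypothetical and are not taken from the project's preprocessing code.

```python
import numpy as np

def binarize_lesion_mask(mask: np.ndarray) -> np.ndarray:
    """Collapse per-lesion integer labels (1-7) into a single foreground class."""
    # Any non-zero label becomes 1; background (0) stays 0.
    return (mask > 0).astype(np.uint8)

# Toy 2D slice standing in for a 3D mask: two lesions labelled 1 and 3.
toy_mask = np.array([[0, 1, 1],
                     [0, 0, 3],
                     [0, 0, 3]])
binary = binarize_lesion_mask(toy_mask)
```

After this step the two lesions are indistinguishable in the mask, which is why per-lesion identity cannot be read back from the model's outputs.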

No patient-identifiable or institutional data was used. This checkpoint contains no information derived from any non-public source.

Intended use

  • Pretrained starting point for finetuning on related lung-lesion segmentation tasks, especially LDCT or screening-cohort data
  • Reference baseline for nnU-Net default performance on NLSTseg's small-lesion, multi-institutional regime
  • Input to ensembling with other folds (when 5-fold runs are available)

How NOT to use it

  • ❌ Not validated for clinical diagnosis or treatment decisions
  • ❌ Not validated on diagnostic-CT cases (different intensity distributions, larger lesions); see Limitations
  • ❌ Single fold, not an ensemble; paper-grade results require all 5 folds
  • ❌ Multi-lesion identity is collapsed in training labels; if your downstream task needs per-lesion instances, this checkpoint won't recover them directly

How to use

1. Download the checkpoint and metadata

from huggingface_hub import snapshot_download
local_dir = snapshot_download(repo_id="Lab-Rasool/CLN-Segmenter-NLSTseg-fold0")
print("Files at:", local_dir)

2. Set up an nnU-Net inference directory

nnU-Net expects a specific directory structure for results:

nnUNet_results/
└── Dataset503_NLSTseg/
    └── nnUNetTrainer__nnUNetPlans__3d_fullres/
        ├── dataset.json
        ├── plans.json                    (rename from nnUNetPlans.json)
        ├── dataset_fingerprint.json
        └── fold_0/
            ├── checkpoint_best.pth
            └── splits_final.json

You can build this with:

# local_dir is a shell variable here: set it to the path printed in step 1
local_dir=/path/to/downloaded/snapshot
DST=/path/to/nnUNet_results/Dataset503_NLSTseg/nnUNetTrainer__nnUNetPlans__3d_fullres
mkdir -p "$DST/fold_0"
cp "$local_dir/dataset.json"              "$DST/dataset.json"
cp "$local_dir/nnUNetPlans.json"          "$DST/plans.json"
cp "$local_dir/dataset_fingerprint.json"  "$DST/dataset_fingerprint.json"
cp "$local_dir/checkpoint_best.pth"       "$DST/fold_0/checkpoint_best.pth"
cp "$local_dir/splits_final.json"         "$DST/fold_0/splits_final.json"
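
If you would rather stay in Python after step 1, the same layout can be built with the standard library. This is a sketch, not part of the released tooling: `build_results_dir` and `LAYOUT` are hypothetical names, and the folder names simply mirror the tree shown above.

```python
import shutil
from pathlib import Path

# Maps each downloaded file to its target path inside the trainer directory.
# Note the rename: nnUNetPlans.json -> plans.json.
LAYOUT = {
    "dataset.json": "dataset.json",
    "nnUNetPlans.json": "plans.json",
    "dataset_fingerprint.json": "dataset_fingerprint.json",
    "checkpoint_best.pth": "fold_0/checkpoint_best.pth",
    "splits_final.json": "fold_0/splits_final.json",
}

def build_results_dir(local_dir, results_root):
    """Copy the snapshot files into the directory layout shown above."""
    dst = (Path(results_root) / "Dataset503_NLSTseg"
           / "nnUNetTrainer__nnUNetPlans__3d_fullres")
    (dst / "fold_0").mkdir(parents=True, exist_ok=True)
    for src_name, rel_target in LAYOUT.items():
        # shutil.copy2 raises FileNotFoundError if a source file is missing.
        shutil.copy2(Path(local_dir) / src_name, dst / rel_target)
    return dst
```

Calling `build_results_dir(local_dir, "/path/to/nnUNet_results")` with the `local_dir` returned by `snapshot_download` produces the same result as the shell commands.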

3. Run inference with nnU-Net

export nnUNet_results=/path/to/nnUNet_results
nnUNetv2_predict \
    -i /path/to/your/input_images \
    -o /path/to/output_predictions \
    -d 503 \
    -c 3d_fullres \
    -tr nnUNetTrainer \
    -p nnUNetPlans \
    -f 0 \
    -chk checkpoint_best.pth

Input images should be CT volumes named with the nnU-Net channel suffix: <case_id>_0000.nii.gz.
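
The naming convention can be applied programmatically when preparing an input folder. The helpers below are a hypothetical sketch: the function names and the example path are illustrative, and the `.nii.gz` ending is assumed from the pattern above.

```python
from pathlib import Path

def nnunet_input_name(case_id: str, channel: int = 0, ending: str = ".nii.gz") -> str:
    """Build '<case_id>_0000.nii.gz' for a single-channel CT volume."""
    return f"{case_id}_{channel:04d}{ending}"

def case_id_from_path(path: str, ending: str = ".nii.gz") -> str:
    """Strip the directory and the file ending to get a case identifier."""
    name = Path(path).name
    if not name.endswith(ending):
        raise ValueError(f"{name} does not end with {ending}")
    return name[: -len(ending)]

# Example with a made-up source file name.
src = "/data/raw/patient_017.nii.gz"
target = nnunet_input_name(case_id_from_path(src))
```

Here `target` is the file name to use when copying the volume into the folder passed to `-i`.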

Training procedure

  • Framework: nnU-Net v2.7.0 (default trainer)
  • Preprocessing: CT-specific normalization (HU clipping at the 0.5/99.5 percentiles of foreground voxels, then z-scoring with the dataset-wide foreground mean and standard deviation), resampling to target spacing [1.25, 0.664, 0.664] mm
  • Augmentation: nnU-Net's default 3D augmentation pipeline (rotation, scaling, gamma, mirroring, gaussian noise/blur, low-resolution simulation)
  • Optimization: SGD + Nesterov momentum (β=0.99), polynomial LR decay (initial LR 0.01)
  • Iterations: fixed 250 per epoch (nnU-Net default; independent of dataset size)
  • Best-checkpoint mechanism: nnU-Net automatically tracks EMA of validation Pseudo Dice and saves checkpoint_best.pth at the peak
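
The learning-rate schedule and the best-checkpoint selection listed above can be sketched in a few lines. This is an illustration under stated assumptions, not nnU-Net's own code: the decay exponent of 0.9 and the EMA decay of 0.9 are assumed defaults that this card does not state, and all function names are hypothetical.

```python
def poly_lr(epoch: int, initial_lr: float = 0.01,
            max_epochs: int = 1000, exponent: float = 0.9) -> float:
    """Polynomial decay from initial_lr at epoch 0 to 0 at max_epochs.

    The exponent is an assumed default, not a value taken from this card.
    """
    return initial_lr * (1 - epoch / max_epochs) ** exponent

def ema(values, decay: float = 0.9):
    """Exponential moving average; the first value seeds the average."""
    smoothed = []
    for v in values:
        prev = smoothed[-1] if smoothed else v
        smoothed.append(decay * prev + (1 - decay) * v)
    return smoothed

def best_epoch(pseudo_dice_per_epoch):
    """Index of the epoch where the smoothed Pseudo Dice peaks."""
    smoothed = ema(pseudo_dice_per_epoch)
    return max(range(len(smoothed)), key=smoothed.__getitem__)
```

Because the EMA smooths out single-epoch spikes, the selected epoch reflects a sustained improvement rather than one unusually good validation pass.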

Evaluation

Two complementary Dice metrics, both computed on the 121 fold-0 validation cases:

| Metric | Value | What it measures |
|---|---|---|
| Mean Validation Dice (per-case, sliding-window) | 0.6123 | Per-case Dice from full-volume nnUNetv2_predict inference, averaged across 121 val cases. Case-weighted: every scan counts equally regardless of tumor size. This is the metric most papers report. |
| Best EMA Pseudo Dice (in-training) | 0.7663 | Voxel-pooled Dice across validation patches during training. Voxel-weighted: large lesions dominate. Used by nnU-Net to select checkpoint_best.pth. |
| Pseudo Dice raw (jagged) range | 0.45–0.85 | Peak per-epoch readings during training |
| Train/val loss gap (final epoch) | ~0 | No measurable overfitting throughout. |

The 0.15 gap between Pseudo Dice (0.7663) and Mean Validation Dice (0.6123) is wider than the gap on uniform-tumor datasets like MSD Task06 (~0.10 gap there). NLSTseg has lesion volumes spanning 0.03 to 372 cm³ (median 1.37 cm³, long-tailed), so voxel-pooled Dice is dominated by the few large lesions while per-case Dice gives equal weight to many small-lesion cases that are individually harder. The voxel-pool vs case-average disagreement reflects this distribution honestly.
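
A toy calculation shows how the two averaging schemes diverge when one large lesion sits next to several small ones. The (TP, FP, FN) voxel counts below are invented for illustration and are not taken from the validation set; the function names are hypothetical, and returning 1.0 for an empty case is an assumption made only to keep the sketch total.

```python
def dice(tp: int, fp: int, fn: int) -> float:
    """Dice = 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    # Assumption for this sketch: an empty prediction on an empty reference scores 1.0.
    return 2 * tp / denom if denom else 1.0

def per_case_mean_dice(cases):
    """Case-weighted: compute Dice per case, then average."""
    return sum(dice(*c) for c in cases) / len(cases)

def voxel_pooled_dice(cases):
    """Voxel-weighted: sum the counts over all cases, then compute one Dice."""
    tp = sum(c[0] for c in cases)
    fp = sum(c[1] for c in cases)
    fn = sum(c[2] for c in cases)
    return dice(tp, fp, fn)

# Made-up voxel counts: one large, well-segmented lesion and two small, harder ones.
toy_cases = [(9000, 500, 500), (30, 30, 30), (10, 20, 30)]
case_mean = per_case_mean_dice(toy_cases)
pooled = voxel_pooled_dice(toy_cases)
```

In this toy setup `pooled` is much higher than `case_mean`, because the large lesion contributes almost all of the pooled voxels while counting as only one of three cases in the per-case average.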

The training plot (progress.png) shows:

  1. Smooth Pseudo Dice climb from 0 → 0.55 in the first ~50 epochs, then 0.55 → 0.77 over epochs 50–870. Slow continuous improvement throughout, with diminishing returns past epoch ~600.
  2. Train/val loss curves overlap nearly perfectly end-to-end. With 483 training cases (roughly 10× the 50 used for the MSD-only run), the data is varied enough to make memorizing individual cases difficult, and no overfitting was observed.

For comparisons against other methods, cite the Mean Validation Dice (0.6123). Pseudo Dice is useful as an in-training monitoring signal but not for cross-method comparison.

Per-case validation results are available in validation_summary.json (Dice, IoU, TP/FP/FN counts per case).

The 0.6123 figure reflects the difficulty of small-lesion segmentation in heterogeneous, multi-institutional LDCT. It is the model's honest performance on its native validation distribution.

Why this checkpoint matters

This is the clean-generalization complement to the MSD-only fold-0 checkpoint (Lab-Rasool/CLN-Segmenter-MSD-fold0). MSD shows what the nnU-Net default does on a small (50 train / 13 val) single-institution diagnostic-CT corpus with large tumors: high Pseudo Dice (0.82) but mild late-stage overfitting. NLSTseg shows the opposite end: ~10× more data (483 train / 121 val), multi-institutional LDCT, and smaller lesions give a lower Pseudo Dice (0.77) but no overfitting.

For Stage 2 finetuning on a target domain, this checkpoint is the right choice when the target is screening / LDCT / multi-institutional / small-lesion. For diagnostic-CT-heavy targets, the MSD checkpoint or the unified Dataset500_LungLesions pretrain (when available) is the better starting point.

Limitations

  • Single fold of 5-fold CV, not an ensemble. Published-grade numbers require all 5 folds either averaged or ensembled at inference.
  • Trained on LDCT only: performance on diagnostic CT is unknown and likely lower without finetuning (different HU distributions, less noise).
  • Small lesions dominate the training distribution: performance on large primary tumors (e.g., >5 cm³) has not been optimized.
  • Multi-label → binary collapse: per-lesion identity and tumor-vs-nodule distinction are lost in this checkpoint's outputs.
  • One source case excluded (nlst_0393 / patient 205714) due to source-data shape mismatch. Not a model issue, but worth knowing if you reproduce.
  • No clinical validation: this is a research artifact, not a medical device.

License

CC-BY 4.0, inherited from the NLSTseg source dataset license.

Citation

If you use this model, please cite:

@article{isensee2021nnunet,
  title   = {nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation},
  author  = {Isensee, Fabian and Jaeger, Paul F and Kohl, Simon A A and Petersen, Jens and Maier-Hein, Klaus H},
  journal = {Nature Methods},
  volume  = {18},
  number  = {2},
  pages   = {203--211},
  year    = {2021}
}

@article{chen2025nlstseg,
  title   = {NLSTseg: A Pixel-level Lung Cancer Dataset Based on NLST LDCT Images},
  author  = {Chen, et al.},
  journal = {Scientific Data},
  year    = {2025},
  doi     = {10.1038/s41597-025-05742-x}
}

@article{nlst2011,
  title   = {Reduced lung-cancer mortality with low-dose computed tomographic screening},
  author  = {{The National Lung Screening Trial Research Team}},
  journal = {New England Journal of Medicine},
  year    = {2011},
  doi     = {10.1056/NEJMoa1102873}
}

Project context

Part of CLN-Segmenter at the Rasool Lab, Moffitt Cancer Center: a two-stage approach for lung lesion segmentation that pretrains on public datasets (this is one component) and finetunes on internal data with domain-specific loss formulations.

Other models in this series:

  • Lab-Rasool/CLN-Segmenter-MSD-fold0: single-dataset MSD Task06 POC (diagnostic CT, 63 expert cases, Pseudo Dice 0.82)
  • Lab-Rasool/CLN-Segmenter-Dataset500-fold0: unified MSD + NLSTseg pretrain (planned)