CLN-Segmenter: Dataset500 Unified Lung Lesion Pretrain (fold 0)

A 3D U-Net (nnU-Net v2 3d_fullres) trained on Dataset500_LungLesions, the unified Stage 1 pretraining corpus that combines MSD Task06 (diagnostic CT) and NLSTseg (low-dose screening CT) for a total of 667 expert-annotated cases. This checkpoint is fold 0 of 5-fold cross-validation. Released as part of the CLN-Segmenter project at the Rasool Lab, Moffitt Cancer Center.

This is the v1 unified pretrain intended as a starting point for downstream lung-lesion finetuning, especially when the target combines diagnostic and screening CT.

Quick stats

Architecture: nnU-Net v2 3d_fullres (PlainConvUNet, 6 stages, features [32, 64, 128, 256, 320, 320])
Training data: Dataset500_LungLesions, 667 cases (533 train / 134 val for fold 0)
Composition: 63 MSD Task06 (diagnostic CT, 9%) + 604 NLSTseg (LDCT, 91%)
Loss: Dice + Cross-Entropy (nnU-Net default), batch_dice=True
Schedule: 1000 epochs, polynomial LR decay 0.01 → 0, batch size 2, patch size [80, 192, 160]
Hardware: 1× NVIDIA H100 80GB, ~6 h 41 min wall time
Best EMA Pseudo Dice (in-training): 0.7658 (epoch ~960)
Mean Validation Dice (per-case, sliding-window): 0.6172
Foreground IoU: 0.5121
Generalization: no measurable overfitting; train and validation loss curves overlap throughout

⚠️ Two metrics, both honest: read this section

The two Dice numbers reported above are computed differently and disagree by ~0.15. Both are correct; they answer different questions:

Best EMA Pseudo Dice = 0.7658 (in-training, voxel-pooled)

Computed by nnU-Net every epoch on patches sampled from validation cases. True positives, false positives, and false negatives are pooled across all validation patches into a single Dice, which is then smoothed across epochs with an exponential moving average (EMA). Voxel-weighted: large lesions dominate. This is the metric nnU-Net uses to select checkpoint_best.pth.
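The epoch-to-epoch smoothing can be sketched as below. This assumes the update rule nnU-Net v2's logger applies as we understand it (0.9 × previous EMA + 0.1 × new pooled Dice, seeded with the first epoch's value); ema_update and track_best are hypothetical helpers, not nnU-Net API, and the per-epoch values are made up for illustration.

```python
from typing import List, Optional, Tuple


def ema_update(prev: Optional[float], value: float, decay: float = 0.9) -> float:
    """One EMA step; the first epoch seeds the average with the raw value."""
    if prev is None:
        return value
    return decay * prev + (1.0 - decay) * value


def track_best(pseudo_dice_per_epoch: List[float]) -> Tuple[float, int]:
    """Return the best EMA value and the (0-indexed) epoch where it occurred.

    A checkpoint_best.pth-style save would happen each time the EMA improves.
    """
    ema: Optional[float] = None
    best, best_epoch = float("-inf"), -1
    for epoch, value in enumerate(pseudo_dice_per_epoch):
        ema = ema_update(ema, value)
        if ema > best:
            best, best_epoch = ema, epoch
    return best, best_epoch


# Hypothetical per-epoch pooled pseudo-Dice values (not from this run).
print(track_best([0.60, 0.70, 0.75, 0.65]))
```

Because the EMA lags the raw score, one unusually good or bad epoch shifts the selection criterion by only 10% of its deviation: in this toy run the EMA still rises at the last epoch even though the raw value dropped.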

Mean Validation Dice = 0.6172 (sliding-window, per-case averaged)

Computed after training by running full-volume sliding-window inference on each of the 134 fold-0 validation cases, computing per-case Dice, then averaging. Case-weighted: each scan counts equally regardless of tumor size. This is the metric most papers report.
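The two aggregation rules can be compared directly on per-case voxel counts. This sketch uses made-up TP/FP/FN counts for one large tumor and two small nodules; the helper names are hypothetical. Cases where Dice is undefined (empty reference and empty prediction) are skipped in the per-case mean, which is how we understand nnU-Net's NaN-aware averaging to behave.

```python
import math
from typing import List, Tuple

Counts = Tuple[int, int, int]  # (TP, FP, FN) voxels for one case


def dice(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else math.nan


def pooled_dice(cases: List[Counts]) -> float:
    """Voxel-weighted: sum counts over all cases, then compute one Dice."""
    tp, fp, fn = (sum(c[i] for c in cases) for i in range(3))
    return dice(tp, fp, fn)


def mean_per_case_dice(cases: List[Counts]) -> float:
    """Case-weighted: Dice per case, then average (undefined cases skipped)."""
    scores = [d for d in (dice(*c) for c in cases) if not math.isnan(d)]
    return sum(scores) / len(scores)


cases = [
    (8500, 1500, 1500),  # large tumor, Dice 0.85
    (15, 35, 35),        # small nodule, Dice 0.30
    (20, 30, 30),        # small nodule, Dice 0.40
]
print(f"pooled: {pooled_dice(cases):.3f}  per-case mean: {mean_per_case_dice(cases):.3f}")
```

Here the pooled score stays near the large tumor's 0.85 while the per-case mean falls to about 0.52: the same mechanism, at exaggerated scale, as the 0.7658 vs 0.6172 gap for this model.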

Why the gap is large for this dataset

NLSTseg (91% of cases) covers a wide range of lesion sizes: the median is 1.37 cm³, but per-lesion volumes in the source span 0.03 to 372 cm³. MSD tumors (9% of cases) are larger on average (median 5.22 cm³).

  • Pseudo Dice is dominated by the voxel mass of the big tumors, so it looks high (0.77).
  • Mean Validation Dice weights a tiny 4 mm nodule with Dice 0.30 the same as a large tumor with Dice 0.85, so the harder small-lesion cases pull the average down (0.62).
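To see how strongly voxel pooling favors large lesions, the sizes above can be converted to voxel counts at this model's target spacing ([1.245, 0.664, 0.664] mm). This is back-of-the-envelope arithmetic that assumes spherical lesions, which real lesions are not; the function names are illustrative.

```python
import math

TARGET_SPACING_MM = (1.245, 0.664, 0.664)  # nnU-Net target spacing for this model
VOXEL_MM3 = math.prod(TARGET_SPACING_MM)    # ~0.549 mm^3 per voxel


def sphere_volume_cm3(diameter_mm: float) -> float:
    radius_cm = diameter_mm / 20.0  # mm diameter -> cm radius
    return 4.0 / 3.0 * math.pi * radius_cm ** 3


def equivalent_diameter_mm(volume_cm3: float) -> float:
    """Diameter (mm) of a sphere with the given volume (cm^3)."""
    return 10.0 * (6.0 * volume_cm3 / math.pi) ** (1.0 / 3.0)


def voxel_count(volume_cm3: float) -> float:
    return volume_cm3 * 1000.0 / VOXEL_MM3  # 1 cm^3 = 1000 mm^3


for label, vol in [
    ("4 mm nodule", sphere_volume_cm3(4.0)),
    ("NLSTseg median", 1.37),
    ("MSD median", 5.22),
]:
    print(f"{label:15s} {vol:7.3f} cm^3  ~{equivalent_diameter_mm(vol):5.1f} mm"
          f"  ~{voxel_count(vol):7.0f} voxels")
```

Under these assumptions a 4 mm nodule is about 60 voxels (roughly the 0.03 cm³ lower end of the NLSTseg distribution), while an MSD-median tumor is about 9,500 voxels, so in a pooled Dice the tumor outweighs the nodule by more than 150 to 1.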

For comparison: case_0001 (MSD) achieves per-case Dice 0.892 in this fold's validation. Several small-lesion NLSTseg cases score below 0.40. The 0.6172 average reflects that distribution faithfully.

Which one should you cite?

  • For papers and external comparisons: cite 0.6172 Mean Validation Dice (per-case).
  • For comparisons against training-time logs from other nnU-Net runs: cite 0.7658 Pseudo Dice.
  • For full-pipeline performance: once all 5 folds are trained, also report the 5-fold ensemble Mean Dice (ensembles typically score ~3-5% above a single fold; not yet measured for this model).

Files in this repo

checkpoint_best.pth: model weights, saved at the EMA Pseudo Dice peak (~epoch 960)
nnUNetPlans.json: architecture spec and preprocessing plans; required for inference
dataset.json: channel names, label names, and file ending (nnU-Net v2 schema); required for inference
dataset_fingerprint.json: HU intensity statistics from the training data
splits_final.json: train/val case ID splits for fold 0 (reproducibility)
progress.png: training curves (loss, Pseudo Dice, epoch duration, learning rate)
validation_summary.json: per-case validation Dice/IoU/TP/FP/FN for all 134 fold-0 validation cases
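The headline per-case numbers can be recomputed from validation_summary.json. This sketch assumes the file follows nnU-Net v2's summary.json layout (a metric_per_case list whose entries carry metrics keyed by label, with the lesion as label "1"); check the keys against the actual file before relying on it. per_case_summary is a hypothetical helper, and NaN entries (Dice undefined because reference and prediction are both empty) are skipped.

```python
import json
import math
from pathlib import Path
from statistics import mean


def per_case_summary(summary: dict, label: str = "1") -> dict:
    """Mean per-case Dice/IoU over metric_per_case, skipping undefined (NaN) scores."""
    cases = summary["metric_per_case"]
    dices = [c["metrics"][label]["Dice"] for c in cases]
    ious = [c["metrics"][label]["IoU"] for c in cases]
    return {
        "n_cases": len(cases),
        "mean_dice": mean(d for d in dices if not math.isnan(d)),
        "mean_iou": mean(i for i in ious if not math.isnan(i)),
    }


# Assumes the working directory is the snapshot_download directory.
path = Path("validation_summary.json")
if path.exists():
    print(per_case_summary(json.loads(path.read_text())))
```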

Training data and provenance

This model was trained only on publicly available datasets:

  • MSD Task06 Lung (Antonelli et al. 2022, Nature Communications, CC-BY-SA 4.0): 63 expert tumor masks on diagnostic CT
  • NLSTseg (Chen et al. 2025, Scientific Data, CC-BY 4.0): 604 expert pixel-level masks on low-dose screening CT (1 patient excluded, nlst_0393 / patient 205714, due to a CT/mask shape mismatch in the source files)

The two source datasets were unified via build_unified_dataset.py: images copied verbatim, NLSTseg multi-label masks binarized via (mask > 0).astype(uint8), sequential renumbering as case_0001 … case_0667 (MSD first, then NLSTseg). Full mapping in the dataset repo's id_mapping.csv.
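The binarization and renumbering steps can be sketched as follows. This is not build_unified_dataset.py itself (which is not reproduced here); binarize and build_id_mapping are hypothetical helpers, the source IDs are made up, and the output file names assume nnU-Net v2's convention of <case>_0000.nii.gz for images and <case>.nii.gz for labels.

```python
from typing import Dict, List

import numpy as np


def binarize(mask: np.ndarray) -> np.ndarray:
    """Collapse any multi-label lesion mask to a single foreground class (1)."""
    return (mask > 0).astype(np.uint8)


def build_id_mapping(msd_ids: List[str], nlst_ids: List[str]) -> List[Dict[str, str]]:
    """Sequential renumbering: MSD cases first, then NLSTseg."""
    ordered = [("MSD", s) for s in msd_ids] + [("NLSTseg", s) for s in nlst_ids]
    rows = []
    for i, (source, source_id) in enumerate(ordered, start=1):
        case_id = f"case_{i:04d}"
        rows.append({
            "case_id": case_id,
            "source": source,
            "source_id": source_id,
            "image": f"{case_id}_0000.nii.gz",
            "label": f"{case_id}.nii.gz",
        })
    return rows


print(binarize(np.array([0, 1, 2, 3])))  # multi-label -> binary
for row in build_id_mapping(["lung_001", "lung_003"], ["nlst_0001"]):
    print(row)
```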

LUNA16 was intentionally excluded. Converting its (centroid, diameter) annotations into sphere masks produced semantically incoherent foreground (HU values spanning lung air, soft tissue, and bone), and the standalone Dataset501_LUNA16 run stayed at Pseudo Dice 0 for all 1000 epochs. Re-evaluating with LIDC-IDRI consensus masks is a candidate for v2.

No patient-identifiable or institutional data was used. This checkpoint contains no information derived from any non-public source.

Foreground intensity profile (training-data fingerprint)

The unified dataset's CT HU statistics inside foreground (lesion) voxels:

mean: -197 HU
median: -134 HU
std: 259 HU
0.5th percentile: -926 HU
99.5th percentile: 252 HU

The distribution is dominated by NLSTseg (91% of cases), with a slight pull from MSD's heavier tails. Mean and median sit in soft-tissue-adjacent territory, and the 99.5th percentile stays below bone/implant ranges. This makes the lesion label a coherent foreground class for the default Dice + CE loss, and the training curves bear this out.
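These statistics feed directly into preprocessing: nnU-Net's CT normalization clips each image to the dataset-wide foreground 0.5/99.5 percentiles and then standardizes with the dataset-wide foreground mean and standard deviation. A minimal NumPy sketch using the rounded values above; the exact values live in dataset_fingerprint.json, assumed here to use nnU-Net v2's foreground_intensity_properties_per_channel keys. ct_normalize and fg_stats_from_fingerprint are hypothetical helpers.

```python
import numpy as np

# Rounded foreground statistics from the table above, in HU.
FG_MEAN, FG_STD = -197.0, 259.0
FG_P00_5, FG_P99_5 = -926.0, 252.0


def fg_stats_from_fingerprint(fingerprint: dict, channel: str = "0") -> tuple:
    """Exact (mean, std, p00.5, p99.5) from a parsed dataset_fingerprint.json."""
    p = fingerprint["foreground_intensity_properties_per_channel"][channel]
    return p["mean"], p["std"], p["percentile_00_5"], p["percentile_99_5"]


def ct_normalize(image_hu: np.ndarray, mean: float = FG_MEAN, std: float = FG_STD,
                 lo: float = FG_P00_5, hi: float = FG_P99_5) -> np.ndarray:
    """Clip to the foreground percentiles, then z-score with dataset-wide stats."""
    clipped = np.clip(image_hu.astype(np.float32), lo, hi)
    return (clipped - mean) / max(std, 1e-8)


# Air (-1000 HU), the foreground mean, and dense bone (~1000 HU).
print(ct_normalize(np.array([-1000.0, -197.0, 1000.0])))
```

Air at -1000 HU is first clipped to -926 HU, and everything above 252 HU, including bone, maps to the same normalized value (about 1.73).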

Intended use

  • Pretrained starting point for finetuning on related lung-lesion segmentation tasks (especially mixed-modality or institutional-shift settings)
  • Reference for unified multi-source pretraining with default nnU-Net v2 settings
  • Input to ensembling with other folds (when 5-fold runs are available)

How NOT to use it

  • ❌ Not validated for clinical diagnosis or treatment decisions
  • ❌ Single fold, not an ensemble: paper-grade results require all 5 folds
  • ❌ Not tuned for pure diagnostic CT: the training data is predominantly LDCT (91%), so a pure diagnostic-CT target may benefit from further finetuning, or from starting with Lab-Rasool/CLN-Segmenter-MSD-fold0 instead

How to use

1. Download the checkpoint and metadata

from huggingface_hub import snapshot_download
local_dir = snapshot_download(repo_id="Lab-Rasool/CLN-Segmenter-Dataset500-fold0")
print("Files at:", local_dir)

2. Set up an nnU-Net inference directory

nnUNet_results/
└── Dataset500_LungLesions/
    └── nnUNetTrainer__nnUNetPlans__3d_fullres/
        ├── dataset.json
        ├── plans.json                    (rename from nnUNetPlans.json)
        ├── dataset_fingerprint.json
        └── fold_0/
            ├── checkpoint_best.pth
            └── splits_final.json

Then copy the downloaded files into that layout, setting local_dir to the path printed in step 1:

local_dir=/path/printed/in/step/1
DST=/path/to/nnUNet_results/Dataset500_LungLesions/nnUNetTrainer__nnUNetPlans__3d_fullres
mkdir -p "$DST/fold_0"
cp "$local_dir/dataset.json"              "$DST/dataset.json"
cp "$local_dir/nnUNetPlans.json"          "$DST/plans.json"
cp "$local_dir/dataset_fingerprint.json"  "$DST/dataset_fingerprint.json"
cp "$local_dir/checkpoint_best.pth"       "$DST/fold_0/checkpoint_best.pth"
cp "$local_dir/splits_final.json"         "$DST/fold_0/splits_final.json"

3. Run inference with nnU-Net

export nnUNet_results=/path/to/nnUNet_results
nnUNetv2_predict \
    -i /path/to/your/input_images \
    -o /path/to/output_predictions \
    -d 500 \
    -c 3d_fullres \
    -tr nnUNetTrainer \
    -p nnUNetPlans \
    -f 0 \
    -chk checkpoint_best.pth

Input images should be CT volumes named with the nnU-Net channel suffix: <case_id>_0000.nii.gz.

Training procedure

  • Framework: nnU-Net v2.7.0 (default trainer)
  • Preprocessing: nnU-Net CT normalization (HU clipping at the dataset-wide 0.5/99.5 percentiles of foreground voxels, then z-score normalization with the dataset-wide foreground mean and standard deviation rather than per-case statistics), resampling to target spacing [1.245, 0.664, 0.664] mm
  • Augmentation: nnU-Net's default 3D augmentation pipeline (rotation, scaling, gamma, mirroring, Gaussian noise/blur, low-resolution simulation)
  • Optimization: SGD with Nesterov momentum (β=0.99), polynomial LR decay (initial LR 0.01 → 0)
  • Iterations: fixed 250 per epoch (nnU-Net default; independent of dataset size)
  • Best-checkpoint mechanism: nnU-Net automatically tracks EMA of validation Pseudo Dice and saves checkpoint_best.pth at the peak
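The polynomial decay can be written out explicitly. This sketch assumes the form used by nnU-Net v2's PolyLRScheduler as we understand it, lr = initial_lr × (1 − epoch / max_epochs)^0.9, stepped once per epoch; poly_lr is a hypothetical helper. With this form the last epoch's LR is small but not exactly 0.

```python
def poly_lr(epoch: int, initial_lr: float = 0.01, max_epochs: int = 1000,
            exponent: float = 0.9) -> float:
    """Polynomial learning-rate decay evaluated at the start of `epoch` (0-indexed)."""
    return initial_lr * (1.0 - epoch / max_epochs) ** exponent


ITERATIONS_PER_EPOCH = 250  # fixed per epoch, independent of dataset size
for epoch in (0, 250, 500, 750, 960, 999):
    print(f"epoch {epoch:4d}: lr = {poly_lr(epoch):.2e}")
print("total optimizer steps:", 1000 * ITERATIONS_PER_EPOCH)
```

By epoch ~960, where the best checkpoint was saved, the LR under this form has already fallen to roughly 5.5e-4.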

Domain composition note

The training corpus is 9% diagnostic CT (MSD) and 91% LDCT (NLSTseg). nnU-Net does not explicitly rebalance per-source sampling; the model sees patches in proportion to case count. With 1000 epochs × 250 iterations × batch 2 = 500,000 patches in total, that translates to roughly 47,000 MSD patches and 453,000 NLSTseg patches (assuming the fold-0 training split keeps the overall 63:604 ratio).
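The patch arithmetic can be checked in a few lines. This sketch assumes uniform random case sampling and uses the exact 63:604 case ratio (about 9.4% MSD) rather than the rounded 9%; the fold-0 training split of 533 cases may differ slightly. expected_patches is a hypothetical helper.

```python
EPOCHS, ITERATIONS_PER_EPOCH, BATCH_SIZE = 1000, 250, 2
TOTAL_PATCHES = EPOCHS * ITERATIONS_PER_EPOCH * BATCH_SIZE  # 500,000


def expected_patches(case_counts: dict, total: int = TOTAL_PATCHES) -> dict:
    """Expected patches per source if every case is equally likely to be sampled."""
    n_cases = sum(case_counts.values())
    return {source: total * n / n_cases for source, n in case_counts.items()}


for source, n in expected_patches({"MSD": 63, "NLSTseg": 604}).items():
    print(f"{source:8s} ~{n:,.0f} patches")
```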

Empirically the model handles both modalities (case_0001 MSD scores Dice 0.89 in fold-0 validation), but the underlying representation skews LDCT. Stage 1 v2 will rebalance by adding more diagnostic-CT data (LIDC-IDRI consensus, NSCLC-Radiomics) rather than re-weighting existing samples.

Limitations

  • Single fold of 5-fold CV, not an ensemble. Paper-grade results require all 5 folds, with metrics averaged across folds or predictions ensembled at inference.
  • Domain imbalance: with 91% LDCT, the model may underperform on a pure diagnostic-CT target without finetuning (consider Lab-Rasool/CLN-Segmenter-MSD-fold0 for that case).
  • Small-lesion performance: per-case Dice for tiny nodules (<5 mm) is noticeably worse than for larger tumors; the 0.6172 mean reflects the full distribution, including these hard cases.
  • One source case excluded (nlst_0393 / patient 205714) due to a source-data shape mismatch.
  • No clinical validation: this is a research artifact, not a medical device.

License

CC-BY-SA 4.0, inherited from the share-alike clause of the MSD Task06 source dataset license.

Citation

If you use this model, please cite all three works:

@article{isensee2021nnunet,
  title   = {nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation},
  author  = {Isensee, Fabian and Jaeger, Paul F and Kohl, Simon A A and Petersen, Jens and Maier-Hein, Klaus H},
  journal = {Nature Methods},
  volume  = {18},
  number  = {2},
  pages   = {203--211},
  year    = {2021}
}

@article{antonelli2022medical,
  title   = {The Medical Segmentation Decathlon},
  author  = {Antonelli, Michela and Reinke, Annika and Bakas, Spyridon and others},
  journal = {Nature Communications},
  volume  = {13},
  number  = {1},
  pages   = {4128},
  year    = {2022}
}

@article{chen2025nlstseg,
  title   = {NLSTseg: A Pixel-level Lung Cancer Dataset Based on NLST LDCT Images},
  author  = {Chen and others},
  journal = {Scientific Data},
  year    = {2025},
  doi     = {10.1038/s41597-025-05742-x}
}

Project context

Part of CLN-Segmenter at the Rasool Lab, Moffitt Cancer Center: a two-stage approach for lung lesion segmentation that pretrains on public datasets (this is the v1 unified pretrain) and finetunes on internal data with domain-specific loss formulations.

Other models in this series:

  • Lab-Rasool/CLN-Segmenter-MSD-fold0: MSD-only POC (diagnostic CT, 63 cases, Pseudo Dice 0.82)
  • Lab-Rasool/CLN-Segmenter-NLSTseg-fold0: NLSTseg-only POC (LDCT, 604 cases, Pseudo Dice 0.77)