Breast Cancer Histopathology Segmentation (BEETLE)
Introduction
Accurate prognostic and predictive biomarkers are essential for guiding treatment planning in breast cancer. Pathologists typically evaluate histological features such as cancer subtype and tumor grade on hematoxylin and eosin (H&E)-stained histopathology slides. Ongoing research aims to identify new biomarkers and validate existing ones, such as tumor-infiltrating lymphocytes (TILs). However, validating these biomarkers in large patient cohorts remains limited due to the time-consuming and poorly reproducible nature of biomarker quantification. Advances in deep learning provide a promising opportunity for automating this process. Since biomarkers are typically assessed within specific tissue regions, such as the tumor area, an important first step in any automated pipeline is the semantic segmentation of H&E-stained whole-slide images (WSIs).
Development set
Developing robust breast cancer segmentation models that generalize across heterogeneous patient cohorts requires access to extensive and diverse annotated training data. In Lems et al. [1], we introduce BrEast cancEr hisTopathoLogy sEgmentation (BEETLE), a dataset for multiclass semantic segmentation of H&E-stained breast cancer WSIs. It consists of 587 biopsies and resections from three collaborating clinical centers and two public datasets, digitized using seven scanners, and covers all molecular subtypes and histological grades. Using diverse annotation strategies, we collected annotations across four classes (invasive epithelium, non-invasive epithelium, necrosis, and other), with particular focus on morphologies underrepresented in existing datasets, such as ductal carcinoma in situ and dispersed lobular tumor cells. The BEETLE development set is publicly available on Zenodo.
Benchmark
To enable fair evaluation and comparison of breast cancer segmentation algorithms, we provide the BEETLE benchmark, which includes an independent evaluation set and a public leaderboard. The evaluation set consists of 170 densely annotated regions of interest (ROIs) from 54 WSIs, collected from three clinical centers and digitized using three different scanners, capturing much of the morphological heterogeneity seen in clinical practice. The ROI images and WSIs of the evaluation set are publicly available on Zenodo, while the ground truth annotations remain sequestered on this platform to maintain the benchmark's integrity. Our baseline algorithm trained on the BEETLE development set achieved an overall Dice coefficient of 0.87, with class-wise scores of 0.94 for “other”, 0.78 for invasive epithelium, 0.65 for non-invasive epithelium, and 0.51 for necrosis [1]. This benchmark aims to drive progress toward robust, generalizable segmentation models that can support automated biomarker quantification in breast cancer.
How to use this platform
To assess the performance of your algorithm on the BEETLE benchmark, follow these steps:
- Download the 170 ROI images of the evaluation set from Zenodo.
- Run your algorithm on the ROI images. See our GitHub repository for example code for running inference on this set of images.
- Submit your predictions following the detailed instructions on the Submission page. If your submission is successfully evaluated, it will appear on the Leaderboard.
Submissions are ranked based on the overall Dice coefficient. In addition, detailed metrics are provided for each submission, including per-class Dice scores across all cases, per-class Dice scores stratified by clinical center, and Dice scores for the invasive epithelium class stratified by histological subtype (either no special type [NST] or invasive lobular carcinoma [ILC]).
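For readers who want to sanity-check predictions locally before submitting, the Dice coefficient for a class is 2|P ∩ T| / (|P| + |T|), where P and T are the predicted and ground-truth pixel sets for that class. The sketch below is a minimal illustration, not the platform's evaluation code. The integer label encoding in `CLASSES` is hypothetical, and the pooled ("micro-averaged") definition of the overall score is an assumption; the exact aggregation used for the leaderboard is described in [1] and on the Submission page.

```python
import numpy as np

# Hypothetical label encoding, for illustration only; check the Submission
# page for the class indices the benchmark actually expects.
CLASSES = {
    0: "other",
    1: "invasive epithelium",
    2: "non-invasive epithelium",
    3: "necrosis",
}


def dice_per_class(pred, truth, class_ids=tuple(CLASSES)):
    """Return {class_id: Dice} for two integer label masks of equal shape."""
    pred = np.asarray(pred)
    truth = np.asarray(truth)
    if pred.shape != truth.shape:
        raise ValueError("prediction and ground truth must have the same shape")
    scores = {}
    for c in class_ids:
        p = pred == c
        t = truth == c
        denom = p.sum() + t.sum()
        # Dice is undefined when a class is absent from both masks; report NaN.
        scores[c] = 2.0 * np.logical_and(p, t).sum() / denom if denom > 0 else float("nan")
    return scores


def overall_dice(pred, truth, class_ids=tuple(CLASSES)):
    """Pooled Dice over all classes (an assumed definition of 'overall')."""
    pred = np.asarray(pred)
    truth = np.asarray(truth)
    inter = sum(np.logical_and(pred == c, truth == c).sum() for c in class_ids)
    total = sum((pred == c).sum() + (truth == c).sum() for c in class_ids)
    return 2.0 * inter / total if total > 0 else float("nan")


if __name__ == "__main__":
    # Toy 4x4 masks; real ROIs are far larger.
    truth = np.array([[0, 0, 1, 1],
                      [0, 0, 1, 1],
                      [2, 2, 3, 3],
                      [2, 2, 3, 3]])
    pred = np.array([[0, 0, 1, 1],
                     [0, 1, 1, 1],
                     [2, 2, 3, 0],
                     [2, 2, 3, 3]])
    for class_id, score in dice_per_class(pred, truth).items():
        print(f"{CLASSES[class_id]}: {score:.3f}")
    print(f"overall: {overall_dice(pred, truth):.3f}")
```

Stratified scores, such as per clinical center, can be approximated the same way by applying these functions to the subset of ROIs belonging to each group.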
References
[1] C. Lems, L. Tessier, J.-M. Bokhorst, M. van Rijthoven, W. Aswolinskiy, M. Pozzi, N. Klubickova, S. Dintzis, M. Campora, M. Balkenhol, P. Bult, J. Spronck, T. Detone, M. Barbareschi, E. Munari, G. Bogina, J. Wesseling, E.H. Lips, F. Ciompi, F. Meeuwsen, J. van der Laak, A Multicentric Dataset for Training and Benchmarking Breast Cancer Segmentation in H&E Slides. arXiv preprint arXiv:2510.02037, 2025. Available at: https://arxiv.org/abs/2510.02037