The Dataset

Training dataset

The training dataset consists of regions of interest (ROIs) selected by an experienced pathologist from several tumor types. What all of these tumor types have in common is that mitotic figures are relevant for the diagnosis.

Canine Lung Cancer (44 cases, scanned with 3DHistech Pannoramic Scan II)
Human Breast Cancer (150 cases, scanned using three scanners, part of MIDOG2021 dataset)
Canine Lymphoma (55 cases, scanned with 3DHistech Pannoramic Scan II)
Human Neuroendocrine Tumor (55 cases, scanned with Hamamatsu NanoZoomer XR)
Canine Cutaneous Mast Cell Tumor (50 cases, scanned with Aperio ScanScope CS2)
Human Melanoma (49 cases, scanned with Hamamatsu NanoZoomer XR; no labels provided). Please note that we removed two duplicate entries (357 and 362) from these cases.

From each whole slide image (WSI), a trained pathologist selected an area of 2 mm², corresponding to approximately 10 high-power fields, in line with the grading scheme of Elston and Ellis. We cropped this area and provide it as a TIFF file to ease processing. The TIFF files are stored in a pyramidal, tiled format and include the correct resolution (DPI attribute) for each case.
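
The DPI attribute can be turned into a physical pixel size with a simple unit conversion. The following is a minimal sketch, not part of the challenge tooling: the function names and the example values (a DPI of 101600 and an 8000 × 4000 pixel crop) are made up for illustration, and reading the DPI value from the TIFF header is left to whichever TIFF library you use.

```python
MICRONS_PER_INCH = 25400.0  # 1 inch = 25.4 mm = 25400 µm


def dpi_to_microns_per_pixel(dpi: float) -> float:
    """Convert a resolution in dots (pixels) per inch to µm per pixel."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return MICRONS_PER_INCH / dpi


def roi_area_mm2(width_px: int, height_px: int, microns_per_pixel: float) -> float:
    """Physical area of a rectangular crop in mm², assuming square pixels."""
    width_mm = width_px * microns_per_pixel / 1000.0
    height_mm = height_px * microns_per_pixel / 1000.0
    return width_mm * height_mm


# Hypothetical values, not taken from any file in the dataset.
mpp = dpi_to_microns_per_pixel(101600)   # 0.25 µm per pixel
area = roi_area_mm2(8000, 4000, mpp)     # 2.0 mm²
print(f"{mpp:.3f} µm/px, {area:.2f} mm²")
```

Comparing the computed area with the stated 2 mm² is a quick way to confirm that the resolution of a file was read correctly.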

The mitotic figure annotations were produced with a well-established multi-expert, blinded annotation pipeline that aims to find all mitotic figures (details here). We additionally provide annotations for hard examples / imposters.

Distribution of mitotic count

The training set contains 9501 mitotic figures (MF) and 11051 hard examples (non-mitotic figures). The distribution of MF across tumor types varies significantly:

(Figures: distribution of mitotic count for Breast Cancer, Lung Cancer, Lymphoma, Cutaneous Mast Cell Tumors, and Neuroendocrine Tumors.)
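
The totals given above allow a quick check of the class balance in the training annotations. The sketch below uses only the counts stated in this section. The variable names are illustrative, and the per-case average assumes that all annotations come from the five labeled tumor types (the melanoma cases have no labels).

```python
# Labeled training cases per tumor type, as listed in the training dataset section.
labeled_cases = {
    "Canine Lung Cancer": 44,
    "Human Breast Cancer": 150,
    "Canine Lymphoma": 55,
    "Human Neuroendocrine Tumor": 55,
    "Canine Cutaneous Mast Cell Tumor": 50,
}
n_mitotic = 9501          # mitotic figures (MF)
n_hard_negative = 11051   # hard examples (non-mitotic figures)

n_cases = sum(labeled_cases.values())
mf_fraction = n_mitotic / (n_mitotic + n_hard_negative)
mean_mf_per_case = n_mitotic / n_cases

print(f"labeled cases: {n_cases}")
print(f"MF share of all annotations: {mf_fraction:.1%}")
print(f"mean MF per labeled case: {mean_mf_per_case:.1f}")
```

The overall mean hides the large variation between tumor types, so per-type statistics are more informative wherever they can be computed from the annotations.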

Test set

The test set contains 100 independent tumor cases, split across 10 tumor types. Because it includes all major tumor types, we expect the evaluation to reveal how well participating algorithms generalize across tumors.

Preliminary test set

So that participants can evaluate their own pipelines, we provide access to a preliminary test set on grand-challenge.org. The preliminary test set covers four tumor types with 5 cases each (20 cases in total). Do not expect to be able to fine-tune your approach on the preliminary test set. It uses different tumor cases than the final test set.

Access to this preliminary set is available only to Docker containers submitted to the challenge, and only for a limited time during the competition. Its purpose is to let participants sanity-check their approaches on unseen data.