Evaluation

The biological structures to be segmented in this challenge can be categorized into two kinds. The first kind comprises dot-like structures, such as different types of cells, while the second kind includes tree-like structures, like vessels.

For dot-like structures, we will evaluate the segmentation results using the following 2 metrics:

  • volumetric Dice similarity coefficient;
  • Betti error in dimension 0.

For tree-like structures, we will evaluate the segmentation results using the following 4 metrics:

  • volumetric Dice similarity coefficient;
  • Betti error in dimension 0;
  • Betti error in dimension 1;
  • centerline Dice similarity coefficient.

In dot-like structure segmentation, the goal is to detect each individual dot. For tree-like structures, the focus is on preserving anatomical topology.

The Dice similarity coefficient measures the voxel overlap between the ground truth and the predicted segmentation. The Betti matching error assesses the spatial alignment of topological features, considering both the number and the location of components [1]. The centerline Dice (clDice) metric evaluates voxel-wise overlap in tubular and curvilinear structures, measuring how well the predicted segmentation covers tree-like structures [2].
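As a rough illustration of these metrics (not the official evaluation code), the sketch below computes a volumetric Dice score, a simplified dimension-0 Betti number error, and clDice on binary NumPy masks. The function names, the handling of empty masks, and the face-connectivity default of `scipy.ndimage.label` are assumptions made for this example. The dimension-0 error here only compares component counts; the Betti matching error of [1] additionally matches features by location, which this sketch does not do. The clDice function assumes the skeletons (centerlines) have already been extracted by some other tool.

```python
import numpy as np
from scipy import ndimage


def dice(pred, gt):
    """Volumetric Dice: 2|P ∩ G| / (|P| + |G|) on binary masks."""
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    denom = pred.sum() + gt.sum()
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, gt).sum() / denom


def betti0_number_error(pred, gt):
    """Absolute difference in the number of connected components.

    Simplification: counts only, no spatial matching as in [1].
    """
    # ndimage.label uses face connectivity by default (an assumption here).
    _, n_pred = ndimage.label(np.asarray(pred, dtype=bool))
    _, n_gt = ndimage.label(np.asarray(gt, dtype=bool))
    return abs(n_pred - n_gt)


def cl_dice(pred, gt, pred_skel, gt_skel):
    """clDice from binary masks and their precomputed skeletons."""
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    ps, gs = np.asarray(pred_skel, dtype=bool), np.asarray(gt_skel, dtype=bool)
    # Topology precision: share of the predicted skeleton inside the ground truth.
    tprec = np.logical_and(ps, gt).sum() / max(ps.sum(), 1)
    # Topology sensitivity: share of the ground-truth skeleton inside the prediction.
    tsens = np.logical_and(gs, pred).sum() / max(gs.sum(), 1)
    total = tprec + tsens
    return 0.0 if total == 0 else 2.0 * tprec * tsens / total


# Toy volume: the ground truth has two components, the prediction finds one.
gt = np.zeros((4, 4, 4), dtype=bool)
gt[0, 0, 0:2] = True
gt[3, 3, 3] = True
pred = np.zeros_like(gt)
pred[0, 0, 0:2] = True

print(dice(pred, gt))                 # 0.8
print(betti0_number_error(pred, gt))  # 1
```

In the toy volume the prediction overlaps well at the voxel level but misses one of the two components, which is exactly the kind of error the topological metrics are meant to expose.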

👉 The evaluation code is available on GitHub.

[1] N. Stucki et al. Topologically faithful image segmentation via induced matching of persistence barcodes. In International Conference on Machine Learning, 2023, pp. 32698-32727.
[2] S. Shit et al. clDice - a novel topology-preserving loss function for tubular structure segmentation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 16555-16564.


Ranking

The final ranking will be determined based solely on the performance in the final test phase.

For each metric and each biological structure, a relative ranking of the participants will be determined using the appropriate hypothesis ('lesser' for the Betti matching error, where lower values are better, and 'greater' for the other metrics, where higher values are better). The final leaderboard will be based on the average of these relative ranks.
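The rank-averaging step can be sketched as follows. This is a minimal illustration under strong assumptions, not the organizers' procedure: it ranks teams on a single aggregate score per metric, whereas the actual ranking may rely on hypothesis tests across test cases. Team names, metric keys, and score values are hypothetical, and tied teams are assumed to share the average of their positions.

```python
from statistics import mean

# Hypothetical aggregate scores per team and metric (illustrative values only).
scores = {
    "team_a": {"dice": 0.90, "cldice": 0.85, "betti0": 3.0},
    "team_b": {"dice": 0.88, "cldice": 0.87, "betti0": 1.0},
    "team_c": {"dice": 0.80, "cldice": 0.80, "betti0": 2.0},
}

# Direction of each metric: 'greater' is better for Dice-type metrics,
# 'lesser' is better for the Betti error.
HIGHER_IS_BETTER = {"dice": True, "cldice": True, "betti0": False}


def rank_teams(values, higher_is_better):
    """Return {team: rank} with rank 1 as best; ties share the average rank."""
    ordered = sorted(values, key=values.get, reverse=higher_is_better)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over all teams tied with the team at position i.
        while j + 1 < len(ordered) and values[ordered[j + 1]] == values[ordered[i]]:
            j += 1
        shared = (i + 1 + j + 1) / 2
        for team in ordered[i:j + 1]:
            ranks[team] = shared
        i = j + 1
    return ranks


def average_ranks(scores, higher_is_better):
    """Average each team's per-metric ranks into one leaderboard value."""
    teams = list(scores)
    per_metric = {
        metric: rank_teams({t: scores[t][metric] for t in teams}, direction)
        for metric, direction in higher_is_better.items()
    }
    return {t: mean(per_metric[m][t] for m in per_metric) for t in teams}


final = average_ranks(scores, HIGHER_IS_BETTER)
for team in sorted(final, key=final.get):
    print(team, round(final[team], 3))
```

With these made-up scores, team_b obtains the lowest average rank even though team_a has the best Dice, showing how the topological metrics influence the final order.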