Dataset Introduction

The training set images are sourced from our previous publications [1-4] and capture the variability of biological structures in brain light-sheet microscopy (LSM) images. The set includes both unannotated and annotated data of tree-like structures (blood vessels) and dot-like structures (cells involved in neural activity, cell nuclei, and Alzheimer's disease plaques). This diverse training set supports the development of self-supervised learning methods aimed at a generalized segmentation model for 3D brain LSM images.

The validation and testing sets contain data of a biological structure already present in the training set (cells involved in neural activity) as well as new data of microglia, which have vessel-like branches and a small cell body. Performance on these sets therefore indicates how well the model generalizes, including to a structure it has not seen during training.

All image data were acquired following the same routine: structure staining, tissue clearing, and LSM imaging. Depending on the target structure, different dyes or stains were used to bind selectively to specific structures in the sample, enhancing their visibility against the rest of the tissue.

The data are released under a CC-BY-NC license. Any publication using our dataset must cite this challenge and the following references [1, 2, 3, 4].

[1] D. Kaltenecker et al. Virtual reality empowered deep learning analysis of brain activity. Nature Methods 21: 1306-1315, 2024 Apr.
[2] M.I. Todorov et al. Machine learning analysis of whole mouse brain vasculature. Nature Methods 17: 442-449, 2020 Mar.
[3] S. Zhao et al. Cellular and molecular probing of intact human organs. Cell 180(4): 796-812, 2020 Feb.
[4] H.S. Bhatia et al. Spatial proteomics in three-dimensional intact specimens. Cell 185(26): 5040-5058, 2022 Dec.

Training Set

The unannotated training subset contains 3D LSM images of various biological structures from mouse and human brain samples and is intended for self-supervised learning. Each 3D image is stored plane by plane: every 2D plane is saved as a separate 16-bit signed TIFF file.
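Because each sample is distributed as one TIFF file per 2D plane, the planes have to be read in the right order and stacked to recover the 3D volume. The following is a minimal Python sketch, assuming that the plane files of one sample sit in a single folder and that their names carry a plane index (for example, hypothetical names such as `z2.tif` and `z10.tif`), so a natural sort restores the z order. `natural_key`, `stack_planes` and `load_volume` are hypothetical helpers, and the third-party `tifffile` package is assumed to be installed.

```python
import re
from pathlib import Path
from typing import Callable, Sequence

import numpy as np


def natural_key(name: str) -> list:
    """Split a file name into text and integer chunks so 'z10' sorts after 'z2'."""
    return [int(tok) if tok.isdigit() else tok.lower() for tok in re.split(r"(\d+)", name)]


def stack_planes(paths: Sequence[Path], read_plane: Callable[[Path], np.ndarray]) -> np.ndarray:
    """Read 2D planes in natural file-name order and stack them along a new first (z) axis."""
    ordered = sorted(paths, key=lambda p: natural_key(Path(p).name))
    planes = [np.asarray(read_plane(p)) for p in ordered]
    if not planes:
        raise ValueError("no TIFF planes found")
    return np.stack(planes, axis=0)  # shape: (z, y, x); dtype of the planes is kept (e.g. int16)


def load_volume(folder: str, pattern: str = "*.tif*") -> np.ndarray:
    """Load one sample stored as one TIFF file per plane (assumed folder layout)."""
    import tifffile  # third-party reader, assumed installed (pip install tifffile)

    return stack_planes(list(Path(folder).glob(pattern)), tifffile.imread)
```

The reader is passed in as a function so that the ordering logic can be checked without real files. Note that `np.stack` holds the whole volume in memory; for whole-brain scans it may be more practical to load a subset of planes at a time.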

A smaller annotated subset of image patches is provided for participants to fine-tune their models. Patches are stored in NIfTI format with 16-bit signed integer precision and in LPS+ orientation.
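When checking a downloaded patch, it helps to know which direction each voxel axis runs. The sketch below assumes that "LPS+" means the voxel axes increase toward Left, Posterior and Superior when read in NIfTI's RAS+ world frame, which is the convention `nibabel.aff2axcodes` reports. The text above does not spell this out, so treat it as an assumption. `axis_codes` is a hypothetical, simplified helper that only handles axis-aligned affines.

```python
import numpy as np

# NIfTI affines map voxel indices to RAS+ world coordinates (x: Right, y: Anterior, z: Superior).
_POS = ("R", "A", "S")
_NEG = ("L", "P", "I")


def axis_codes(affine: np.ndarray) -> tuple:
    """Return, for each voxel axis, the world direction toward which it increases.

    Simplified: each voxel axis is assigned its dominant world axis, which is
    adequate for axis-aligned affines but not for strongly oblique ones.
    """
    rzs = np.asarray(affine, dtype=float)[:3, :3]  # rotation/zoom/shear part of the affine
    codes = []
    for col in range(3):
        world = int(np.argmax(np.abs(rzs[:, col])))  # world axis this voxel axis mostly follows
        codes.append(_POS[world] if rzs[world, col] > 0 else _NEG[world])
    return tuple(codes)
```

With nibabel installed, `axis_codes(nibabel.load(path).affine)` would be expected to return `('L', 'P', 'S')` for a patch in LPS+ orientation under the assumption above. For oblique affines, `nibabel.aff2axcodes` is the more robust choice.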

1) Training subset with no annotations:
  • 18 brains with cells labeled by a neural activity marker,
  • 9 brains with a blood vessel marker,
  • 4 brain subregions with a cell nucleus marker,
  • 4 brains with an Alzheimer's disease plaque marker.
2) Training subset with annotations:
  • 19 annotated brain image patches of cells labeled by a neural activity marker,
  • 24 annotated brain image patches with a blood vessel marker,
  • 12 annotated brain image patches with a cell nucleus marker,
  • 34 annotated brain image patches with an Alzheimer's disease plaque marker.
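The counts in the two lists can serve as a checklist after downloading: 18 + 9 + 4 + 4 = 35 unannotated samples and 19 + 24 + 12 + 34 = 89 annotated patches. The short Python sketch below encodes them. The category keys and the `missing_items` helper are hypothetical, and how to count files per category depends on the archive layout, which is not described here.

```python
# Expected counts copied from the lists above; the category keys are hypothetical labels.
UNANNOTATED_SAMPLES = {
    "neural_activity_cells": 18,
    "blood_vessels": 9,
    "cell_nuclei": 4,  # brain subregions rather than whole brains
    "ad_plaques": 4,
}
ANNOTATED_PATCHES = {
    "neural_activity_cells": 19,
    "blood_vessels": 24,
    "cell_nuclei": 12,
    "ad_plaques": 34,
}


def missing_items(expected: dict, found: dict) -> dict:
    """Return, per category, how many samples are still missing compared with the expected counts."""
    return {
        key: count - found.get(key, 0)
        for key, count in expected.items()
        if found.get(key, 0) < count
    }
```

An empty result from `missing_items` means every category reached its expected count.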
💡How to download the training set:
  • The unannotated subset for self-supervised learning can be downloaded from here. If you run into issues when downloading, please try wget or ncftpget as follows:
  • The annotated subset for fine-tuning can be downloaded from here. If you run into issues when downloading, please try wget or ncftpget as follows:

Validation Set

In the validation phase (the preliminary test phase), part of the validation set consists of patches of a biological structure also present in the training set, while the rest are patches of a new biological structure. Here are the details:

  • brain image patches of cells labeled by a neural activity marker,
  • brain image patches with a microglia marker.
✅Have a look at examples from the validation set here.

Testing Set

In the testing phase (the final test phase), as in the validation phase, part of the testing set consists of patches of a biological structure also present in the training set, while the rest are patches of a new biological structure. Here are the details:

  • brain image patches of cells labeled by a neural activity marker,
  • brain image patches with a microglia marker.