
neuromast3d

Code for the neuromast cell/nuclear shape analysis project

Overview

This code is built around the cvapipe_analysis package made by the Allen Institute for Cell Science. It contains code for segmentation, alignment, and preparation of single-cell datasets that can be used as inputs to cvapipe_analysis (which is primarily used to generate the table of spherical harmonics expansion (SHE) coefficients per cell, along with basic shape features such as volume). The neuromast3d package also contains code for analysis and visualization after the spherical harmonics coefficients have been calculated with cvapipe_analysis.

The package contains a number of independent modules organized into several directories, most of which correspond to a "step" in the overall processing and analysis pipeline. The first five steps (used for segmentation and dataset preparation) can be run together or separately by setting values in a YAML configuration file. The configuration file is organized into sections corresponding to each step; a step is enabled by setting its "state" parameter to True and providing the other required parameters, and the pipeline is launched by invoking the run_neuromast3d runner script with the path to the configuration file. Each of the five steps requires that the previous step has been run (although the alignment step may be omitted if submitting unaligned cells to cvapipe_analysis). Once these first five steps are complete, the main output is a manifest.csv file that contains the path to the cropped images for each cell, along with other metadata keyed to a per-cell identifier ("CellId"). The manifest.csv can then be submitted to cvapipe_analysis (using its loaddata and computefeatures steps) to calculate cell shape features. Following cvapipe_analysis, modules in the visualization directory can be used for subsequent analysis and visualization.
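
As an illustrative sketch, a config.yaml might look like the following. Only the step names, the "state" flag, and the alignment "mode" parameter come from this README; every other key and path is a placeholder assumption, not the package's actual schema:

```yaml
# Hypothetical config.yaml sketch -- keys other than "state" and
# "mode" are placeholders, not necessarily the real parameter names.
create_fov_dataset:
  state: True
  raw_dir: /path/to/raw_fovs        # placeholder
  seg_dir: /path/to/segmented_fovs  # placeholder

prep_single_cells:
  state: True

alignment:
  state: True
  mode: xy_only  # see the alignment section for the available modes
```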

Other parts of the package include the misc directory, which contains miscellaneous scripts used for tasks like removing unneeded channels from Airyscan images and correcting images for z-drift. Some steps make use of the package napari for graphical user interfaces (GUIs). Although the package does not contain a full testing suite, a few unit tests (implemented using pytest) are available in the tests directory.

How to use this code

Currently, the code can be run as a workflow by editing the config.yaml file and executing the command run_neuromast3d config.yaml from the neuromast3d directory.

The order of the pipeline is:

  • segmentation (optional, broken into "nucleus" and "cell" steps)
  • create_fov_dataset
  • prep_single_cells
  • alignment (optional)
  • cvapipe_analysis (run the loaddata, computefeatures, preprocessing, and shapemodes steps in that order)
  • visualization

Each of these steps is summarized below.

segmentation

Generate cell and nuclear instance segmentation using the watershed algorithm, and clean up the results in napari.

Note: this step is optional - you can use your own method of choice to generate the cell and nuclear instance segmentations, although there are some restrictions on file type.
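
To illustrate the general idea (this is a hedged sketch using scikit-image and SciPy, not the package's actual segmentation code), a seeded watershed splits a binary foreground mask into labeled instances by flooding from per-object seeds:

```python
# Sketch of seeded watershed instance segmentation. The seeding
# strategy (erosion-based markers) is an illustrative assumption.
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed


def watershed_instances(binary_mask, erosion_iters=3):
    """Split a binary foreground mask into labeled instances."""
    # Distance from background: high in object interiors.
    distance = ndi.distance_transform_edt(binary_mask)
    # One seed per object core, found by eroding the mask.
    markers, _ = ndi.label(
        ndi.binary_erosion(binary_mask, iterations=erosion_iters)
    )
    # Flood "uphill" from the seeds, constrained to the mask.
    return watershed(-distance, markers, mask=binary_mask)


# Toy 2D example: two separate square "cells"
img = np.zeros((20, 40), dtype=bool)
img[5:15, 5:15] = True
img[5:15, 25:35] = True
labels = watershed_instances(img)
print(int(labels.max()))  # -> 2
```

The same idea extends to 3D stacks, since the distance transform, erosion, and watershed all operate on n-dimensional arrays.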

create_fov_dataset

This step takes directories containing raw and segmented images of entire fovs (i.e. individual neuromasts) and creates a dataframe that stores info about each fov image, including the paths to where the images are stored on the local filesystem.
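
As a rough sketch of the idea (the column names and file matching below are illustrative assumptions, not the package's actual schema), raw and segmented fov images can be paired by filename and recorded in a dataframe:

```python
# Illustrative sketch only: pair raw and segmented fov images by
# filename and record their paths. Column names are assumptions.
from pathlib import Path

import pandas as pd


def build_fov_dataset(raw_dir, seg_dir):
    rows = []
    for raw_path in sorted(Path(raw_dir).glob('*.tiff')):
        seg_path = Path(seg_dir) / raw_path.name
        if seg_path.exists():
            rows.append({
                'NM_ID': raw_path.stem,          # one row per neuromast fov
                'SourceReadPath': str(raw_path),
                'SegmentationReadPath': str(seg_path),
            })
    return pd.DataFrame(rows)


# Toy demo with empty placeholder files
import tempfile
root = Path(tempfile.mkdtemp())
(root / 'raw').mkdir()
(root / 'seg').mkdir()
for name in ('nm1.tiff', 'nm2.tiff'):
    (root / 'raw' / name).touch()
    (root / 'seg' / name).touch()
df = build_fov_dataset(root / 'raw', root / 'seg')
print(df['NM_ID'].tolist())  # -> ['nm1', 'nm2']
```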

prep_single_cells

This step takes the fov_dataset.csv generated by create_fov_dataset as input. From there, it reads each fov image, resizes it to isotropic pixel dimensions, and then crops and interpolates each cell within the image. The single cell images are saved into new directories for each fov image.

This step also generates a cell_manifest.csv that can be used directly as input to cvapipe_analysis, if desired.
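
The resize-then-crop idea can be sketched as follows (a simplified illustration using SciPy, not the package's actual implementation; the voxel sizes are made-up placeholders):

```python
# Hedged sketch of per-cell prep: rescale a labeled z-stack to
# isotropic voxels, then crop each cell to its bounding box.
import numpy as np
from scipy import ndimage as ndi


def prep_single_cells(label_img, pixel_size_zyx=(0.3, 0.1, 0.1)):
    # Interpolate z so all three voxel dimensions match the smallest.
    zoom = np.array(pixel_size_zyx) / min(pixel_size_zyx)
    iso = ndi.zoom(label_img, zoom, order=0)  # order=0 keeps labels intact
    cells = {}
    for label in np.unique(iso):
        if label == 0:
            continue  # skip background
        mask = iso == label
        bbox = ndi.find_objects(mask.astype(int))[0]  # bounding-box slices
        cells[int(label)] = mask[bbox].astype(np.uint8)
    return cells


# Toy labeled stack with two "cells"
label_img = np.zeros((4, 10, 10), dtype=np.uint8)
label_img[1:3, 2:5, 2:5] = 1
label_img[1:3, 6:9, 6:9] = 2
cells = prep_single_cells(label_img)
print(sorted(cells))  # -> [1, 2]
```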

alignment

This step is used to align cells to a common frame of reference, i.e. to focus on variation due to shape rather than rotation. The module for the alignment step is called nm_alignment_basic. The alignment method is set using the 'mode' parameter in the YAML file. The following modes are available:

  1. unaligned: sets all alignment angles to 0 (the image is not aligned at all)
  2. xy_only: the cell is rotated around the z-axis so that the vector pointing from the cell centroid to the neuromast centroid is aligned with the x-axis (3 o'clock position)
  3. xy_xz: the xy_only rotation, plus a rotation around the y-axis such that the principal axis of a 2D xz projection of the image aligns with the z-axis
  4. xy_xz_yz: the xy_xz rotations, plus a rotation around the x-axis such that the principal axis of a 2D yz projection of the image aligns with the z-axis
  5. principal_axes: the image is rotated so that the three major axes of the image (calculated by PCA) lie along the x, y, and z axes

This step is optional, since you could opt not to align your cells, use your own alignment method, or use the default method of alignment in cvapipe_analysis (aligning the cells to their long axis in xy). It takes a cell_manifest.csv as input and saves single cells that have been aligned/rotated using the selected strategy. A cell_manifest.csv that points to the aligned cells is also generated and can be used as input to cvapipe_analysis.
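
The angle used by the xy_only mode can be sketched as follows (a simplified illustration of the geometry, not the nm_alignment_basic code):

```python
# Sketch of the xy_only geometry: the angle (about the z-axis) of the
# vector from the cell centroid to the neuromast centroid. Rotating
# the cell image by this angle (e.g. with scipy.ndimage.rotate)
# brings that vector to the x-axis (3 o'clock position).
import numpy as np


def xy_alignment_angle(cell_centroid_xy, neuromast_centroid_xy):
    """Angle in degrees of the cell-to-neuromast vector in the xy plane."""
    dx = neuromast_centroid_xy[0] - cell_centroid_xy[0]
    dy = neuromast_centroid_xy[1] - cell_centroid_xy[1]
    return np.degrees(np.arctan2(dy, dx))


# A cell whose outward vector points diagonally up-right sits at 45 degrees.
angle = xy_alignment_angle((0.0, 0.0), (3.0, 3.0))
```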

visualization

This directory contains scripts used for data analysis and visualization. It is intended to be run after the preprocessing pipeline (segmentation, preparation of the single cell dataset, and alignment) and the necessary cvapipe_analysis steps (loaddata and computefeatures) have been run. Visualization modules include:

  • curate_fov: A script used to open the field of view (fov) raw and segmented images for an experiment and pick labels to be excluded (such as poorly segmented cells). Includes an option to indicate the neuromast polarity relative to the body axis (AP or DV). Information is saved as a CSV file.
  • rec_error: A script that can automatically calculate error metrics (such as Hausdorff distance) for the original and reconstructed meshes for an experiment. Results are saved in a CSV file.
  • analysis: A script that takes the outputs from preceding steps and organizes them into an AnnData object, which is then saved. Manually annotated cells can be excluded by providing the output from curate_fov.
  • plotting_tools: Module with a variety of functions used for data analysis and plotting.
  • visualization: A script that allows the user to display data in UMAP space and pick points to open the corresponding single cell images. Useful for exploratory data analysis.

misc

Other modules include:

  • find_closest_cells: Contains a class called “RepresentativeCellFinder,” which uses KDTree to find representative cells.
  • split_channels: A simple script used to automatically remove channels from an image that are not needed for analysis, such as “preview” channels on an AiryScan image.
  • stackreg_script: A script that can be used to efficiently register images in batches using pystackreg.
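
The idea behind RepresentativeCellFinder can be sketched with scipy's KDTree (a hedged sketch of the concept; the class interface shown here is an assumption, not the actual API):

```python
# Sketch of the RepresentativeCellFinder idea: a k-d tree over
# shape-space coordinates finds the real cell closest to a query
# point (e.g. a cluster centroid or a point along a shape mode).
import numpy as np
from scipy.spatial import KDTree


class RepresentativeCellFinder:
    def __init__(self, coords, cell_ids):
        self.tree = KDTree(coords)      # coords: (n_cells, n_dims)
        self.cell_ids = list(cell_ids)

    def find_closest(self, point):
        """Return (CellId, distance) of the nearest cell to `point`."""
        distance, index = self.tree.query(point)
        return self.cell_ids[index], distance


# Toy 2D shape space (e.g. the first two principal components)
coords = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
finder = RepresentativeCellFinder(coords, ['cellA', 'cellB', 'cellC'])
print(finder.find_closest([0.9, 1.2])[0])  # prints cellB
```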

Installation

  1. Clone the neuromast3d repository to your local machine using the git clone command.
  2. Create a conda environment with python 3.8 installed by running the command conda create -n neuromast3d_env python=3.8.
  3. Activate the environment by running conda activate neuromast3d_env.
  4. Navigate to the neuromast3d root directory and run pip install . to install the package and its dependencies. If you would like an editable install, run pip install -e . instead (instructions for installing extras are TBD; the test extras are covered under Testing below). To run napari, you will also need to run pip install PyQt5.
  5. Proceed with editing the config.yaml file as described above.
  6. Run the workflow by using the command run_neuromast3d /path/to/config.yaml

Please report any issues you have by opening an issue on GitHub.

Testing

Unit tests are found in the tests directory. If you would like to run unit tests on your installation, first install the test dependencies by running pip install -e .[test]. Some tests require testing data - as I do not currently have a way of distributing these, you may run the remainder of the tests using pytest -m "not uses_data".