- Setup
- Normal Usage
- Configuration File
- Cell Key File
- Cell Info File
- Model Generation Outline
- Visualization
Anaconda: I highly recommend you get the Anaconda distribution for Python which can be found here. When you install, make sure that you check the box for "ADD TO PATH" so that you can run jupyter notebook from the command line.
Plotly: If you want to run the visualizations, install Plotly after you've installed Anaconda by running conda install -c plotly plotly.
Download Script: Because this script is located on GitHub, you should be able to download the latest version just by clicking the "Code" button on the top-right corner and then clicking "Download ZIP"; you will then need to unzip this. To open up the command line/terminal on Windows, navigate to the start menu and search for "Command Prompt" or "Powershell". Then navigate to the project folder by typing cd "C:\This_Project_Path".
This code is written so that you will NEVER have to touch the code, only the config.json and cell_key.json files. To start up the script, navigate to the root directory of the project folder in a terminal and enter jupyter notebook c_elegans_model_building.py. This will then open up the Jupyter notebook in your default browser. If this doesn't work, look in the terminal for a link and copy-paste it into your browser of choice. Afterwards, the only thing that should potentially be edited is the name = variable on the very first line. After the program is run, the output/workspace will be in a subfolder within the workspace folder with this name preceded by the run date.
Before your first run, check the config.json file in this directory and make sure that all the filepaths and drive letters are correct, since they might differ between computers. After this check, the only thing that you should ever be editing is the data -> strains section, by adding to the list that's already there. Each entry needs a name, which you can choose; include, which is set to true if you want the strain included in the model; and folderpaths, which is a list of folder locations where the RegA/RegB folders are. A new entry will look something like this, but with a real filepath:
{
"name": "KP9305_NU",
"include": true,
"folderpaths": [
"Y:\\-\\Cell Tracking Project\\KP9305_NU\\073018_KP9305_NU\\Pos0",
"Y:\\-\\Cell Tracking Project\\KP9305_NU\\073018_KP9305_NU\\Pos4",
"Y:\\-\\Cell Tracking Project\\KP9305_NU\\073018_KP9305_NU\\Pos2"
]
}
Make sure that each of the folders you have listed contains either a CellKey.xlsx or a cell_key.json file. If there is a CellKey.xlsx file but no cell_key.json file, the program will parse the Excel file and automatically generate the cell_key.json file. The program always prioritizes the .json file and ignores the .xlsx file, so make edits in the .json file if it's there. This file should look something like what's shown below. If the folder structure of a specific worm is different, refer to the cell key section.
{
"end": 108,
"mapping": {
"A1": "cell_name_1",
"A2": "cell_name_20",
"A3": "cell_name_5"
},
"name": "OD1599_NU_1206_Pos2",
"outliers": [],
"start": 15
}
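The file priority described above can be sketched in a few lines of Python. This is only an illustration, not the notebook's own code: the helper name find_cell_key is hypothetical, and it assumes the two files sit directly inside the listed folder.

```python
from pathlib import Path
from typing import Optional


def find_cell_key(folder: str) -> Optional[Path]:
    """Return the cell key file to use for one folder, preferring the JSON file.

    Hypothetical helper: it mirrors the priority described above, where
    cell_key.json wins over CellKey.xlsx when both are present.
    """
    base = Path(folder)
    for filename in ("cell_key.json", "CellKey.xlsx"):
        candidate = base / filename
        if candidate.is_file():
            return candidate
    # Neither file exists; the README lists this case as a logged error.
    return None
```

Running this over every entry in folderpaths before starting the notebook is a quick way to spot a folder that is missing both files.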
Afterwards, hit the "Restart + Run" button on the toolbar, which looks like a double forward arrow.
You might be prompted to clear variables; just type 'y' to start a clean run. While the script runs, you will see outputs for each step as well as for potential errors. Some errors can be safely ignored, but others will break the code. After the code completes Step 8, you will see an Output location: filepath showing where the model has been placed. You can then load this into MIPAV to generate a model. For more information about the config.json and cell_key.json files, read below.
Location: config.json
Purpose: This file is responsible for the settings of the entire model.
Usage: The first layer of this json file contains a settings and a data section. The settings section contains all of the necessary default information (more on this later) to generate the model. Within settings there are additional sections:
The folderpaths
section is used for importing data and contains the default MIPAV folder structure to be used for each strain's position. The "#" in data_folderpath
will be replaced by a volume number. You do not need to worry about what that number is, but make sure that "#" is in there somewhere. All of the other folders below that use the data_folderpath
as the root directory for that volume.
"folderpaths": {
"side": "RegB",
"data_folderpath": "Decon_reg_#\\Decon_reg_#_results",
"straightened_seam_cells": "straightened_seamcells\\straightened_seamcells.csv",
"straightened_annotations": "straightened_annotations\\straightened_annotations.csv",
"straightened_lattice": "straightened_lattice\\straightened_lattice.csv",
"twisted_seam_cells": "seam_cell_final\\seam_cells.csv",
"twisted_annotations": "integrated_annotation\\annotations.csv",
"twisted_lattice": "lattice_final\\lattice.csv"
},
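To make the "#" substitution concrete, here is a rough sketch of how a per-volume file path could be assembled from these settings. The helper volume_file is hypothetical, and the assumption that the side folder (RegB) sits between the position folder and data_folderpath is inferred from the description above rather than taken from the code.

```python
from pathlib import PureWindowsPath

# A subset of the "folderpaths" example above.
FOLDERPATHS = {
    "side": "RegB",
    "data_folderpath": "Decon_reg_#\\Decon_reg_#_results",
    "twisted_annotations": "integrated_annotation\\annotations.csv",
}


def volume_file(position_folder, folderpaths, key, volume):
    """Build the path to one file of one volume (hypothetical helper)."""
    template = folderpaths["data_folderpath"]
    if "#" not in template:
        raise ValueError('data_folderpath must contain "#"')
    # Every "#" in the template is replaced by the volume number.
    volume_root = template.replace("#", str(volume))
    # Assumed layout: <position>\<side>\<volume root>\<file for this key>
    return PureWindowsPath(position_folder, folderpaths["side"],
                           volume_root, folderpaths[key])


print(volume_file("Y:\\data\\Pos0", FOLDERPATHS, "twisted_annotations", 15))
```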
The outlier_removal section is part of Step 2. remove_outliers is simply a flag for whether or not you would like to remove outliers in this step. If it is true, outliers will be removed using a Hampel filter with window size window_size and a threshold of n_stdev standard deviations.
"outlier_removal": {
"remove_outliers": true,
"n_stdev": 5,
"window_size": 5
},
The interpolation section is part of Step 3. The total_min parameter defines how many minutes the model should be interpolated to. The min_timepoints_required parameter sets a threshold: the program will show and log an error if there are fewer timepoints than this value; this is important because interpolation requires a certain number of points to work. The method parameter defines the method of interpolation, which can take any of the options of the kind parameter of interp1d (linear, nearest, zero, slinear, quadratic, cubic, previous, next). The seam_cells_on section is a mapping of when each seam cell turns on, from 0 (model start) to 1 (model end). Currently, most of the cells turn on at the beginning, but the Q-cells turn on approximately two-thirds of the way through.
"interpolation": {
"total_min": 420,
"min_timepoints_required": 3,
"method": "linear",
"seam_cells_on": {
"H0L": 0,
"H0R": 0,
"H1L": 0,
"H1R": 0,
"H2L": 0,
"H2R": 0,
"V1R": 0,
"V1L": 0,
"V2R": 0,
"V2L": 0,
"V3R": 0,
"V3L": 0,
"V4L": 0,
"V4R": 0,
"QL": 0.667,
"QR": 0.667,
"V5L": 0,
"V5R": 0,
"V6L": 0,
"V6R": 0,
"TL": 0,
"TR": 0
}
}
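As a small worked example of how seam_cells_on relates to total_min, the sketch below converts a 0 to 1 fraction into a minute within the model. The function name is hypothetical, and rounding to the nearest whole minute is an assumption; the notebook may handle fractional minutes differently.

```python
def seam_cell_start_minute(cell, seam_cells_on, total_min):
    """Convert a cell's 0-1 "on" fraction into a minute of the model.

    Hypothetical helper; rounding to the nearest minute is an assumption.
    """
    fraction = seam_cells_on[cell]
    if not 0 <= fraction <= 1:
        raise ValueError(f"{cell}: fraction must be between 0 and 1")
    return round(fraction * total_min)


seam_cells_on = {"H0L": 0, "QL": 0.667}
# With total_min = 420, QL turns on at 0.667 * 420 = 280.14, i.e. minute 280.
print(seam_cell_start_minute("QL", seam_cells_on, 420))
```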
The warping
section, which is part of Step 5, currently only contains a list of seam_cells
which are used to actually warp all the points to the warping model. The warp is performed using a thin-plate spline (more on this later).
"warping": {
"seam_cells": [
"H0L",
"H0R",
"H1L",
"H1R",
"H2L",
"H2R",
"V1R",
"V1L",
"V2R",
"V2L",
"V3R",
"V3L",
"V4L",
"V4R",
"V5L",
"V5R",
"V6L",
"V6R",
"TL",
"TR"
]
}
The smoothing
section is part of Step 4 which generates the warping model and Step 7 which does a moving average of all the annotation points. The window_size
parameter (not to be confused with the filter window in outlier_removal
) defines the window size for the moving average. The moving average truncates the windows at endpoints and ignores 0 values (more on this later).
"smoothing": {
"window_size": 20
}
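The behaviour described above (windows truncated at the endpoints, 0 values ignored) can be sketched as follows. This is a minimal illustration and not the notebook's implementation: it assumes a centered window and that a window containing only zeros yields 0.

```python
def moving_average(values, window_size):
    """Centered moving average that shrinks at the ends and skips zeros.

    Assumptions: the window is centered on each point, and 0 marks a
    missing value that should not contribute to the mean.
    """
    half = window_size // 2  # an even window_size covers 2 * half + 1 points
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)                 # truncate at the start
        hi = min(len(values), i + half + 1)   # truncate at the end
        window = [v for v in values[lo:hi] if v != 0]
        smoothed.append(sum(window) / len(window) if window else 0.0)
    return smoothed


print(moving_average([1.0, 2.0, 3.0], 3))
```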
The mipav_output section is a part of Step 8, which takes the model and converts it from dict/json format to csv format so that MIPAV can generate the appropriate animation. The labels_on parameter denotes whether or not labels should be present in the MIPAV animation. It takes a boolean value, true or false, case sensitive. The cell_info parameter takes a filepath, relative to the project directory, to a file that contains information about each cell's type and color in the animation. More on this in the cell_info section.
"mipav_output": {
"labels_on": true,
"cell_info": "cell_info.json"
}
The data section contains additional sections as well: seam_cells
and strains
. The seam_cells
section is a list of folder paths/positions to use to create the warping model in Step 4.
"seam_cells": [
"Y:\\-\\Cell Tracking Project\\JCC596_NU\\091119_Pos3\\Decon_registered",
"Y:\\-\\Cell Tracking Project\\JCC596_NU\\091119_Pos2\\Decon_registered",
"Y:\\-\\Cell Tracking Project\\JCC596_NU\\082619_Pos3\\Decon_registered"
]
The strains
section contains a list of information regarding the strains. Each strain contains the name
which you define, include
which is whether or not you want to include it in the run sparing you from retyping the information, and folderpaths
which is a list of the single worm data. Here, I've only listed two strains. Also note that you are not limited to 3 positions per strain.
"strains": [
{
"name": "OD1599_NU",
"include": true,
"folderpaths": [
"Y:\\-\\Cell Tracking Project\\OD1599_NU\\OD1599_MostRecent\\120619_Pos2\\Decon_reg",
"Y:\\-\\Cell Tracking Project\\OD1599_NU\\OD1599_MostRecent\\112719_Pos3\\Decon_Reg",
"Y:\\-\\Cell Tracking Project\\OD1599_NU\\OD1599_MostRecent\\112619_Pos0\\Decon_reg"
]
},
{
"name": "DCR6485_RPM1_NU",
"include": true,
"folderpaths": [
"Y:\\-\\Cell Tracking Project\\DCR6485_RPM1_NU\\011419_Pos0\\Decon_reg",
"Y:\\-\\Cell Tracking Project\\DCR6485_RPM1_NU\\011419_Pos4\\Decon_reg",
"Y:\\-\\Cell Tracking Project\\DCR6485_RPM1_NU\\021020_Pos2\\Decon_Reg"
]
}
]
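For reference, this is roughly how the include flag can be read from a loaded config. The function included_folderpaths is a hypothetical name, and the tiny config string below is made up for the example; only the key names come from the structure shown above.

```python
import json

# A made-up, minimal config with the same shape as data -> strains above.
CONFIG_TEXT = """
{
  "data": {
    "strains": [
      {"name": "STRAIN_A", "include": true,  "folderpaths": ["C:\\\\data\\\\A\\\\Pos0"]},
      {"name": "STRAIN_B", "include": false, "folderpaths": ["C:\\\\data\\\\B\\\\Pos1"]}
    ]
  }
}
"""


def included_folderpaths(config):
    """Map each included strain name to its list of folder paths."""
    return {
        strain["name"]: list(strain["folderpaths"])
        for strain in config["data"]["strains"]
        if strain.get("include", False)
    }


config = json.loads(CONFIG_TEXT)  # JSON true/false become Python True/False
print(included_folderpaths(config))
```

Setting include to false keeps the entry in the file, so it can be switched back on later without retyping the paths.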
Location: This file should be located in whichever folder contains your RegA/RegB folders, as listed in the config.json file in the root directory. It should be named cell_key.json. Currently, if you have a CellKey.xlsx file, the program will automatically generate a json file for you as long as it follows the older format. After this is done, you should go back and check that this information is correct.
Purpose: This file defines the start
and end
times of the specific worm in a strain as well as mapping IDs from MIPAV to actual cell names. You can also override defaults here (more on this later).
Usage: As stated previously, you can define the start
and end
volumes here as integers. The outliers
parameter is a list of volumes to ignore (not currently used), the name
is any name you can give the specific worm, and the mapping
is a dictionary containing an ID to cell name association. Both the key and value in mapping
should be strings.
{
"end": 108,
"mapping": {
"A10": "RMDVR",
"A14": "ASHL",
"A15": "RIBL",
"A5": "OLQVL",
"A6": "SMDVL",
"C3": "RIGL",
"D2": "CEPshVL",
"D3": "URAVL",
"D4": "Hyp6"
},
"name": "OD1599_NU_1206_Pos2",
"outliers": [],
"start": 15
}
Because MIPAV has changed over the course of this project, we need a way to account for different file structures. If you include a folderpaths section here, like the one in the config file, the program will read in this worm's data according to that structure instead.
{
"folderpaths": {
"side": "RegB",
"data_folderpath": "Decon_reg_#\\Decon_reg_#_results",
"straightened_seam_cells": "other_straightened_seamcells\\straightened_seamcells.csv",
"straightened_annotations": "other_straightened_annotations\\straightened_annotations.csv",
"straightened_lattice": "other_straightened_lattice\\straightened_lattice.csv",
"twisted_seam_cells": "other_seam_cell_final\\seam_cells.csv",
"twisted_annotations": "other_integrated_annotation\\annotations.csv",
"twisted_lattice": "other_lattice_final\\lattice.csv"
},
"end": 108,
"mapping": {
"A10": "RMDVR",
"A14": "ASHL",
"A15": "RIBL",
"A5": "OLQVL",
"A6": "SMDVL",
"C3": "RIGL",
"D2": "CEPshVL",
"D3": "URAVL",
"D4": "Hyp6"
},
"name": "OD1599_NU_1206_Pos2",
"outliers": [],
"start": 15
}
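The override can be pictured as a simple lookup with a fallback. The sketch below assumes the worm's folderpaths section replaces the config default as a whole rather than being merged key by key; the text does not say which, and the function name is hypothetical.

```python
def effective_folderpaths(config, cell_key):
    """Pick the folder structure to use for one worm.

    Assumption: a "folderpaths" section in cell_key.json replaces the
    default from config.json entirely (no per-key merging).
    """
    defaults = config["settings"]["folderpaths"]
    return cell_key.get("folderpaths", defaults)


config = {"settings": {"folderpaths": {"side": "RegB"}}}
print(effective_folderpaths(config, {"start": 15, "end": 108}))
```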
Location: cell_info.json. Make sure that the settings -> mipav_output -> cell_info parameter in config.json points to this file. You can change the name as long as it's consistent across the two .json files.
Purpose: The cell_info.json
file is used primarily in Step 8 to generate the visualization. It contains the cell type and colors at certain points in the worm's development.
Usage: An example entry is shown below. The first level is the cell itself. The next level contains colors and type. type refers to the general cell type, which supplies default parameters when it is defined. However, colors will always override type attributes. The colors section is keyed by the time between twitch and hatch, from 0 to 1, with the respective color as an RGB list at that time. If there is only one color, it is used throughout the entire model, but if there are multiple, each channel is linearly interpolated between them.
"adeshl": {
"colors": {
"0.00000": [
255.0,
255.0,
255.0
]
},
"type": null
}
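The per-channel linear interpolation of colors can be sketched like this. It is an illustration only: the function name color_at is hypothetical, and holding the first or last color outside the defined range is an assumption.

```python
def color_at(colors, t):
    """Interpolate an RGB color at time t (0 = twitch, 1 = hatch).

    `colors` has the same shape as a "colors" entry in cell_info.json:
    string time keys mapped to [R, G, B] lists.
    """
    stops = sorted((float(key), rgb) for key, rgb in colors.items())
    # Assumption: outside the defined range, hold the nearest color.
    if t <= stops[0][0]:
        return list(stops[0][1])
    if t >= stops[-1][0]:
        return list(stops[-1][1])
    for (t0, c0), (t1, c1) in zip(stops, stops[1:]):
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)
            # Interpolate each channel independently.
            return [a + w * (b - a) for a, b in zip(c0, c1)]


print(color_at({"0.0": [0, 0, 0], "1.0": [255, 0, 100]}, 0.5))
```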
The generation of the model is broken down into 8 major steps and I'll be outlining them here.
The first thing you should do is define the name of the run; use something like the strain name (e.g. JCC596_NU) or some identifier (e.g. NerveRingTest). This is used to generate the workspace name, so if it is not unique (which is not always a problem), the workspace will be overwritten. Next, the workspace is actually generated. You might notice that what displays is something like Workspace folderpath: workspace\2020_08_16-JCC596_NU where the date is tagged onto the run name. This is to keep things more organized when we go back and look at models in the future. Afterwards, the config.json file is loaded into the program and used throughout the run.
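The workspace naming shown above can be reproduced with a short sketch. The year_month_day format is inferred from the example path, and workspace_folder is a hypothetical name, not a function from the notebook.

```python
from datetime import date
from pathlib import PureWindowsPath


def workspace_folder(name, run_date):
    """Build the workspace subfolder as <date>-<run name> (assumed format)."""
    stamp = run_date.strftime("%Y_%m_%d")
    return PureWindowsPath("workspace", f"{stamp}-{name}")


print(workspace_folder("JCC596_NU", date(2020, 8, 16)))
```

Because the date is part of the folder name, two runs with the same name only collide when they happen on the same day.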
This goes through each of the folders from config.json defined in the data -> strains section and loads them into a variable called compiled_data. This variable contains each strain's worms as well as each worm's cell_key, seam_cells, annotations, and errors found while parsing (more on this below). The program uses a cell_key.json as defined above instead of a CellKey.xlsx for ease of importing the information. However, if get_cell_key finds that a cell_key.json does not exist, the program will try to automatically generate one based on what's in the Excel file; this operation is performed by the convert_cell_key_csv2json function. The output of this step is located in the workspace folder as 1_compiled_data.json.
Errors: In this step, errors are determined within the get_cell_key
and parse_mipav_data
functions. Here is a list of checks that it currently does + prints/logs:
- If there is neither a cell_key.json nor a CellKey.xlsx file.
- If there is an empty seam cell file.
- If there is an empty annotation file.
- If there is a mis-match between cells in the twisted and straightened seam cell files.
- If there is a mis-match between cells in the twisted and straightened annotation files.
- If there are extra seam cells according to what's in config.json settings -> interpolation -> seam_cells_on.
- If there are identical IDs in the straightened annotation files.
- If it failed to read a file.
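Two of the checks in this list (mismatched cells and identical IDs) are easy to picture with sets and counters. The sketch below is not taken from get_cell_key or parse_mipav_data; both function names are hypothetical.

```python
from collections import Counter


def cell_mismatch(twisted_cells, straightened_cells):
    """Report cells that appear in only one of the two files."""
    twisted, straightened = set(twisted_cells), set(straightened_cells)
    return {
        "only_twisted": sorted(twisted - straightened),
        "only_straightened": sorted(straightened - twisted),
    }


def duplicate_ids(ids):
    """Return IDs that occur more than once, in sorted order."""
    counts = Counter(ids)
    return sorted(i for i, n in counts.items() if n > 1)


print(cell_mismatch(["H0L", "H1L", "QL"], ["H0L", "H1L", "TL"]))
print(duplicate_ids(["A1", "A2", "A1", "C3"]))
```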
This goes through each of the worms in each strain and filters each axis according to the parameters defined in config.json
in the settings -> outlier_removal
section. Again, window_size
is the window size for filtering where any window that goes past the data is augmented in a reflection scheme. This was chosen because it was the most natural representation of the data past the endpoints. Additionally, n_stdev
is just the number of standard deviations past the median to be considered an outlier. If a point is deemed an outlier, it is replaced with the median of the window. The output of this step is located in the workspace folder as 2_compiled_data_no_outliers.json
.
Errors: This step currently does not log any errors.
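A minimal sketch of a Hampel filter with reflected endpoints is given below for one axis of one cell. It is not the notebook's code. It assumes the usual Hampel convention of estimating the standard deviation from the median absolute deviation (MAD times 1.4826), and that the reflection does not repeat the edge value; the notebook may differ on both points.

```python
from statistics import median


def hampel_filter(values, window_size=5, n_stdev=5):
    """Replace outliers with the window median (sketch, one axis).

    Assumptions: centered window, reflection padding that does not repeat
    the edge value, and sigma estimated as 1.4826 * MAD.
    """
    data = list(values)
    half = window_size // 2
    n = len(data)
    if n <= half:
        raise ValueError("need more points than half the window size")
    # Reflect the data past both endpoints so every window is full.
    padded = data[half:0:-1] + data + data[-2:-half - 2:-1]
    cleaned = list(data)
    for i in range(n):
        window = padded[i:i + 2 * half + 1]
        med = median(window)
        mad = median(abs(v - med) for v in window)
        sigma = 1.4826 * mad
        # Note: if the window is constant (sigma == 0), any deviation is flagged.
        if abs(data[i] - med) > n_stdev * sigma:
            cleaned[i] = med
    return cleaned


print(hampel_filter([1.0, 1.1, 0.9, 50.0, 1.0, 1.2, 1.1]))
```

In this example only the spike at 50.0 is replaced; the remaining points stay within the threshold and are left untouched.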
This goes through each of the worms in each strain and interpolates to a time in minutes based on the method defined in config.json
(settings -> interpolation -> total_min
and settings -> interpolation -> method
, respectively). It is recommended that linear
interpolation is used, but as stated previously, you have an option of choosing between linear
, nearest
, zero
, slinear
, quadratic
, cubic
, previous
, and next
. Seam cells and annotations are handled differently here because annotations are assumed to exist at the start of the run, but seam cells can turn on at different times. Using what's defined in config.json's settings -> interpolation -> seam_cells_on, the program truncates data that starts earlier than when it's supposed to. If there is less data than what is expected, then the program will simply start it after the designated start time defined in the config file. The output of this step is located in the workspace folder as 3_compiled_data_interpolation.json.
Errors: In this step, the error checks ensure that interpolation is possible and that the data meets the number of points required by settings -> interpolation -> min_timepoints_required in config.json, where min_timepoints_required refers to the number of raw volumes. It should be noted that the code will break here if the data does not allow for interpolation. Here is a list of checks that it currently does + prints/logs:
- If there are insufficient timepoints for interpolation.
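To illustrate the linear case together with the timepoint check, here is a plain-Python sketch that resamples one coordinate of one cell onto a per-minute grid. It stands in for interp1d with kind="linear" and is not the notebook's code. It assumes the raw volume numbers are strictly increasing, that the first volume maps to minute 0 and the last to minute total_min, and that one sample per minute is produced.

```python
def interpolate_linear(volumes, values, total_min, min_timepoints_required=3):
    """Linearly resample (volume, value) pairs onto minutes 0..total_min."""
    required = max(2, min_timepoints_required)  # linear needs at least 2 points
    if len(volumes) < required:
        raise ValueError("insufficient timepoints for interpolation")
    first, last = volumes[0], volumes[-1]
    result = []
    j = 0  # index of the left-hand raw point of the current segment
    for minute in range(total_min + 1):
        # Position of this minute on the raw volume axis.
        x = first + (last - first) * minute / total_min
        while j < len(volumes) - 2 and volumes[j + 1] < x:
            j += 1
        x0, x1 = volumes[j], volumes[j + 1]
        w = (x - x0) / (x1 - x0)
        result.append(values[j] + w * (values[j + 1] - values[j]))
    return result


print(interpolate_linear([15, 16, 18], [0.0, 10.0, 30.0], total_min=6))
```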
This step takes the combined data that has already been loaded in with outliers removed, interpolates it according to the parameters in Step 3, averages the cell positions across multiple worms, and then smooths the averaged positions using a moving average whose window is defined in config.json in settings -> smoothing -> window_size. This step currently ignores the Q-cells QL and QR. There are plans to load in outside models, such as the one from the original Untwisting paper; this is still a work in progress. The output of this step is located in the workspace folder as 4_seam_cell_warping_model.json.
Errors: This step currently does not log any errors.
This step warps all of the worms in each strain to the warping model created or loaded-in from Step 4. The code first reorders the cells so that each timepoint contains all of the cells necessary for warping. The function thin_plate_spline_warp
is imported from thin_plate_spline_warp.py
and is a Python analog of the original MATLAB script used in the previous model building code which can be found here. From there, the function takes in the original control point positions (warp_from
), the new control point positions (warp_to
), and the points to be warped from the original space (ordered_coord_list
). Because there is no identifier for which cell corresponds to which coordinate, the program simply uses a sorted list of the cell names. The output of this step is located in the workspace folder as 5_compiled_data_warped.json
.
Errors: This step currently does not log any errors.
After all of the worms in each strain have been warped, Step 6 averages all of the coordinates together by cell. This includes both seam cells and annotations, but because seam cells were the control points in warping, they are virtually the same as in Step 4 even when averaged. The output of this step is located in the workspace folder as 6_cell_coordinates_by_timepoint.json
.
Errors: This step currently does not log any errors.
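Averaging by cell can be sketched for a single timepoint as follows. The input layout (one dict of cell name to x, y, z per worm) and the function name are assumptions made for the example, not the structure of compiled_data.

```python
from collections import defaultdict


def average_by_cell(worms):
    """Average (x, y, z) per cell across worms for one timepoint.

    `worms` is a list of {cell_name: (x, y, z)} dicts; a cell that is
    missing from a worm simply does not contribute to that cell's mean.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for worm in worms:
        for cell, xyz in worm.items():
            for axis, value in enumerate(xyz):
                sums[cell][axis] += value
            counts[cell] += 1
    return {cell: [s / counts[cell] for s in sums[cell]] for cell in sums}


print(average_by_cell([
    {"ASHL": (1, 2, 3)},
    {"ASHL": (3, 4, 5), "RIBL": (0, 0, 6)},
]))
```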
After all of the seam cells and annotations have been averaged, Step 7 smooths each cell's coordinates, one dimension at a time, with a moving average using the window defined in settings -> smoothing -> window_size. The output of this step is located in the workspace folder as 7_cell_coordinates_by_timepoint_smoothed.json. With this step, the model is now complete.
Errors: This step currently does not log any errors.
This step takes the model and simply outputs it into the output
folder in the workspace. The parameters used here are located in settings -> mipav_output -> labels_on
which takes a boolean value (true
or false
, case sensitive). The program will print out a filepath that can be used with the Untwisting Plugin for MIPAV. Simply put this filepath in the "Data directory (marker 1)" field in the GUI and select "create annotation animation" in the Build/Edit section. This will create an animation
folder in the output
folder which can then be post-processed using another software like ImageJ.
Errors: This step currently does not log any errors.
After the model is generated, there are two ways to view it. One is to put the output of Step 8 into MIPAV; the alternative is to use the c_elegans_visualization.ipynb file. To use the latter, similar to the model building script, open it by running jupyter notebook c_elegans_visualization.ipynb and it should pop up in your default browser. In the first code block, edit the model_name to the corresponding model in the workspace. You can also edit the Step number to check whether intermediates are correct, but normally it should be set to 7, the step at which the model is complete. When you hit Restart + Run like before, the code will generate a visualizations folder within the model workspace and use the cell_info.json file (as a color reference) to generate several different plots (2D in different axes + 3D).