tryptag is a Python module for accessing and handling data from the TrypTag genome-wide protein localisation project.
Its primary intended use is for easy access to image data for automated image analysis.
First, make sure Python, git and pip are installed. On Debian/Ubuntu-based Linux:
sudo apt install python3 python3-pip git
Or, on Windows:
winget install Python.Python.3.0; winget install Git.Git
python3 -m pip install --upgrade pip
Next, install tryptag using pip. This requires git:
pip install git+https://github.com/zephyris/tryptag
tryptag requires several Python modules: numpy, scikit-image, progressbar2 and filelock. These are installed automatically by pip.
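If an import fails later, one quick check is whether Python can locate each dependency. A minimal sketch using only the standard library; note that the import names differ from some package names (scikit-image is imported as skimage, progressbar2 as progressbar). The helper missing_modules is hypothetical and not part of tryptag.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on the Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names for numpy, scikit-image, progressbar2 and filelock
required = ["numpy", "skimage", "progressbar", "filelock"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All tryptag dependencies found")
```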
To reinstall and upgrade, use pip:
pip install --upgrade --force-reinstall git+https://github.com/zephyris/tryptag
To uninstall, also use pip:
pip uninstall tryptag
To use the tryptag module, import the TrypTag class and set up an instance (normally called tryptag):
from tryptag import TrypTag
tryptag = TrypTag()
Microscopy data consists of multiple fields of view per cell line.
A cell line is specified using an instance of CellLine, a simple class defining the life cycle stage, gene id (as used on TriTrypDB) and tagging terminus (n or c). Each field of view is accessed by its field_index:
from tryptag import CellLine
cell_line = CellLine(life_stage="procyclic", gene_id="Tb927.9.8570", terminus="n")
field_index = 2
field_image = tryptag.open_field(cell_line, field_index)
This returns a FieldImage object containing the phase, mNG, DNA stain, phase mask and DNA mask images.
The cells in the phase threshold image are indexed and can be opened individually. To open a specific cell in the field of view:
cell_index = 14
cell_image = tryptag.open_cell(cell_line, field_index, cell_index)
Similar to open_field, open_cell returns a CellImage object.
Images within a FieldImage or CellImage object can be accessed using dot notation: field_image.phase, .mng, .dna, .phase_mask and .dna_mask.
All images are numpy ndarray objects, as used by scikit-image:
from skimage import io
io.imshow(cell_image.phase)
io.show()
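Because the images are plain ndarrays, ordinary numpy operations apply to them. As an illustration, here is a sketch of a percentile-based contrast stretch for display. contrast_stretch is a hypothetical helper, not part of tryptag, and the synthetic array stands in for an image such as cell_image.phase.

```python
import numpy as np


def contrast_stretch(image, low_pct=1, high_pct=99):
    """Rescale an image to floats in [0, 1] between two intensity percentiles.

    Useful for displaying uint16/uint32 microscopy images, whose raw
    values rarely span the full integer range.
    """
    data = image.astype(np.float64)  # work in floats to avoid integer wraparound
    low, high = np.percentile(data, [low_pct, high_pct])
    if high <= low:  # flat image: nothing to stretch
        return np.zeros_like(data)
    return np.clip((data - low) / (high - low), 0.0, 1.0)


# Synthetic stand-in for e.g. cell_image.phase (uint16)
demo = np.arange(100, dtype=np.uint16).reshape(10, 10)
stretched = contrast_stretch(demo)
print(stretched.dtype, stretched.min(), stretched.max())
```

The stretched array can be passed to io.imshow in place of the raw image.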
Bear in mind that accessing a nonexistent gene id, tagging terminus, field or cell raises a KeyError. For example, to skip missing fields:
for field_index in range(7):
    try:
        field_image = tryptag.open_field(cell_line, field_index)
        # do your analysis here
    except KeyError:
        print("Field not found, field_index:", field_index)
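The try/except pattern for missing fields or cells can be wrapped in a small reusable helper. This is a sketch; collect_available is a hypothetical name, and the dictionary below is a stand-in for tryptag.open_field so the example runs without any data.

```python
def collect_available(open_function, indices):
    """Call open_function(index) for each index, skipping indices that raise KeyError.

    Returns a dict mapping each index that opened successfully to its result.
    """
    found = {}
    for index in indices:
        try:
            found[index] = open_function(index)
        except KeyError:
            # Missing field or cell: skip it rather than aborting the loop
            continue
    return found


# Hypothetical stand-in for tryptag.open_field with only fields 0-2 present
fake_fields = {0: "field 0", 1: "field 1", 2: "field 2"}
available = collect_available(lambda i: fake_fields[i], range(7))
print(sorted(available))
```

With real data the callable could be lambda i: tryptag.open_field(cell_line, i). Catching only KeyError, rather than using a bare except, means other failures such as a failed download still surface. Note that this keeps every opened image in memory, so for large loops it may be better to analyse each field as it is opened.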
tryptag understands the localisation annotation ontology and provides a tool for intelligent localisation searches. First, set up tryptag:
from tryptag import TrypTag
tryptag = TrypTag()
You can search by any of the localisation annotation terms:
results = tryptag.localisation_search("nucleoplasm")
This returns a list of CellLine objects, whose gene id and terminus you can access using dot notation: cell_line.gene_id, .terminus.
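Since each result exposes gene_id and terminus, results can be post-processed with ordinary Python. As one hedged example, the sketch below groups results by gene to find genes that matched the search when tagged at both termini. DemoCellLine is a hypothetical stand-in for CellLine, and gene_A/gene_B are made-up placeholder ids.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoCellLine:
    """Hypothetical stand-in exposing the same gene_id/terminus attributes as CellLine."""
    gene_id: str
    terminus: str


def termini_by_gene(results):
    """Map each gene id to the set of termini at which it matched the search."""
    grouped = defaultdict(set)
    for cell_line in results:
        grouped[cell_line.gene_id].add(cell_line.terminus)
    return dict(grouped)


results = [
    DemoCellLine("gene_A", "n"),
    DemoCellLine("gene_A", "c"),
    DemoCellLine("gene_B", "c"),
]
grouped = termini_by_gene(results)
both_termini = sorted(g for g, t in grouped.items() if t == {"n", "c"})
print(both_termini)
```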
The primary intended use of the tryptag module is easy access to specific field of view and cell images for automated image analysis. See the quickstart.
open_field and open_cell return a FieldImage or CellImage object containing these images, which can be accessed using the appropriate dot notation. Microscopy data consists of three image channels and two thresholded images:
Image channels:
- .phase: phase contrast (transmitted light, overall cell morphology), uint16
- .mng: mNG fluorescence (green fluorescence, from the tagged protein), uint32
- .dna: DNA stain fluorescence (blue fluorescence, using Hoechst 33342), uint16
Thresholded images:
- .phase_mask: thresholded phase contrast (cells), uint8, 255 = object (current cell of interest)
- .dna_mask: thresholded DNA stain (nuclei and kinetoplasts, the mitochondrial DNA organelles), uint8, 255 = object (kinetoplasts or nuclei)
CellImage objects additionally contain .phase_mask_othercells, a mask of every other cell in the view.
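Because masks mark objects with 255, a mask can directly select the matching pixels of an intensity image. A minimal sketch, assuming the dtypes and mask convention listed above; signal_in_mask is a hypothetical helper (not a tryptools function), and the small arrays stand in for cell_image.mng and cell_image.phase_mask.

```python
import numpy as np


def signal_in_mask(image, mask):
    """Return (sum, mean) of image intensity over pixels where mask == 255."""
    # Convert to float so later arithmetic (e.g. background subtraction)
    # cannot wrap around as unsigned integer arithmetic would.
    selected = image[mask == 255].astype(np.float64)
    if selected.size == 0:
        return 0.0, 0.0
    return float(selected.sum()), float(selected.mean())


# Synthetic stand-ins for cell_image.mng (uint32) and cell_image.phase_mask (uint8)
mng = np.array([[10, 20], [30, 40]], dtype=np.uint32)
phase_mask = np.array([[255, 0], [255, 255]], dtype=np.uint8)
total, mean = signal_in_mask(mng, phase_mask)
print(total, mean)
```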
tryptag includes tryptools, which provides some useful tools for image analysis of Trypanosoma brucei cells. First, import TrypTag and tryptools and set up tryptag:
from tryptag import TrypTag, CellLine, tryptools
tryptag = TrypTag()
The tryptools methods take a CellImage object as input and return various automated image analysis data.
cell_image = tryptag.open_cell(CellLine(life_stage="procyclic", gene_id="Tb927.9.8570", terminus="n"), 2, 14)
morphology_result = tryptools.cell_morphology_analysis(cell_image)
tryptag makes it easy to apply an analysis to many cell lines. First, import and set up tryptag:
from tryptag import TrypTag, tryptools
tryptag = TrypTag()
Define the analysis function you would like to apply to each cell line. This example analyses the mNG signal in each individual cell:
def analysis_function(tryptag, cell_line):
    result = []
    fieldcell_list = tryptag.cell_list(cell_line)
    for entry in fieldcell_list:
        # open each cell and store its mNG signal analysis
        cell_image = tryptag.open_cell(cell_line, entry["field_index"], entry["cell_index"])
        result.append(tryptools.cell_signal_analysis(cell_image))
    return result
Run the analysis using the built-in multiprocess analysis tool. This example applies the function to every cell line, iterating automatically through the entire dataset of ~5,000,000 cells:
worklist = tryptag.worklist_all()
results = tryptag.analyse_list(worklist, analysis_function)
You can also use the output of tryptag.localisation_search as the worklist, or tryptag.worklist_parental for data from untagged parental cells.
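Before launching a multiprocess run over the whole dataset, it can help to try the analysis function on a few cell lines in a single process, where exceptions appear with a plain traceback. A sketch under stated assumptions: run_serially is hypothetical, and since the return format of analyse_list is not described here, it simply returns (cell_line, result) pairs. The toy inputs stand in for real CellLine objects.

```python
from itertools import islice


def run_serially(tryptag, worklist, analysis_function, limit=None):
    """Apply analysis_function(tryptag, cell_line) to cell lines one at a time.

    Returns a list of (cell_line, result) pairs. If limit is given, only the
    first `limit` entries of the worklist are analysed.
    """
    selected = islice(worklist, limit)  # islice(..., None) keeps every entry
    return [(cell_line, analysis_function(tryptag, cell_line)) for cell_line in selected]


# Toy demonstration with stand-in values instead of real CellLine objects
def toy_analysis(tryptag, cell_line):
    return len(cell_line)


pairs = run_serially(None, ["a", "bb", "ccc"], toy_analysis, limit=2)
print(pairs)
```

With real data this might be called as run_serially(tryptag, tryptag.worklist_all(), analysis_function, limit=5).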
You can use tryptag to download the microscopy data. In Python, import the module and set up a TrypTag instance:
from tryptag import TrypTag, CellLine
tryptag = TrypTag()
You can trigger download of the microscopy data for a specific life cycle stage, gene id and tagging terminus using fetch_data.
This looks up which tagging plate corresponds to the most recent replicate of the tagging attempt for that life cycle stage, gene id and terminus, and the URL from which to download that plate's data.
It then downloads and decompresses the data to the data_cache_path directory.
tryptag.fetch_data(CellLine(life_stage="procyclic", gene_id="Tb927.7.1920", terminus="n"))
This will take a long time: to get image data for a single gene, the data for an entire plate needs to be downloaded. This is typically ~10 to 20 GB.
Look through the data cache directory and you will find the microscopy data, in one subdirectory per tagging plate and named by gene id and tagging terminus.
Make sure you have enough free disk space to download, decompress and cache the image data. This is up to ~40 GB for a single plate and ~4 TB for the entire dataset.
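A script can check free space before triggering downloads. This sketch uses the standard library shutil.disk_usage and the ~40 GB per plate upper estimate above; enough_space_for_plates is a hypothetical helper. The path must already exist, so if the cache directory has not been created yet, pass its parent directory.

```python
import shutil

GB = 10**9


def enough_space_for_plates(path, n_plates, gb_per_plate=40):
    """Return True if the filesystem holding `path` has room for n_plates plates.

    gb_per_plate defaults to ~40 GB, the upper estimate for downloading,
    decompressing and caching one plate.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= n_plates * gb_per_plate * GB


print(enough_space_for_plates(".", 1))
```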
The default cache location is _tryptag_data within the current working directory. You can change this to a relative or absolute path to any directory you wish; we recommend a scratch drive with sufficient space.
Make sure this is set at the start of every script:
Linux/Mac:
from tryptag import TrypTag
tryptag = TrypTag(data_cache_path = "/mnt/z/my/scratch/directory")
or Windows:
tryptag = TrypTag(data_cache_path = "Z:/my/scratch/directory")
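One way to make sure every script uses the same cache location is to read it from an environment variable. A sketch only: TRYPTAG_CACHE is a hypothetical variable name that tryptag itself does not read, and cache_path_from_env is a hypothetical helper that falls back to the documented default.

```python
import os
from pathlib import Path

DEFAULT_CACHE = "_tryptag_data"  # tryptag's default, relative to the working directory


def cache_path_from_env(var_name="TRYPTAG_CACHE"):
    """Return the cache directory named by an environment variable, or the default."""
    # expanduser() lets values such as "~/scratch/tryptag" work on any platform
    return str(Path(os.environ.get(var_name, DEFAULT_CACHE)).expanduser())


print(cache_path_from_env())
```

A script would then use TrypTag(data_cache_path=cache_path_from_env()), so changing the cache location means changing only the environment variable.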
Do not delete or move individual files within data_cache_path: tryptag does not check the plate subdirectories for integrity. You can, however, safely delete a whole plate subdirectory.
Interrupting either the data download or the zip decompression should be handled gracefully, leaving partial data but not preventing later automatic re-download and/or re-decompression.
If multiple scripts using the same data_cache_path simultaneously try to download a plate, this should also be handled gracefully.
One script downloads and decompresses the image data, while the others (silently) wait until the image data is available.
However, for large-scale analyses, it is more robust to ensure all data is already cached. You can easily download all image data (this will probably take more than one week!):
tryptag.fetch_all_data()
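The "more than one week" estimate follows from simple arithmetic on the ~4 TB total. This worked example assumes a sustained 50 Mbit/s connection (an assumption, not a figure from TrypTag) and ignores decompression time; download_days is a hypothetical helper.

```python
def download_days(total_terabytes, bandwidth_mbps):
    """Days needed to transfer total_terabytes at a sustained bandwidth in megabits/s."""
    total_bits = total_terabytes * 10**12 * 8
    seconds = total_bits / (bandwidth_mbps * 10**6)
    return seconds / 86400


print(round(download_days(4, 50), 1))  # 7.4
```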
The TrypTag data may have minor errors, which will be corrected over time. fetch_all_data always fetches the latest localisation listing from Zenodo.
Cached image data may be an older version. tryptag records the MD5 hash of the source zip files; if the data sources (Zenodo depositions) are updated, the MD5 hashes will change.
Cached data with inconsistent MD5 hashes can be checked and reported (but currently not corrected) using:
tryptag.check_data_cache_integrity()
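To illustrate what an MD5 comparison involves, this sketch computes the MD5 hash of a file with the standard library hashlib, reading in chunks so that multi-gigabyte zip files do not need to fit in memory. file_md5 is a hypothetical helper; the result could be compared, for example, with the checksum Zenodo displays for the corresponding file.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```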
tryptag gives quite verbose information about what it is currently doing to fetch data. To silence this output:
from tryptag import TrypTag
tryptag = TrypTag(verbose=False)
Internally, most tryptag data is held in a dict of dicts called gene_list, which you can explore and access directly for advanced usage.
This is populated with information about the number of fields of view, cell locations, etc. when a method like cell_list, open_cell or open_field requests microscopy data.
from tryptag import TrypTag, CellLine
tryptag = TrypTag()
life_stage, gene_id, terminus = "procyclic", "Tb927.9.8570", "n"
localisation = tryptag.gene_list[life_stage][gene_id][terminus]["loc"]
tryptag.cell_list(CellLine(life_stage=life_stage, gene_id=gene_id, terminus=terminus))
cell_information = tryptag.gene_list[life_stage][gene_id][terminus]["cells"]
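Because gene_list is indexed as gene_list[life_stage][gene_id][terminus], a generator can walk every entry. A sketch assuming every level is a plain dict, as the indexing above suggests; iter_entries is a hypothetical helper, and the miniature dict below is made up (the real "loc" contents are not described here, so None is used as a placeholder).

```python
def iter_entries(gene_list):
    """Yield (life_stage, gene_id, terminus, entry) for each entry of a nested
    gene_list[life_stage][gene_id][terminus] dict."""
    for life_stage, genes in gene_list.items():
        for gene_id, termini in genes.items():
            for terminus, entry in termini.items():
                yield life_stage, gene_id, terminus, entry


# Hypothetical miniature gene_list with the same nesting
demo_gene_list = {
    "procyclic": {
        "Tb927.9.8570": {"n": {"loc": None}, "c": {"loc": None}},
    },
}
keys = [(s, g, t) for s, g, t, _ in iter_entries(demo_gene_list)]
print(keys)
```

With real data, iter_entries(tryptag.gene_list) would visit every life stage, gene id and terminus that tryptag knows about.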
If you use the TrypTag data resource, please cite Billington et al. 2023 Nature Microbiology doi:10.1038/s41564-022-01295-6. We recommend including this citation in the results or methods if TrypTag was used as part of a discovery process. If directly using TrypTag images, please also indicate in the figure legend or similar which images are from TrypTag.
If you use the tryptag module to access or analyse TrypTag data, please also cite this GitHub repository and the master TrypTag Zenodo deposition doi:10.5281/zenodo.6862289.
You may also find the following papers of use:
- Dean et al. 2017 Trends Parasitol. doi:10.1016/j.pt.2016.10.009 Project announcement, with original aims and experimental strategy.
- Halliday et al. 2019 Mol. Biochem. Parasitol. doi:10.1016/j.molbiopara.2018.12.003 Describes the localisation ontology, with landmark protein examples.