/saiz-et-al_2020

Data and code for the article by Saiz et al. deposited in bioRxiv

Primary language: C++. License: Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0).

Saiz et al. (2020)

This repository contains the data and code associated with the article by Saiz et al., Growth factor-mediated coupling between lineage size and cell fate choice underlies robustness of mammalian development, eLife 2020;9:e56079, DOI:10.7554/eLife.56079.

The repository tries to follow a standard structure, mostly based on the Cookiecutter Data Science project and other best-practice recommendations. This repository was assembled post hoc, in 2019, from rawer code written between 2017 and 2019 (sorry - trying to be better).

Repository structure:

  • README.md - you're reading it
  • .Rprofile - contains global settings and options for RStudio.
  • regdev_project.Rproj - RStudio project. If opened with RStudio from the main directory, where all files are, it should allow you to reproducibly run all scripts and analyses and create graphs. If you clone the repository, it should work out of the box.
  • /packrat - contains logs and package library for this project, as created with the packrat R package.
  • /data
    • /corfiles - manually curated files, corrected for over- and undersegmentation, as described in Morgani et al. (2018) Dev Bio.
    • /interim - intermediate tables to be read in by other scripts. Files are generated by transformation scripts in /src.
    • /mined-data - data extracted from other studies:
      • Fgf4 allelic series from Kang et al (2013) Development, as well as data on Sox17 and Pdgfra allelic series from Artus et al (2011) and (2010).
      • Gata6 allelic series from Schrode et al (2014) Dev Cell.
      • Fgfr1; Fgfr2 allelic series from Kang et al (2017) Dev Cell.
      • Gata4 allelic series generated by myself (unpublished).
    • /moviefiles - tracking data for time lapse movies generated in this study. Each folder (or groups of folders, see naming) contains data files for one movie.
    • /processed - processed files where data has been transformed and cells classified, typically output of *_tx.R or *_classify.R scripts. Currently contains files generated by my testing of the code. You may empty the folder and generate them all from scratch.
    • /raw - unprocessed files resulting from running *_read.R scripts on corresponding /corfiles (combine corfiles and count cell numbers).
    • /uncorfiles - original data files generated by MINS segmentation. Currently empty. Will be updated in time.
  • /figures - empty container to store plots generated during analysis.
  • /results - contains tables and results generated in the course of the analysis.
  • /notebooks - contains interactive Jupyter notebooks detailing the main data transformations
    • GFP-classification.ipynb demonstrates the automatic classification of GFP+ and GFP- populations in embryo aggregation chimeras (shown in Figure 1 and Supplementary Figure 2)
    • H-clustering.ipynb demonstrates the steps taken to classify ICM cells using Hierarchical Clustering
    • Z-correction.ipynb compares raw with transformed data after correcting for fluorescence decay along the Z-axis
    • nanogata-tx.ipynb describes the conversion of NANOG and GATA6 intensity levels obtained with different antibodies to a common scale
  • /references - contains the experimental reference files (metadata), generated manually as logs over the course of experimentation, as well as other reference files generated automatically during the analysis:
    • *_exp_ref.csv are files with experimental data for each experiment, such as experimental date, experimental group, etc.
    • *_if.csv are files containing details about the antibodies used to stain the corresponding embryos in each experiment
  • /src - contains source code to use in the project
    • *_runall.R files run all steps in the pipeline for the corresponding dataset (read, transform, classify and count cells).
    • plotting-aes.R sets up a few objects that are used in plots throughout the project.
    • setup.R loads the basic packages used in all scripts, runs plotting-aes.R and sets a seed for reproducibility.
    • /data - scripts that read in and transform data tables
      • *_read.R read in files in ./data/corfiles, combine them, clean them and count cells. Generate *-raw.csv files stored in ./data/raw.
      • *_tx.R apply transformations to data generated by *_read.R. Generate *-tx.csv files stored in ./data/interim.
      • *_classify.R classify cells in *-tx.csv data tables into populations as necessary. Generate *-processed.csv files stored in ./data/processed.
      • *_counter.R count the number of cells in each lineage in *-processed.csv data tables and write the counts to ./data/processed.
    • /functions - scripts defining functions used throughout analysis to perform data transformations, classification, etc.
    • /models - scripts that fit models to transform some of the data.
    • /visualization - scripts to generate the plots in the figures of the paper. Output files are stored in ./figures.
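
The naming conventions for the scripts in /src/data can be summarised in a short R sketch. The helper pipeline_paths() and the dataset prefix "mydata" are hypothetical and not part of the repository; the sketch only restates the script suffixes and output folders listed above.

```r
# Hypothetical helper (not part of the repository): for a given dataset
# prefix, list each pipeline step with its script and output folder,
# following the naming conventions described in this README.
pipeline_paths <- function(prefix) {
  steps <- c("read", "tx", "classify", "counter")
  data.frame(
    step = steps,
    script = file.path("src", "data", paste0(prefix, "_", steps, ".R")),
    output_dir = file.path("data", c("raw", "interim", "processed", "processed")),
    # The README names the output files of the first three steps only;
    # the counter output file name is left as NA rather than guessed.
    output_file = c(paste0(prefix, c("-raw.csv", "-tx.csv", "-processed.csv")), NA),
    stringsAsFactors = FALSE
  )
}

# "mydata" is a placeholder prefix, not a dataset in the repository
print(pipeline_paths("mydata"))
```

Each row corresponds to one step, in the order in which the steps are described above (read, transform, classify, count).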

Usage

Clone the repository using git clone as usual. The file regdev_project.Rproj should allow you to open and run all scripts and functions and reproducibly generate data tables and figures. Packages are handled by packrat and the working directory by the .Rproj file. If you encounter any bugs or issues, please reach out.
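
As a rough sketch of this workflow, the *_runall.R scripts in /src could be located and run from an R session started in the project root (for instance, after opening regdev_project.Rproj). The function name run_all_pipelines() is an assumption made for illustration; only the *_runall.R naming pattern and the /src location come from this README.

```r
# Hypothetical helper (not part of the repository): find every
# *_runall.R script in a source directory and source each one in turn.
run_all_pipelines <- function(src_dir = "src") {
  scripts <- list.files(src_dir, pattern = "_runall\\.R$", full.names = TRUE)
  for (script in scripts) {
    message("Running ", script)
    source(script)
  }
  # Return the paths of the scripts that were run, invisibly
  invisible(scripts)
}

# Assumes the working directory is the repository root:
# run_all_pipelines()
```

Running a single dataset would then amount to calling source() on the corresponding *_runall.R file directly.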