Replaying-the-evolutionary-tape-to-investigate-subgenome-dominance: A Python repository from niederhuth

Title	Authors	Raw data
Replaying the evolutionary tape to investigate subgenome dominance in allopolyploid Brassica napus	Kevin A. Bird, Chad E. Niederhuth, Shujun Ou, Malia Gehan, J. Chris Pires, Zhiyong Xiong, Robert VanBuren, Patrick P. Edger	SRA Accession: PRJNA577908

This repository is for DNA methylation scripts and data for the paper: https://nph.onlinelibrary.wiley.com/doi/full/10.1111/nph.17137

Please cite this paper if you use any of the resources here.

All analyses were performed on the Michigan State University High Performance Computing Cluster (HPCC).

To reproduce the analysis, follow these steps:

NOTE #1: This analysis assumes you will be using Anaconda and I have provided a yml file to easily create an environment for repeating analyses.

1) Clone this git repository

git clone https://github.com/niederhuth/Replaying-the-evolutionary-tape-to-investigate-subgenome-dominance  
cd Replaying-the-evolutionary-tape-to-investigate-subgenome-dominance

2) Create the conda environment

conda env create -f scripts/Bnapus-polyploidy.yml

3) You will now need to create a symbolic link within this environment for methylpy to work. This will require you to cd into the environment located in your anaconda (or miniconda) directory.

cd miniconda3/envs/Bnapus-polyploidy/lib  
ln -s libgsl.so.23.0.0 libgsl.so.0
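
The same link can be created from Python, which is convenient if you want to script the environment setup. This is only a sketch of the step above: the helper name `link_gsl` is hypothetical, and the default file names are simply the ones used in the `ln -s` command. Check which libgsl version your environment actually contains before relying on them.

```python
import os
from pathlib import Path


def link_gsl(lib_dir, target="libgsl.so.23.0.0", link_name="libgsl.so.0"):
    """Create lib_dir/link_name as a relative symlink to target.

    Mirrors `ln -s libgsl.so.23.0.0 libgsl.so.0` run inside lib_dir.
    The file names are assumptions taken from the command above.
    """
    lib_dir = Path(lib_dir)
    link = lib_dir / link_name
    if not (lib_dir / target).exists():
        raise FileNotFoundError(f"{target} not found in {lib_dir}")
    if link.is_symlink() or link.exists():
        # Leave an existing file or link untouched.
        return link
    # A relative target keeps the link valid if the environment is moved.
    os.symlink(target, link)
    return link
```

You would call it with the `lib` directory of your own environment, for example `link_gsl(Path.home() / "miniconda3/envs/Bnapus-polyploidy/lib")`.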

4) Return to the cloned git repository

cd ~/Replaying-the-evolutionary-tape-to-investigate-subgenome-dominance

5) Create data and sample folders

mkdir data
cd data
for i in $(sed '1d' ../misc/samples.csv| cut -d ',' -f1)
do
	mkdir $i $i/job_reports
done
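
For reference, the loop above can be sketched in Python as follows. It assumes, as the `sed`/`cut` command does, that `samples.csv` has one header line and that the sample name is the first comma-separated column. The function name `make_sample_dirs` is hypothetical and is not part of this repository.

```python
import csv
from pathlib import Path


def make_sample_dirs(samples_csv, data_dir):
    """Create <data_dir>/<sample>/job_reports for every sample in the CSV."""
    created = []
    with open(samples_csv, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header line, like sed '1d'
        for row in reader:
            if not row or not row[0].strip():
                continue  # ignore blank lines
            sample_dir = Path(data_dir) / row[0].strip()
            # parents=True also creates the sample directory itself
            (sample_dir / "job_reports").mkdir(parents=True, exist_ok=True)
            created.append(sample_dir)
    return created
```

From the repository root this would be called as `make_sample_dirs("misc/samples.csv", "data")`. Unlike plain `mkdir`, it does not fail if a directory already exists.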

6) Set up the reference genome for mapping

mv ../ref ./
cd ref
mkdir combined
zcat R500/*.fa.gz TO1000/*.fa.gz 37_Mitochondria.fa.gz 37_Plastid.fa.gz ../../misc/ChrL.fa.gz > tmp
python ../../scripts/py/fix_fasta.py -i tmp -o combined/combined.fa
rm tmp
cd combined
bash ../../../scripts/sh/index.sh
cd ../annotations
for i in *
do
	gunzip $i
done
cd bias
for i in *
do
	gunzip $i
done
cd ../../../
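
If `zcat` is not available on your system, the concatenation step above can be approximated in Python. This is a minimal sketch of the `zcat ... > tmp` command only. It does not reproduce what `fix_fasta.py` or `index.sh` do, and the function name `concat_gzipped` is hypothetical.

```python
import gzip
import shutil


def concat_gzipped(inputs, output):
    """Decompress each gzipped file in order and append it to one plain file."""
    with open(output, "wb") as out:
        for path in inputs:
            with gzip.open(path, "rb") as src:
                # Stream in chunks so large genomes are not held in memory.
                shutil.copyfileobj(src, out)
```

The shell expands `R500/*.fa.gz` in sorted order, so to get the same sequence order you would pass something like `sorted(glob.glob("R500/*.fa.gz"))` followed by the TO1000, mitochondrial, plastid, and ChrL files.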

7) Run the setup.sh script (see note #2)

cd <sample_directory>
bash ../../scripts/sh/setup.sh

For the 'mock' sample, you will need to combine the sequencing data from the TO1000 and IMB218 samples and add these to the fastq directory. The intention here is to imitate what would happen if you simply "combined" the two methylomes of these species. Since TO1000 had a lot more reads than IMB218, I recommend downsampling to an equivalent number of reads, although the overall impact on the metaplots is pretty minimal. To do this:

cd mock
bash ../../scripts/sh/subsample.sh
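
To illustrate what downsampling to an equivalent number of reads means, here is a sketch using reservoir sampling. It is not necessarily how `subsample.sh` works. It assumes uncompressed, single-end FASTQ with exactly four lines per record, and the names `read_fastq` and `reservoir_sample` are hypothetical.

```python
import random


def read_fastq(handle):
    """Yield FASTQ records as lists of four lines (assumes 4-line records)."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            return  # end of file
        yield record


def reservoir_sample(records, k, seed=0):
    """Return k records chosen uniformly at random in a single pass.

    If fewer than k records are available, all of them are returned.
    """
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    sample = []
    for n, record in enumerate(records):
        if n < k:
            sample.append(record)
        else:
            # Replace a kept record with probability k / (n + 1).
            j = rng.randint(0, n)
            if j < k:
                sample[j] = record
    return sample
```

Here `k` would be the read count of the smaller library (IMB218), applied to the TO1000 reads before the two sets are combined. The selected records come back in reservoir order rather than file order, which should not matter for mapping.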

8) For each sample, run methylpy

cd <sample_directory>  
bash ../../scripts/sh/run_methylpy.sh

or submit as a job

9) When methylpy is finished, for each sample, you can run the various scripts in the "sh" directory (see note #2)

cd <sample_directory>  
bash ../../scripts/sh/<script_name>

or submit as a job

These will create a series of outputs for each analysis in <sample_name>/combined/results

These should match data found in the figures_tables directory

10) You can run the R scripts from within the figures_tables directory to output plots and other analyzed results

cd figures_tables  
Rscript ../scripts/R/old_metaplots.R
Rscript ../scripts/R/new_metaplots.R 
Rscript ../scripts/R/multiplot_figures.R

NOTE #2: These scripts were written for use on the MSU HPCC. To run them on your own computer or in a different environment, you will need to change the header of each shell script to something appropriate for your system (the python and R scripts do not require changes). In each shell script, you will also need to either delete these lines or modify them to point to a path appropriate for your system:

export PATH="$HOME/miniconda3/envs/Bnapus-polyploidy/bin:$PATH" 
export LD_LIBRARY_PATH="$HOME/miniconda3/envs/Bnapus-polyploidy/lib:$LD_LIBRARY_PATH"

Specifically modify $HOME/miniconda3 to correspond to where your conda installation is.

You may also need to delete this line, which is meant for submitted jobs on the cluster.

cd $PBS_O_WORKDIR
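
If you have many shell scripts to adapt, the path and `cd` edits described in this note could be applied with a small helper like the one below. This is a sketch under the assumption that the lines appear exactly as written above. It does not touch the scheduler header, which still has to be edited by hand, and `adapt_script` is a hypothetical name.

```python
def adapt_script(text, conda_root, drop_pbs_cd=True):
    """Point a script at a different conda install and drop the PBS cd line.

    text       : contents of one shell script
    conda_root : replacement for "$HOME/miniconda3", e.g. "/opt/conda"
    """
    out = []
    for line in text.splitlines():
        if drop_pbs_cd and line.strip() == "cd $PBS_O_WORKDIR":
            continue  # only meaningful for jobs submitted on the cluster
        out.append(line.replace("$HOME/miniconda3", conda_root))
    return "\n".join(out) + "\n"
```

Read each script, pass its contents through `adapt_script`, and write the result to a copy so the originals stay unchanged.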

NOTE #3: The file "scripts/py/functions.py" contains a number of python functions that I have written for analyzing data from methylpy.