
Snakemake project to predict orthogroups and find patterns of positive selection with OrthoFinder and FastCodeML


smsk_selection: A Snakemake pipeline to find orthologs and marks of positive selection

1. Description

In brief, this pipeline can:

  1. Predict proteins from transcriptomes with TransDecoder.
  2. Find orthogroups with OrthoFinder and the methods from Yang et al.
  3. Find patterns of positive selection with FastCodeML.
  4. Annotate transcripts with TransDecoder and Trinotate.
  5. Assess transcriptome completeness with BUSCO.

smsk_selection pipeline

2. First steps

  1. Install conda.

  2. Install snakemake:

     conda install --yes snakemake

  3. Clone this repo. If git reports an error with SSL certificates, add -c http.sslVerify=false to the command:

     git clone --recursive https://github.com/jlanga/smsk_orthofinder.git

  4. Compile the necessary dependencies (phyx, guidance and fastcodeml):

     bash src/compile_deps.sh

  5. Enter the paths to your samples in samples.tsv.

  6. Run the pipeline as is:

     snakemake --use-conda --jobs

     or run it inside a Docker container:

     bash src/docker_run.sh -j 4

3. File organization

The folder hierarchy follows the one described in A Quick Guide to Organizing Computational Biology Projects:

smsk_selection
├── data: raw data, downloaded fastas, databases, etc.
├── README.md
├── Snakefile: pipeline runner
├── results: processed data
│   ├── busco: SCOs identified
│   ├── cdhit: clustered transcriptome
│   ├── homologs: clustered orthogroups as in Yang et al.
│   ├── orthofinder: clustered orthogroups by orthofinder
│   ├── selection: alignments and positive selection results
│   ├── transcriptome: links to input transcriptomes
│   ├── transdecoder: predicted CDS
│   ├── tree: ML and Bayesian species tree from 4-fold degenerate sites
│   └── trinotate: transcriptome annotation
└── src: additional source code, tarballs, snakefiles, etc.
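
After a run, it can help to see which of the result folders listed above have been created so far. The following is a minimal sketch, not part of the repository: the helper name `missing_results` is hypothetical, and the folder names are simply copied from the tree above.

```shell
#!/usr/bin/env bash
# Sketch: print the results/ subfolders from the tree above that do not exist yet.
# `missing_results` is a hypothetical helper, not something shipped with the pipeline.
missing_results() {
    local root="${1:-results}"
    local dir
    for dir in busco cdhit homologs orthofinder selection \
               transcriptome transdecoder tree trinotate; do
        # Print the folder name only when the directory is absent.
        [ -d "$root/$dir" ] || printf '%s\n' "$dir"
    done
}

# Run from the top of the repository; no output means every folder is present.
missing_results results
```

A missing folder does not necessarily mean an error: it may only mean that the corresponding step has not run yet.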

4. Requirements

Only snakemake and conda (or mamba) should be needed to run this pipeline; together they download the packages required for each step.

If in doubt, the Dockerfile lists the packages that need to be installed.
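
A quick way to confirm these requirements before launching anything is to check that the tools are on the PATH. This is only a sketch under that assumption: `first_available` is a hypothetical helper name, and the check says nothing about installed versions.

```shell
#!/usr/bin/env bash
# Sketch: verify that snakemake and a conda-compatible package manager are on PATH.
# `first_available` is a hypothetical helper, not part of this repository.
first_available() {
    local tool
    for tool in "$@"; do
        # `command -v` succeeds only when the shell can resolve the command.
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

if ! command -v snakemake >/dev/null 2>&1; then
    echo "snakemake not found on PATH" >&2
fi

# Either mamba or conda is enough; report whichever is found first.
if pkg_manager=$(first_available mamba conda); then
    echo "package manager: $pkg_manager"
else
    echo "neither mamba nor conda found on PATH" >&2
fi
```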

Bibliography