prophysaon_transcriptomics

Phylogenetics

Concatenate all CDS sequences into all.fa for building local blast database.
```
cat folder/*.fa > all.fa
```
NOTE: Replace folder with the name of the directory that contains your input files.
BLAST
```
sbatch bash_scripts/allbyall.sh
```
BLAST to MCL (format the BLAST output for running MCL)
```
sbatch bash_scripts/blast_to_mcl.sh
```
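For orientation, the conversion step can be sketched as below. This is a minimal sketch, not the contents of blast_to_mcl.sh. It assumes BLAST was run with tabular output (-outfmt 6, e-value in column 11) and that MCL will read the three-column "abc" format with -log10(e-value) as the edge weight. The function name, the weight cap of 200, and the example sequence names are hypothetical.

```python
import math

def blast_to_abc(blast_lines, max_weight=200.0):
    """Convert BLAST tabular lines (-outfmt 6) to MCL 'abc' edges.

    Assumption: column 1 = query, column 2 = subject, column 11 = e-value.
    The weight is -log10(e-value), clamped to the range [0, max_weight].
    """
    edges = []
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query == subject:
            continue  # skip self hits
        if evalue == 0:
            weight = max_weight  # log10(0) is undefined, so use the cap
        else:
            weight = min(max(-math.log10(evalue), 0.0), max_weight)
        edges.append(f"{query}\t{subject}\t{weight:.2f}")
    return edges

# Hypothetical example hits: one real hit and one self hit.
example = [
    "sp1@g1\tsp2@g7\t95.0\t300\t15\t0\t1\t300\t1\t300\t1e-50\t400",
    "sp1@g1\tsp1@g1\t100.0\t300\t0\t0\t1\t300\t1\t300\t0.0\t600",
]
for edge in blast_to_abc(example):
    print(edge)
```

Check the script itself for the columns and weighting actually used in this pipeline.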
MCL (run MCL)
```
sbatch bash_scripts/mcl.sh
```
MCL to fasta
```
sbatch bash_scripts/mcl_to_fasta.sh
```
Alignment with GUIDANCE2 in conjunction with MAFFT
```
sbatch bash_scripts/guidance_master.sh
```
NOTE: This script submits a large number of jobs. It requires some additional files (see the Slurm script for details).
Make sure all alignments finished
```
bash bash_scripts/check_alignments.sh
```
NOTE: If names are printed, then those alignments were not completed, and you'll need to rerun them. This shouldn't happen unless there is an issue with the HPC, but since the previous script submits so many jobs, it can be hard to tell what failed.
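The idea behind such a check can be sketched as follows. This is only an illustration and not what check_alignments.sh does: it assumes one input FASTA per cluster and one alignment file per cluster with the same base name. The directory names, the .fa and .aln suffixes, and the function name are all hypothetical.

```python
import tempfile
from pathlib import Path

def missing_alignments(cluster_dir, alignment_dir, in_suffix=".fa", out_suffix=".aln"):
    """Return names of clusters that have no non-empty alignment file."""
    clusters = {
        p.name[:-len(in_suffix)] for p in Path(cluster_dir).glob("*" + in_suffix)
    }
    finished = {
        p.name[:-len(out_suffix)]
        for p in Path(alignment_dir).glob("*" + out_suffix)
        if p.stat().st_size > 0  # an empty file counts as unfinished
    }
    return sorted(clusters - finished)

# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    cluster_dir = Path(tmp, "clusters")
    alignment_dir = Path(tmp, "alignments")
    cluster_dir.mkdir()
    alignment_dir.mkdir()
    for name in ("cluster1", "cluster2", "cluster3"):
        (cluster_dir / f"{name}.fa").write_text(">a@1\nACGT\n")
    (alignment_dir / "cluster1.aln").write_text(">a@1\nACGT\n")
    (alignment_dir / "cluster2.aln").write_text("")  # job died before writing
    unfinished = missing_alignments(cluster_dir, alignment_dir)

for name in unfinished:
    print(name)
```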
trimAl
```
sbatch bash_scripts/trimal.sh
```
Keep only alignments longer than 200 bp
```
bash keeplongalignments.sh
```
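A minimal sketch of the length filter, assuming the alignments are aligned FASTA files in which every record has the same number of columns. The function names are hypothetical and the shell script may measure length differently.

```python
def alignment_length(fasta_lines):
    """Return the number of columns, taken from the first record only."""
    length = 0
    seen_header = False
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                break  # second record reached, first one is complete
            seen_header = True
        elif seen_header:
            length += len(line)
    return length

def keep_alignment(fasta_lines, min_length=200):
    """Keep an alignment only if it is strictly longer than min_length."""
    return alignment_length(fasta_lines) > min_length

# Hypothetical alignments: 201 columns (wrapped over two lines) and 200 columns.
long_aln = [">a@1", "A" * 150, "C" * 51, ">b@1", "A" * 150, "-" * 51]
exact_aln = [">a@1", "A" * 200, ">b@1", "A" * 200]
print(keep_alignment(long_aln), keep_alignment(exact_aln))
```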
Filter alignments by removing individuals with more than 50% gaps and alignments with fewer than 4 individuals
```
python python_scripts/filteralignments.py
```
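The two filters can be sketched as below. This is a sketch under assumptions, not the code in filteralignments.py: it counts '-' characters as gaps, counts sequences rather than distinct taxa, and applies the size check after removing gappy sequences. The function names and example data are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records

def filter_alignment(records, max_gap_fraction=0.5, min_sequences=4):
    """Drop sequences with too many gaps; return None if too few remain."""
    kept = {
        name: seq
        for name, seq in records.items()
        if seq and seq.count("-") / len(seq) <= max_gap_fraction
    }
    return kept if len(kept) >= min_sequences else None

# Hypothetical alignment: d@1 is 75% gaps and is removed, four sequences remain.
example = (
    ">a@1\nACGTACGT\n>b@1\nACGTACGT\n>c@1\nAC--ACGT\n"
    ">d@1\n------GT\n>e@1\nACGTAC-T\n"
)
filtered = filter_alignment(parse_fasta(example))
print(sorted(filtered))
```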

Infer Gene Trees

```
sbatch ./bash_scripts/gene_trees.sh
cd gene_trees
mkdir trees
mv *.treefile trees
```

Rename gene trees and check for issues

NOTE: THIS WILL NOT WORK FOR GENERIC INPUT. It is fairly specific to my application, but you may need to do something similar for downstream inferences to work correctly. For downstream scripts to work, taxa must be named as follows: the first part of the name must be identical for all gene sequences sampled from that taxon, and it must be separated from the rest of the name by an '@'. This script will almost certainly not accomplish that for your data.
```
python ./python_scripts/check_trees.py
```
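To check your own labels against the naming convention described above, something like the following may help. It is a minimal sketch that is independent of check_trees.py. The function name and the example labels are hypothetical.

```python
from collections import Counter

def taxon_of(label, delimiter="@"):
    """Return the taxon part of a label of the form taxon@sequence_id."""
    taxon, sep, rest = label.partition(delimiter)
    if not sep or not taxon or not rest:
        raise ValueError(
            f"label does not follow taxon{delimiter}sequence: {label!r}"
        )
    return taxon

# Hypothetical leaf labels from a gene tree.
labels = ["taxonA@gene1", "taxonA@gene2", "taxonB@gene1"]
counts = Counter(taxon_of(label) for label in labels)
for taxon, n in sorted(counts.items()):
    print(taxon, n)
```

A label without an '@', or with nothing on either side of it, raises a ValueError, which makes naming problems visible before the downstream steps run.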

Prep ASTRAL-Pro input

```
mkdir astral_pro
python ./python_scripts/apro_mapping.py --input ./gene_trees/renamed_trees/ --output astral_pro
```
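As an illustration of what a gene-copy-to-species mapping looks like, the sketch below builds one line per leaf label. It assumes the taxon@sequence naming described above and assumes a two-column "gene_copy species" layout for the mapping file. Confirm the expected layout against the ASTRAL-Pro documentation and apro_mapping.py. The function name and labels are hypothetical.

```python
def apro_mapping_lines(leaf_labels, delimiter="@"):
    """Return 'leaf_label species' lines, one per unique leaf label."""
    lines = []
    for label in sorted(set(leaf_labels)):
        species = label.split(delimiter, 1)[0]  # text before the first '@'
        lines.append(f"{label} {species}")
    return lines

# Hypothetical leaf labels collected from all gene trees.
labels = ["taxonA@gene1", "taxonB@gene1", "taxonA@gene2"]
apro_lines = apro_mapping_lines(labels)
print("\n".join(apro_lines))
```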

Run ASTRAL Pro
```
sbatch ./bash_scripts/apro.sh
```

Prep ASTRAL input

```
mkdir -p astral/all
python ./python_scripts/astral_mapping.py --input ./gene_trees/renamed_trees/ --output astral/all
```
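For comparison, a species-to-individuals mapping can be sketched as below. It assumes the taxon@sequence naming described above and assumes the "species:label1,label2" layout for the mapping file. Verify both against the ASTRAL documentation and astral_mapping.py. The function name and labels are hypothetical.

```python
from collections import defaultdict

def astral_mapping_lines(leaf_labels, delimiter="@"):
    """Return 'species:label1,label2,...' lines, one per species."""
    by_species = defaultdict(set)
    for label in leaf_labels:
        species = label.split(delimiter, 1)[0]  # text before the first '@'
        by_species[species].add(label)
    return [
        f"{species}:{','.join(sorted(members))}"
        for species, members in sorted(by_species.items())
    ]

# Hypothetical leaf labels collected from all gene trees.
labels = ["taxonA@gene1", "taxonB@gene1", "taxonA@gene2"]
astral_lines = astral_mapping_lines(labels)
print("\n".join(astral_lines))
```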

Run ASTRAL
```
sbatch ./bash_scripts/astral.sh
```

ASTRAL-DISCO

```
mkdir disco
sbatch bash_scripts/disco.sh
sbatch bash_scripts/astral-disco.sh
```

CA-DISCO

```
mkdir cadisco
sbatch bash_scripts/cadisco.sh
python ./python_scripts/prep_cadisco.py --input ./gene_trees/renamed_trees/ --output ./cadisco/ --alignments ./alignments_g200bp_filtered/
python /N/u/mls16/Carbonate/Programs/DISCO/ca_disco.py -i ./cadisco/all_trees.tre -a ./cadisco/all_alignments.txt -t ./cadisco/all_taxa.txt -o ./cadisco/decomposed.fa -d @
```

IQTree on CA-DISCO
```
sbatch ./bash_scripts/ca_iqtree.sh
```
Create the Yang and Smith (YS) datasets
```
sbatch ./bash_scripts/ys.sh
```

Set up ASTRAL for the YS datasets

```
sbatch ./bash_scripts/astral_setup_ys.sh
```

Run ASTRAL on the YS datasets

```
sbatch ./bash_scripts/astral_sco.sh
sbatch ./bash_scripts/astral_mi.sh
sbatch ./bash_scripts/astral_mo.sh
```

Concatenate the YS datasets

```
sbatch ./bash_scripts/concatenate_sco.sh
sbatch ./bash_scripts/concatenate_mi.sh
sbatch ./bash_scripts/concatenate_mo.sh
```
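Concatenation in general can be sketched as below. This is a generic supermatrix sketch, not the logic of the concatenate_*.sh scripts: it assumes each taxon appears at most once per alignment, pads taxa missing from an alignment with gaps, and records 1-based partition boundaries. The function name, the gene1/gene2 partition names, and the example data are hypothetical.

```python
def concatenate(alignments):
    """Concatenate alignments (list of dicts: taxon -> aligned sequence).

    Returns (supermatrix, partitions), where partitions holds
    (name, start, end) tuples with 1-based inclusive coordinates.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for index, aln in enumerate(alignments, 1):
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"alignment {index} is empty or has unequal lengths")
        length = lengths.pop()
        for taxon in taxa:
            # Taxa absent from this alignment are filled with gaps.
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((f"gene{index}", start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions

# Hypothetical alignments with partly overlapping taxa.
gene1 = {"A": "ACGT", "B": "AC-T"}
gene2 = {"A": "GG", "C": "GA"}
matrix, parts = concatenate([gene1, gene2])
for taxon, seq in matrix.items():
    print(taxon, seq)
```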

Run IQTree on YS datasets

```
mkdir iqtree_ys
sbatch ./bash_scripts/sco_iqtree.sh
sbatch ./bash_scripts/mi_iqtree.sh
sbatch ./bash_scripts/mo_iqtree.sh
```

meganlsmith/prophysaon_transcriptomics
