A Snakemake workflow for exploring the panregulome of grasses, using Brachypodium distachyon as the model species
Snakemake
assembly-stats
BUSCO
R
get_homologues
mafft
git clone https://github.com/ammarabdalrahem/panregulome-analysis.git
pip3 install snakemake
Enter your JGI username and password in SnakeFile so that the workflow can log in to JGI
snakemake --cores all
The main rule that generates all the output files.
Creates the necessary output directories.
Downloads genome sequence data using the provided script and file list.
Unzips and organizes the downloaded data.
Performs statistical analysis on the assembly files and creates an assessment table.
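As an illustration of the kind of metric that ends up in the assessment table, here is a minimal Python sketch of N50, one of the statistics that tools such as assembly-stats report. The function names `n50` and `assembly_summary` are hypothetical and are not taken from the workflow; the contig lengths are assumed to have been read from the assembly already.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def assembly_summary(lengths):
    """Collect a few basic assembly statistics into one row (hypothetical helper)."""
    return {
        "n_contigs": len(lengths),
        "total_length": sum(lengths),
        "longest": max(lengths),
        "n50": n50(lengths),
    }
```

One such row per assembly, stacked together, gives a table comparable in spirit to the assessment table described above.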
Runs BUSCO analysis on each assembly file and generates short summary files.
Extracts the BUSCO completeness scores from the summary files and creates a quality table.
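The extraction step can be sketched in Python as follows. This assumes the usual one-line score notation found in BUSCO short summary files (for example `C:98.6%[S:97.9%,D:0.7%],F:0.5%,M:0.9%,n:1614`); the function name `parse_busco_summary` is hypothetical, and the workflow itself may extract the scores differently.

```python
import re

# Pattern for the one-line BUSCO score notation (assumed format).
BUSCO_LINE = re.compile(
    r"C:(?P<complete>[\d.]+)%"
    r"\[S:(?P<single>[\d.]+)%,D:(?P<duplicated>[\d.]+)%\],"
    r"F:(?P<fragmented>[\d.]+)%,"
    r"M:(?P<missing>[\d.]+)%,"
    r"n:(?P<total>\d+)"
)


def parse_busco_summary(text):
    """Return the BUSCO percentages and the number of searched BUSCOs as a dict."""
    match = BUSCO_LINE.search(text)
    if match is None:
        raise ValueError("no BUSCO score line found")
    scores = {key: float(value) for key, value in match.groupdict().items()}
    scores["total"] = int(match.group("total"))  # n is a count, not a percentage
    return scores
```

Applying the parser to every summary file and keeping the `complete` value per assembly would yield the rows of a quality table.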
Generates a boxplot to visualize the quality of the assemblies.
Identifies and excludes poor-quality samples based on the boxplot results.
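One common way to turn a boxplot into an exclusion rule is the lower whisker fence, Q1 - 1.5 × IQR. The Python sketch below assumes that criterion; the workflow's actual cut-off is not stated here, and the function name `low_outliers`, the parameter `k`, and the sample names in the usage example are all hypothetical.

```python
import statistics


def low_outliers(scores, k=1.5):
    """Return the names of samples whose score falls below Q1 - k * IQR.

    `scores` maps a sample name to its completeness score.
    """
    values = sorted(scores.values())
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lower_fence = q1 - k * (q3 - q1)
    return sorted(name for name, score in scores.items() if score < lower_fence)


# Toy values for illustration only, not real results.
example = {"A": 98.0, "B": 97.5, "C": 98.5, "D": 97.0, "E": 99.0, "F": 80.0}
flagged = low_outliers(example)
```

Only the low side is checked, because an unusually high completeness score is not a reason to drop an assembly.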
Downloads and installs the necessary tools for repeat masking and annotation.
Computes the total length of repeats for each ecotype and merges it with the total genome length.
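A minimal Python sketch of this merge is shown below. It assumes that repeat annotations are available as 1-based, inclusive, non-overlapping `(start, end)` intervals per ecotype and that genome lengths are known per ecotype; the function name `repeat_table` and both input structures are hypothetical, since the workflow's file formats are not described here.

```python
def repeat_table(repeats, genome_lengths):
    """Merge total repeat length with total genome length, one row per ecotype.

    repeats: dict mapping ecotype -> list of (start, end) intervals
             (assumed 1-based, inclusive, non-overlapping).
    genome_lengths: dict mapping ecotype -> total genome length in bp.
    """
    rows = []
    for ecotype in sorted(repeats):
        repeat_bp = sum(end - start + 1 for start, end in repeats[ecotype])
        genome_bp = genome_lengths[ecotype]
        rows.append(
            {
                "ecotype": ecotype,
                "repeat_bp": repeat_bp,
                "genome_bp": genome_bp,
                "repeat_fraction": repeat_bp / genome_bp,
            }
        )
    return rows
```

If the intervals can overlap, they would need to be merged first to avoid counting the same bases twice.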
Generates a plot to visualize the distribution of repeat lengths.
Performs TE annotation analysis by extracting TE families and their lengths from the annotation files.
Generates a plot to visualize the distribution of TE orders.
Extracts proximal promoter sequences from the genome files.
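The idea behind promoter extraction can be sketched in Python as follows. The sketch assumes 1-based, inclusive gene coordinates (as in GFF files) and takes the region immediately upstream of the gene, reverse-complementing it for genes on the minus strand. The function names and the default window of 500 bp are hypothetical; the workflow's actual window size is not stated here.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_promoter(chrom_seq, gene_start, gene_end, strand, upstream=500):
    """Return up to `upstream` bases immediately upstream of a gene.

    Coordinates are 1-based and inclusive. The window is truncated
    at the chromosome ends rather than wrapping around.
    """
    if strand == "+":
        begin = max(0, gene_start - 1 - upstream)
        return chrom_seq[begin:gene_start - 1]
    if strand == "-":
        end = min(len(chrom_seq), gene_end + upstream)
        # Upstream of a minus-strand gene lies to the right on the plus strand.
        return reverse_complement(chrom_seq[gene_end:end])
    raise ValueError(f"unknown strand: {strand!r}")
```

Returning the sequence in the gene's own orientation keeps promoters from both strands directly comparable in the later clustering and alignment steps.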
Downloads and installs the get_homologues tool.
Performs gene clustering using the get_homologues tool.
Extracts the headers of the gene clusters.
Performs clustering of the promoter sequences.
Performs global sequence alignment on the gene clusters and promoter sequences.
Performs local sequence alignment on the gene clusters and promoter sequences.
Trims the alignment files using trimAl.
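To make the idea of alignment trimming concrete, here is a simplified Python sketch that removes columns in which too many sequences have a gap. It illustrates gap-based column filtering in general and is not trimAl's implementation; the function name `trim_gappy_columns` and the default threshold of 0.5 are hypothetical.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop alignment columns whose gap fraction exceeds `max_gap_fraction`.

    alignment: dict mapping sequence name -> aligned sequence,
               with '-' as the gap character and equal lengths throughout.
    """
    if not alignment:
        return {}
    sequences = list(alignment.values())
    if len({len(seq) for seq in sequences}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    n_seqs = len(sequences)
    keep = [
        col
        for col in range(len(sequences[0]))
        if sum(seq[col] == "-" for seq in sequences) / n_seqs <= max_gap_fraction
    ]
    return {
        name: "".join(seq[col] for col in keep)
        for name, seq in alignment.items()
    }
```

Columns dominated by gaps carry little comparable signal, so removing them leaves a more compact alignment for downstream analysis.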