🧬 FABBIT: FAst coregenome alignment Based on Bidirectional best hIT 🚀

🌟 Overview

FABBIT is a bioinformatics tool for fast core genome alignment based on bidirectional best hit (BBH) analysis. It is aimed at comparative genomics studies, where it identifies the core genes shared across multiple genomes and aligns them.

🎉 Features

🧬 Predicts Open Reading Frames (ORFs) using Pyrodigal
💎 Performs Bidirectional Best Hit (BBH) analysis using DIAMOND
🔍 Identifies core genes across multiple genomes
🧩 Aligns core genes using MAFFT
🧮 Calculates Average Amino Acid Identity (AAI)
🔗 Generates concatenated core genome alignments
🎭 Filters genes based on Shannon entropy
⚡ Provides parallel processing capabilities for improved performance
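
To illustrate the BBH and AAI steps listed above, here is a minimal sketch, not FABBIT's actual implementation. It assumes the hits come in DIAMOND's default 12-column tabular format (qseqid, sseqid, pident, ..., bitscore), picks the best hit per query by bitscore, and takes AAI as the mean identity of the forward hits over all BBH pairs. The function names and the toy data are hypothetical.

```python
import csv
import io


def best_hits(tsv_text):
    """Map each query to (subject, pident, bitscore) of its highest-bitscore hit."""
    best = {}
    for fields in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not fields:
            continue
        query, subject = fields[0], fields[1]
        pident, bitscore = float(fields[2]), float(fields[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, pident, bitscore)
    return best


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Keep pairs where A's best hit in B points back to the same gene in A."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    pairs = []
    for gene_a, (gene_b, pident, _) in forward.items():
        if gene_b in reverse and reverse[gene_b][0] == gene_a:
            pairs.append((gene_a, gene_b, pident))
    return pairs


def average_identity(pairs):
    """Mean percent identity over BBH pairs (assumed AAI definition)."""
    return sum(p for _, _, p in pairs) / len(pairs) if pairs else 0.0


def row(query, subject, pident, bitscore):
    """Build one toy 12-column hit line; unused columns hold placeholder values."""
    return "\t".join([query, subject, str(pident), "100", "0", "0",
                      "1", "100", "1", "100", "1e-50", str(bitscore)])


# Toy data: a1 <-> b1 is reciprocal; a2 -> b2 is not, because b2's best hit is a1.
a_vs_b = "\n".join([row("a1", "b1", 90.0, 200),
                    row("a1", "b2", 50.0, 80),
                    row("a2", "b2", 80.0, 150)])
b_vs_a = "\n".join([row("b1", "a1", 90.0, 200),
                    row("b2", "a1", 50.0, 80)])

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
print(average_identity(pairs))
```

Requiring the hit to be reciprocal is what makes BBH a conservative ortholog proxy: a gene whose best match prefers a different partner is dropped rather than paired.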

🛠️ Installation

You can install FABBIT directly from GitHub using pip:

pip install git+https://github.com/EnzoAndree/FABBIT.git

🚀 Usage

After installation, you can use FABBIT from the command line:

fabbit -i input1.fasta input2.fasta -o output_directory -t 4
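
If you prefer to launch the command above from a Python script, for example inside a larger pipeline, the following sketch builds the same argument list. It uses only the -i, -o, and -t flags shown above; the helper name build_fabbit_command is hypothetical, and the call that actually runs FABBIT is left commented out.

```python
import shlex
import subprocess


def build_fabbit_command(inputs, output_dir, threads=1):
    """Return the argument list for a FABBIT run (hypothetical helper)."""
    if not inputs:
        raise ValueError("at least one input FASTA file is required")
    return ["fabbit", "-i", *map(str, inputs),
            "-o", str(output_dir), "-t", str(threads)]


cmd = build_fabbit_command(["input1.fasta", "input2.fasta"],
                           "output_directory", threads=4)
print(shlex.join(cmd))
# prints: fabbit -i input1.fasta input2.fasta -o output_directory -t 4

# Uncomment to run FABBIT; check=True raises if it exits with an error.
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file names contain spaces.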

🎛️ Arguments

-i, --input: Path to the input FASTA file(s) (required, multiple files allowed)
-o, --output: Path to the output directory (required)
-t, --threads: Number of threads to use (default: 1)
-r, --reference: Path to reference genome file (optional)
--sensitivity: DIAMOND sensitivity mode (default: "fast")
--evalue: Maximum e-value to report alignments (default: 1e-6)
--query-cover: Minimum query cover percentage (default: 95)
--max-target-seqs: Maximum number of target sequences per query (default: 25)
--id: Minimum identity percentage (default: 30)
--core-threshold: Threshold for defining core genes (default: 95)
-v, --verbose: Verbose level: 1=ERROR, 2=WARNING, 3=INFO, 4=DEBUG (default: 2)
-V, --version: Show program's version number and exit
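
The effect of --core-threshold can be pictured with the sketch below. It assumes that the threshold is the minimum percentage of genomes in which a gene must be present to count as core; the list above does not spell this out, so treat that reading, the function name, and the toy data as assumptions.

```python
def select_core_genes(presence, n_genomes, threshold=95.0):
    """Return genes present in at least `threshold` percent of the genomes.

    presence maps a gene name to the set of genomes that contain it.
    """
    if n_genomes <= 0:
        return []
    return sorted(
        gene for gene, genomes in presence.items()
        if 100.0 * len(genomes) / n_genomes >= threshold
    )


# Toy data: 20 genomes named g0..g19.
all_genomes = [f"g{i}" for i in range(20)]
presence = {
    "geneA": set(all_genomes),       # 20/20 = 100%
    "geneB": set(all_genomes[:19]),  # 19/20 = 95%
    "geneC": set(all_genomes[:18]),  # 18/20 = 90%
}

print(select_core_genes(presence, n_genomes=20))
print(select_core_genes(presence, n_genomes=20, threshold=100.0))
```

Under this reading, a threshold below 100 tolerates a gene being missed in a few genomes, which can happen with draft assemblies.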

📊 Output

FABBIT generates the following main output files:

core_genome_all.aln: Concatenated alignment of all core genes
core_genome_filtered.aln: Concatenated alignment of core genes after entropy filtering
AAI_table.csv: Summary of Average Amino Acid Identity (AAI) results
core_genes_matrix.csv: Matrix of core genes across genomes
entropy_distribution.png: Plot of gene entropy distribution
gene_entropies.csv: CSV file containing entropy values for each gene
partition_all.txt: Partition file for all core genes
partition_filtered.txt: Partition file for filtered core genes
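
To give an idea of what a per-gene entropy value such as those in gene_entropies.csv can represent, here is a minimal sketch of Shannon entropy on an alignment. It assumes entropy is computed per column in bits, ignoring gaps, and then averaged over the columns of a gene; FABBIT's exact formula and filtering cutoff are not described here, so this is an illustration only.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    # H = sum over residues of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_entropy(sequences):
    """Mean column entropy of equal-length aligned sequences."""
    values = [column_entropy(col) for col in zip(*sequences)]
    return sum(values) / len(values) if values else 0.0


# Toy alignment: column 1 is invariant, column 2 is split evenly,
# column 3 has a single variant residue.
aligned = ["MKV", "MKV", "MRV", "MRA"]
print([round(column_entropy(col), 3) for col in zip(*aligned)])
print(round(alignment_entropy(aligned), 3))
```

A fully conserved column scores 0, and the score grows as residues become more evenly mixed, so the mean gives a rough measure of how variable a gene is across genomes.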

📁 Directory Structure

FABBIT creates the following subdirectories inside the specified output directory:

pyrodigal_orfs: Contains predicted ORFs
core_genome_genes: Contains individual core gene alignments
logs: Contains log files

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🐛 Troubleshooting

If you encounter any issues, please check the log file in the logs directory for detailed error messages. For further assistance, please open an issue on the GitHub repository.

🙏 Acknowledgements

FABBIT makes use of several open-source tools and libraries. We thank the developers of DIAMOND, MAFFT, Pyrodigal, and other dependencies for their valuable contributions to the scientific community.