FABBIT is a powerful bioinformatics tool designed for fast core genome alignment based on bidirectional best hit analysis. It's particularly useful for comparative genomics studies, allowing researchers to identify and analyze core genes across multiple genomes efficiently.
- 🧬 Predicts Open Reading Frames (ORFs) using Pyrodigal
- 💎 Performs Bidirectional Best Hit (BBH) analysis using DIAMOND
- 🔍 Identifies core genes across multiple genomes
- 🧩 Aligns core genes using MAFFT
- 🧮 Calculates Average Amino Acid Identity (AAI)
- 🔗 Generates concatenated core genome alignments
- 🎭 Filters genes based on Shannon entropy
- ⚡ Provides parallel processing capabilities for improved performance
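Bidirectional best hit analysis pairs a gene in genome A with a gene in genome B only when each is the other's top-scoring hit. The sketch below illustrates that idea on hits already parsed into `(query, subject, bitscore)` tuples, e.g. from DIAMOND tabular output. It is a hypothetical illustration of the general technique, not FABBIT's internal code; the function names and tuple layout are assumptions.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit record: (query_id, subject_id, bitscore), e.g. parsed
# from a DIAMOND tabular (outfmt 6) file. The column choice is an assumption.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return the highest-bitscore subject for every query (first one wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(
    a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]
) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    a_vs_b = [("a1", "b1", 250.0), ("a1", "b2", 120.0), ("a2", "b2", 90.0)]
    b_vs_a = [("b1", "a1", 248.0), ("b2", "a1", 130.0)]
    # a2 -> b2 is not reciprocal, because b2's best hit is a1.
    print(bidirectional_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```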
You can install FABBIT directly from GitHub using pip:

```bash
pip install git+https://github.com/EnzoAndree/FABBIT.git
```
After installation, you can use FABBIT from the command line:

```bash
fabbit -i input1.fasta input2.fasta -o output_directory -t 4
```
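When there are many genomes, the command above can also be assembled from Python. This is a minimal sketch that uses only the documented `-i`, `-o` and `-t` options; the helper names and the set of FASTA extensions searched are hypothetical choices, not part of FABBIT.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Iterable, List, Sequence

# Assumed FASTA extensions; adjust to match your files.
FASTA_PATTERNS = ("*.fasta", "*.fa", "*.fna")


def find_fasta_files(genome_dir: str, patterns: Iterable[str] = FASTA_PATTERNS) -> List[str]:
    """Collect FASTA files in genome_dir (non-recursive), sorted for reproducibility."""
    directory = Path(genome_dir)
    return sorted({str(p) for pattern in patterns for p in directory.glob(pattern)})


def build_fabbit_command(fasta_files: Sequence[str], output_dir: str, threads: int = 1) -> List[str]:
    """Build the argument list; -i accepts several files in one invocation."""
    if not fasta_files:
        raise ValueError("at least one input FASTA file is required")
    return ["fabbit", "-i", *fasta_files, "-o", output_dir, "-t", str(threads)]


def run_fabbit(genome_dir: str, output_dir: str, threads: int = 1) -> None:
    """Run fabbit on every FASTA file in genome_dir; raises if the run fails."""
    if shutil.which("fabbit") is None:
        raise RuntimeError("fabbit was not found on PATH; install it with pip first")
    command = build_fabbit_command(find_fasta_files(genome_dir), output_dir, threads)
    subprocess.run(command, check=True)
```

For example, `run_fabbit("genomes", "fabbit_out", threads=4)` would run FABBIT on every matching file in a hypothetical `genomes/` directory. Passing an argument list rather than a shell string avoids quoting problems with unusual file names.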
Available options:

- `-i`, `--input`: Path to the input FASTA file(s) (required; multiple files allowed)
- `-o`, `--output`: Path to the output directory (required)
- `-t`, `--threads`: Number of threads to use (default: 1)
- `-r`, `--reference`: Path to a reference genome file (optional)
- `--sensitivity`: DIAMOND sensitivity mode (default: `fast`)
- `--evalue`: Maximum e-value for reported alignments (default: 1e-6)
- `--query-cover`: Minimum query coverage, in percent (default: 95)
- `--max-target-seqs`: Maximum number of target sequences reported per query (default: 25)
- `--id`: Minimum sequence identity, in percent (default: 30)
- `--core-threshold`: Threshold for defining core genes (default: 95)
- `-v`, `--verbose`: Verbosity level: 1 = ERROR, 2 = WARNING, 3 = INFO, 4 = DEBUG (default: 2)
- `-V`, `--version`: Show the program's version number and exit
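The exact rule behind `--core-threshold` is not spelled out here. A common convention, assumed in this sketch, is that a gene counts as core when it is found in at least that percentage of the input genomes. The function name and the input layout (gene cluster mapped to the set of genomes containing it) are hypothetical.

```python
import math
from typing import Dict, List, Set


def core_genes(presence: Dict[str, Set[str]], n_genomes: int, threshold: float = 95.0) -> List[str]:
    """Return gene clusters present in at least `threshold` percent of genomes.

    Assumption: the threshold is a percentage of input genomes and the
    comparison is inclusive (>=). FABBIT's actual rule may differ.
    """
    if not 0 < threshold <= 100:
        raise ValueError("threshold must be in (0, 100]")
    # Smallest whole number of genomes that satisfies the percentage.
    min_genomes = math.ceil(n_genomes * threshold / 100)
    return sorted(gene for gene, genomes in presence.items() if len(genomes) >= min_genomes)


if __name__ == "__main__":
    presence = {
        "geneA": {"g1", "g2", "g3", "g4"},
        "geneB": {"g1", "g2", "g3"},
        "geneC": {"g1"},
    }
    print(core_genes(presence, n_genomes=4, threshold=75))  # ['geneA', 'geneB']
    print(core_genes(presence, n_genomes=4))                # ['geneA']
```

With four genomes, 95% rounds up to all four, so only `geneA` qualifies. Lowering the threshold admits genes missing from a few genomes.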
FABBIT generates the following main output files:

- `core_genome_all.aln`: Concatenated alignment of all core genes
- `core_genome_filtered.aln`: Concatenated alignment of core genes after entropy filtering
- `AAI_table.csv`: Summary of Average Amino Acid Identity (AAI) results
- `core_genes_matrix.csv`: Matrix of core genes across genomes
- `entropy_distribution.png`: Plot of the gene entropy distribution
- `gene_entropies.csv`: CSV file containing the entropy value for each gene
- `partition_all.txt`: Partition file for all core genes
- `partition_filtered.txt`: Partition file for the filtered core genes
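Per-gene entropy values like those in `gene_entropies.csv` can be understood as a measure of how variable an alignment is. This sketch computes Shannon entropy per alignment column and averages it over the gene. The log base (bits), ignoring gap characters, and averaging across columns are assumptions; FABBIT's exact definition and filtering cutoff may differ.

```python
import math
from collections import Counter
from typing import List

GAP_CHARS = "-."  # assumed gap symbols, excluded from the counts


def column_entropy(column: str) -> float:
    """Shannon entropy, in bits, of one alignment column (gaps ignored)."""
    residues = [c for c in column if c not in GAP_CHARS]
    if not residues:
        return 0.0
    total = len(residues)
    # H = sum over residues of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in Counter(residues).values())


def mean_alignment_entropy(sequences: List[str]) -> float:
    """Average per-column entropy of equal-length aligned sequences."""
    if not sequences or not sequences[0]:
        raise ValueError("alignment must contain at least one non-empty sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all aligned sequences must have the same length")
    columns = ("".join(s[i] for s in sequences) for i in range(length))
    return sum(column_entropy(col) for col in columns) / length


if __name__ == "__main__":
    # Columns: "MMMM" -> 0 bits, "KKRR" -> 1 bit, "VLLV" -> 1 bit.
    print(mean_alignment_entropy(["MKV", "MKL", "MRL", "MRV"]))  # ~0.667
```

A fully conserved gene scores 0, and more variable genes score higher, which is why entropy can serve as a criterion for filtering genes out of the concatenated alignment.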
The script will create the following subdirectories in the specified output directory:

- `pyrodigal_orfs`: Contains predicted ORFs
- `core_genome_genes`: Contains individual core gene alignments
- `logs`: Contains log files
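Individual gene alignments are typically joined into a single supermatrix, with a partition file recording where each gene starts and ends. This is a hypothetical sketch of that general step. The RAxML-style partition line (`MODEL, gene = start-end`) and the `WAG` model name are assumptions, not a description of FABBIT's exact `partition_*.txt` format.

```python
from typing import Dict, List, Tuple

# gene name -> {genome id -> aligned protein sequence}
GeneAlignments = Dict[str, Dict[str, str]]


def concatenate(alignments: GeneAlignments, model: str = "WAG") -> Tuple[Dict[str, str], List[str]]:
    """Concatenate per-gene alignments and build 1-based partition lines."""
    genomes = sorted({genome for aln in alignments.values() for genome in aln})
    pieces: Dict[str, List[str]] = {genome: [] for genome in genomes}
    partitions: List[str] = []
    start = 1
    for gene in sorted(alignments):
        aln = alignments[gene]
        length = len(next(iter(aln.values())))
        if any(len(seq) != length for seq in aln.values()):
            raise ValueError(f"sequences in {gene} have different aligned lengths")
        for genome in genomes:
            # A genome lacking this gene (possible when the core threshold is
            # below 100%) is padded with gaps so all rows stay the same length.
            pieces[genome].append(aln.get(genome, "-" * length))
        end = start + length - 1
        partitions.append(f"{model}, {gene} = {start}-{end}")
        start = end + 1
    return {genome: "".join(parts) for genome, parts in pieces.items()}, partitions


if __name__ == "__main__":
    supermatrix, parts = concatenate({
        "geneB": {"g1": "MK-", "g2": "MKL"},
        "geneA": {"g1": "AC", "g2": "A-"},
    })
    print(supermatrix)  # {'g1': 'ACMK-', 'g2': 'A-MKL'}
    print(parts)        # ['WAG, geneA = 1-2', 'WAG, geneB = 3-5']
```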
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
If you encounter any issues, please check the log files in the `logs` directory for detailed error messages. For further assistance, please open an issue on the GitHub repository.
FABBIT makes use of several open-source tools and libraries. We thank the developers of DIAMOND, MAFFT, Pyrodigal, and other dependencies for their valuable contributions to the scientific community.