cubar

Overview

cubar is an R package for codon usage bias analysis. Its main features are:

Codon-level analyses
- Calculate codon weights based on gene expression, tRNA availability, and mRNA stability;
- Calculate relative synonymous codon usage (RSCU);
- Infer optimal codons with machine learning;
- Visualize codon-anticodon pairing relationships;
Gene-level analyses
- Tabulate codon frequencies of each coding sequence;
- Measure codon usage similarity to highly expressed genes with the Codon Adaptation Index (CAI);
- Quantify the influence of codon usage on mRNA stability with Mean Codon Stabilization Coefficients (CSCg);
- Measure codon usage bias with the nonparametric Effective Number of Codons (ENC);
- Measure the fraction of predetermined optimal codons (Fop) in each sequence;
- Calculate overall GC content (GC), or GC content at synonymous third codon positions (GC3s) or 4-fold degenerate sites (GC4d);
- Quantify how well codon usage matches tRNA availability with the tRNA Adaptation Index (tAI);
- Measure the deviation from proportionality (Dp) between viral synonymous codon usage and host tRNA supply;
Utilities
- Sliding window analysis of codon usage within a coding sequence;
- Optimize codon usage based on optimal codons for heterologous expression;
- Test differential usage of codons between two sets of sequences;
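The RSCU and CAI entries above rest on two classic definitions: RSCU divides a codon's count by the average count of its synonymous family, and CAI is the geometric mean of each codon's relative adaptiveness w (its RSCU divided by the largest RSCU in its family, estimated from a reference set of highly expressed genes). The base-R sketch below illustrates only those two definitions on a hypothetical three-amino-acid codon table and toy sequences. It does not call cubar, and helper names such as `count_codons_toy` and `cai_weights` are made up for illustration; see the package documentation for cubar's actual functions and arguments.

```r
# Minimal base-R sketch of RSCU and CAI; not cubar's implementation.
# Hypothetical toy codon table covering only Lys (K), Phe (F) and Gly (G).
codon_aa <- c(AAA = "K", AAG = "K",
              TTT = "F", TTC = "F",
              GGT = "G", GGC = "G", GGA = "G", GGG = "G")

# Split a coding sequence into non-overlapping codons in frame 1.
to_codons <- function(cds) {
  starts <- seq(1, nchar(cds) - 2, by = 3)
  substring(cds, starts, starts + 2)
}

# Count codons listed in the toy table; other codons are ignored.
count_codons_toy <- function(cds) {
  codons <- to_codons(cds)
  tab <- table(factor(codons[codons %in% names(codon_aa)],
                      levels = names(codon_aa)))
  setNames(as.numeric(tab), names(tab))
}

# RSCU: observed count divided by the mean count of its synonymous family.
# Families with a zero total count would yield NaN.
rscu <- function(counts) {
  counts / ave(counts, codon_aa[names(counts)], FUN = mean)
}

# Relative adaptiveness w: RSCU divided by the largest RSCU in the family.
cai_weights <- function(ref_counts) {
  r <- rscu(ref_counts)
  r / ave(r, codon_aa[names(r)], FUN = max)
}

# CAI: geometric mean of w over the gene's codons that have a weight.
# A single codon with w = 0 drives CAI to 0.
cai <- function(cds, w) {
  codons <- to_codons(cds)
  codons <- codons[codons %in% names(w)]
  exp(mean(log(w[codons])))
}

# Toy "highly expressed" reference set, concatenated into one sequence.
ref <- paste0(c("AAA", "AAA", "AAG", "TTC", "TTC", "TTT",
                "GGT", "GGT", "GGT", "GGC"), collapse = "")
w <- cai_weights(count_codons_toy(ref))
round(w, 3)
cai("AAATTCGGT", w)  # 1: only the most preferred codon of each family
cai("AAGTTTGGC", w)  # about 0.437, i.e. (0.5 * 0.5 * 1/3)^(1/3)
```

Real analyses use the full codon table of the relevant genetic code, usually exclude single-codon families such as Met and Trp, and often give unobserved reference codons a small pseudocount so that CAI does not collapse to zero.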

The main advantages of cubar are:

Process large datasets (>100,000 sequences) efficiently using the Biostrings and data.table backends;
Support genetic codes cataloged by NCBI as well as custom ones;
Integrate with other data analysis and bioinformatics packages in the R ecosystem;
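NCBI publishes each genetic code as a 64-letter amino-acid string with codons listed in TCAG order. As a hedged illustration of why support for several codes matters, the base-R sketch below turns such strings into codon-to-amino-acid maps, compares the standard code with the vertebrate mitochondrial code, and builds a custom variant by editing a copy. It is not cubar's own codon-table code, and `make_genetic_code` is a hypothetical helper.

```r
# Build a codon -> amino acid map from an NCBI-style 64-letter string.
# NCBI lists codons in TCAG order: the 1st base varies slowest, the 3rd fastest.
make_genetic_code <- function(aa_string) {
  bases <- c("T", "C", "A", "G")
  # expand.grid varies its first column fastest, so the 3rd base goes first.
  grid <- expand.grid(b3 = bases, b2 = bases, b1 = bases,
                      stringsAsFactors = FALSE)
  setNames(strsplit(aa_string, "")[[1]], paste0(grid$b1, grid$b2, grid$b3))
}

# NCBI translation table 1 (standard) and table 2 (vertebrate mitochondrial).
standard <- make_genetic_code(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
vert_mito <- make_genetic_code(
  "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG")

# Codons whose meaning differs between the two codes.
names(standard)[standard != vert_mito]  # "TGA" "ATA" "AGA" "AGG"

# Synonymous family sizes change with the code, which in turn changes
# per-family quantities such as RSCU.
sum(standard == "R")   # 6 arginine codons
sum(vert_mito == "R")  # 4 arginine codons

# A custom code is an edited copy, e.g. TAA/TAG read as Gln,
# as in NCBI table 6 (ciliate nuclear code).
custom <- standard
custom[c("TAA", "TAG")] <- "Q"
sum(custom == "*")  # 1 remaining stop codon (TGA)
```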

Dependencies

Depends

R (>= 4.1.0)

Imports

Biostrings (>= 2.60.0),
IRanges (>= 2.34.0),
data.table (>= 1.14.0),
ggplot2 (>= 3.3.5),
rlang (>= 0.4.11)

Installation

The latest release of cubar can be installed with:

install.packages("cubar")

The latest development version of cubar can be installed from GitHub with:

devtools::install_github("mt1022/cubar", dependencies = TRUE)

Usage

Documentation for each function is available within R (type ?function_name). The following tutorials are available on our website:

Get Started: A brief introduction demonstrating the basic usage of cubar;
Non-standard Genetic Code: How to use cubar with non-standard genetic codes;
Theories behind cubar: The mathematical details behind the core functions in cubar;

Getting help

Please use GitHub issues for bug reports, questions, and feature requests.

Suggests

Biostrings for sequence input/output and manipulation;
Peptides for peptide- or protein-related indices;

Acknowledgements

GitHub Copilot was used to suggest code snippets during the development of this package. Thanks to the GitHub Education teacher program for providing free access to GitHub Copilot.