The goal of cogeqc is to facilitate systematic quality checks on standard comparative genomics analyses to help researchers detect issues and select the most suitable parameters for each data set. Currently, cogeqc can be used to assess:
- Genome assembly and annotation quality, using two approaches:
    - Statistics in a context: users can extract summary assembly and annotation statistics for genomes on NCBI (via the NCBI Datasets API) and compare their observed values (e.g., genome size, number of genes, contiguity measures) with previously reported values for NCBI genomes.
    - Gene space completeness with BUSCOs: users can assess gene space completeness using Best Universal Single-Copy Orthologs (BUSCOs) through wrapper functions that run BUSCO from within an R session and create publication-ready plots with summary statistics.
- Orthogroup inference: orthogroups are assessed based on the percentage of shared protein domains in all orthogroups. The rationale for this approach is that genes in the same orthogroup evolved from a common ancestor, so the percentage of conserved protein domains in an orthogroup should be as high as possible.
- Synteny detection: synteny detection is assessed using network-based metrics, namely the clustering coefficient and node degree of a synteny network.
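The idea behind "statistics in a context" can be illustrated with a minimal base R sketch that ranks an observed value within a set of reference values. This is not cogeqc's implementation. The helper `percentile_in_reference()` and all numbers below are hypothetical and exist only for illustration. In cogeqc, the reference statistics come from NCBI.

```r
# Hypothetical helper (not part of cogeqc): percentage of reference
# values that are less than or equal to the observed value
percentile_in_reference <- function(observed, reference) {
  stopifnot(is.numeric(observed), length(observed) == 1,
            is.numeric(reference), length(reference) > 0)
  100 * mean(reference <= observed)
}

# Hypothetical gene counts reported for related genomes (illustrative only)
ref_gene_counts <- c(25000, 27000, 28500, 30000, 31000)

# Hypothetical gene count observed in a new annotation (illustrative only)
obs_gene_count <- 29000

percentile_in_reference(obs_gene_count, ref_gene_counts)
#> [1] 60
```

Under this sketch, a percentile close to 0 or 100 would mean that the observed value sits at the edge of what has been reported for related genomes. Such a value may deserve a closer look.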
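The rationale behind the orthogroup assessment can be sketched in base R. Each orthogroup is scored as the percentage of its domains that are present in every member gene. The scoring rule, the helper `shared_domain_pct()`, and the toy data below are hypothetical illustrations of the idea. They are not the score that cogeqc actually computes.

```r
# Hypothetical gene-to-domain annotation: one row per gene-domain pair
annotation <- data.frame(
  gene   = c("g1", "g1", "g2", "g2", "g3", "g4", "g5"),
  domain = c("domainA", "domainB", "domainA", "domainB", "domainA",
             "domainC", "domainC"),
  stringsAsFactors = FALSE
)

# Hypothetical orthogroup membership: one row per gene
orthogroups <- data.frame(
  orthogroup = c("OG1", "OG1", "OG1", "OG2", "OG2"),
  gene       = c("g1", "g2", "g3", "g4", "g5"),
  stringsAsFactors = FALSE
)

# Percentage of an orthogroup's domains that occur in all of its genes
shared_domain_pct <- function(genes, annotation) {
  # One set of domains per gene
  domain_sets <- lapply(genes, function(g) {
    unique(annotation$domain[annotation$gene == g])
  })
  all_domains <- unique(unlist(domain_sets))
  if (length(all_domains) == 0) return(NA_real_)  # no annotated domains
  shared <- Reduce(intersect, domain_sets)        # domains in every gene
  100 * length(shared) / length(all_domains)
}

# Score every orthogroup
scores <- vapply(
  split(orthogroups$gene, orthogroups$orthogroup),
  shared_domain_pct,
  numeric(1),
  annotation = annotation
)
scores
```

With these toy inputs, OG1 scores 50 because only domainA is found in all three of its genes, while OG2 scores 100. Under this rule, a low score would suggest that an orthogroup groups genes with different domain architectures.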
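The two network metrics mentioned for synteny detection can be computed in base R from an edge list of syntenic gene pairs. The sketch below rests on some assumptions. The toy network is hypothetical, and the code computes the global clustering coefficient (transitivity), which may differ from the exact variant that cogeqc reports. In practice, the igraph package provides `degree()` and `transitivity()` for these quantities.

```r
# Hypothetical synteny network: each row is a pair of genes
# detected as syntenic (e.g., anchor pairs from a synteny tool)
edges <- data.frame(
  from = c("A1", "A1", "A2", "A3", "B1"),
  to   = c("B1", "A2", "B1", "B2", "B2"),
  stringsAsFactors = FALSE
)

# Symmetric 0/1 adjacency matrix (undirected, no self-loops)
nodes <- sort(unique(c(edges$from, edges$to)))
adj <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))
adj[cbind(edges$from, edges$to)] <- 1
adj[cbind(edges$to, edges$from)] <- 1

# Degree: number of syntenic partners per gene
deg <- rowSums(adj)

# Global clustering coefficient: 3 * triangles / connected triples,
# computed as trace(A^3) / sum(k * (k - 1))
a3 <- adj %*% adj %*% adj
clustering <- sum(diag(a3)) / sum(deg * (deg - 1))

deg
clustering
```

In this network, B1 has degree 3. The network contains one triangle (A1, A2, B1) among six connected triples, which gives a clustering coefficient of 0.5.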
Get the latest stable R release from CRAN. Then install cogeqc from Bioconductor using the following code:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("cogeqc")
You can install the development version from GitHub with:
BiocManager::install("almeidasilvaf/cogeqc")
Below is the citation output from using citation('cogeqc') in R. Please run this yourself to check for any updates on how to cite cogeqc.
print(citation('cogeqc'), bibtex = TRUE)
#>
#> To cite package 'cogeqc' in publications use:
#>
#> Almeida-Silva F, Van de Peer Y (2022). _cogeqc: Systematic quality
#> checks on comparative genomics analyses_. R package version 1.3.1,
#> <https://github.com/almeidasilvaf/cogeqc>.
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Manual{,
#> title = {cogeqc: Systematic quality checks on comparative genomics analyses},
#> author = {Fabrício Almeida-Silva and Yves {Van de Peer}},
#> year = {2022},
#> note = {R package version 1.3.1},
#> url = {https://github.com/almeidasilvaf/cogeqc},
#> }
Please note that cogeqc was only made possible thanks to many other R and bioinformatics software authors, who are cited in the vignettes and/or in the paper(s) describing this package.
Please note that the cogeqc project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
- Continuous code testing is possible thanks to GitHub Actions through usethis, remotes, and rcmdcheck, customized to use Bioconductor’s Docker containers and BiocCheck.
- Code coverage assessment is possible thanks to codecov and covr.
- The documentation website is automatically updated thanks to pkgdown.
- The documentation is formatted thanks to devtools and roxygen2.
For more details, check the dev directory.
This package was developed using biocthis.