Fast algorithms for fitting topic models and non-negative matrix factorizations to count data.

fastTopics

fastTopics is an R package implementing fast, scalable optimization algorithms for fitting topic models and non-negative matrix factorizations to count data. The methods exploit the close relationship between the topic model and Poisson non-negative matrix factorization. The package also provides tools to compare, annotate and visualize model fits, including functions to create "structure plots" and functions to identify distinctive features of topics. The fastTopics package is a successor to the CountClust package.
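The relationship between the two models can be illustrated with a few lines of base R. This is only a sketch under stated assumptions, not the package's implementation: the matrices `L_pois` and `F_pois` are made-up toy values, and all variable names are hypothetical. It shows that rescaling the Poisson NMF loadings and factors gives topic proportions and word (or gene) probabilities whose product equals the row-normalized Poisson rates.

```r
# Toy Poisson NMF parameters (hypothetical values, for illustration only).
set.seed(1)
n <- 4; m <- 6; k <- 2
L_pois <- matrix(runif(n * k), n, k)   # n x k loadings
F_pois <- matrix(runif(m * k), m, k)   # m x k factors

# Poisson NMF models counts as x_ij ~ Poisson(lambda_ij), with Lambda = L F'.
Lambda <- tcrossprod(L_pois, F_pois)

# Rescale so that each column of F_star and each row of L_star sums to 1.
u      <- colSums(F_pois)
F_star <- sweep(F_pois, 2, u, "/")     # per-topic probabilities over columns of X
L_star <- sweep(L_pois, 2, u, "*")     # absorb the column scales into the loadings
L_star <- L_star / rowSums(L_star)     # per-row topic proportions

# The multinomial probabilities match the row-normalized Poisson rates.
P <- tcrossprod(L_star, F_star)
all.equal(P, Lambda / rowSums(Lambda))
```

Because the rescaling only moves the column scales of the factors into the loadings, a Poisson NMF fit can be turned into a topic model fit without refitting.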

If you find a bug, or you have a question or feedback on this software, please post an issue.

Citing this work

If you find the fastTopics package or any of the source code in this repository useful for your work, please cite:

K. K. Dey, C. J. Hsiao and M. Stephens (2017). Visualizing the structure of RNA-seq expression data using grade of membership models. PLoS Genetics 13, e1006599.

P. Carbonetto, A. Sarkar, Z. Wang and M. Stephens (2021). Non-negative matrix factorization algorithms greatly improve topic model fits. arXiv 2105.13440.

If you use the de_analysis function in fastTopics, please also cite:

P. Carbonetto, K. Luo, A. Sarkar, A. Hung, K. Tayeb, S. Pott and M. Stephens (2023). GoM DE: interpreting structure in sequence count data with differential expression analysis allowing for grades of membership. Genome Biology 24, 236.

License

Copyright (c) 2019-2023, Peter Carbonetto and Matthew Stephens.

All source code and software in this repository are made available under the terms of the MIT license.

Quick Start

Install and load the package from CRAN:

install.packages("fastTopics")
library(fastTopics)

Alternatively, install the latest version from GitHub:

remotes::install_github("stephenslab/fastTopics")
library(fastTopics)

Note that installing the package will require a C++ compiler setup that is appropriate for the version of R installed on your computer. For details, refer to the documentation on the CRAN website.

For guidance on using fastTopics to analyze gene expression data, see the single-cell RNA-seq vignette, part 1 and part 2.

Also, try running the small example that illustrates the fast model fitting algorithms:

example("fit_poisson_nmf")
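For orientation, a basic fitting workflow might look roughly like the sketch below. It assumes the package is installed and that `simulate_count_data`, `fit_poisson_nmf` and `poisson2multinom` are available with the arguments shown; argument names and defaults may differ between package versions, so treat this as an outline and check the help pages. The data dimensions, `k = 3` and `numiter = 50` are arbitrary choices for illustration.

```r
library(fastTopics)
set.seed(1)

# Simulate a small 80 x 100 count matrix from a Poisson NMF model with 3 factors.
dat <- simulate_count_data(80, 100, k = 3)

# Fit a rank-3 Poisson NMF, then convert it to the equivalent multinomial
# topic model.
fit_pois  <- fit_poisson_nmf(dat$X, k = 3, numiter = 50)
fit_topic <- poisson2multinom(fit_pois)

# In the topic model, each row of L holds the topic proportions for one sample,
# so the row sums should all be close to 1.
range(rowSums(fit_topic$L))
```

A real analysis would typically run many more iterations and inspect the fit before interpreting the topics.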

See the package documentation for more information.

Developer notes

To prepare the package for CRAN, remove both single-cell vignettes, then run R CMD build fastTopics to build the source package.

This is the command used to check the package before submitting to CRAN:

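library(rhub)
# Run CRAN-style checks on the package in the current directory.
# _R_CHECK_FORCE_SUGGESTS_ = "false": do not fail if suggested packages are missing.
# _R_CHECK_CRAN_INCOMING_USE_ASPELL_ = "true": enable spell checking in the incoming checks.
check_for_cran(".",show_status = TRUE,
  env_vars = c(`_R_CHECK_FORCE_SUGGESTS_` = "false",
               `_R_CHECK_CRAN_INCOMING_USE_ASPELL_` = "true"))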

Credits

The fastTopics R package was developed by Peter Carbonetto, Matthew Stephens and others.