This package produces AnVIL workspaces from R packages. Use this package
to create or update AnVIL workspaces from resources such as R /
Bioconductor packages. The metadata about the package (e.g., select
information from the package DESCRIPTION
file and from vignette YAML
headings) are used to populate the ‘DASHBOARD’ page on AnVIL. Vignettes
are translated to python notebooks ready for evaluation in AnVIL.
If necessary, install the AnVILPublish library
if (!"AnVILPublish" %in% rownames(installed.packages()))
BiocManager::install("AnVILPublish")
Note. The package currently works for Google Cloud Platform workspaces and does NOT support AnVIL workspaces that use the Azure platform.
There are only a small number of functions in the package; it is likely
best practice to invoke these using AnVILPublish::...()
rather than
attaching the package to the search path.
It is necessary to have the gcloud SDK available to copy notebook files to the workspace. Test availability with
AnVILGCP::gcloud_exists()
and verify that the account and project are appropriate (consistent with AnVIL credentials) for use with AnVIL
AnVILGCP::gcloud_account()
AnVILGCP::gcloud_project()
Note that these be used to set, as well as interrogate, the account and project.
Conversion of Rmarkdown (.Rmd
) or Quarto (.Qmd
) vignettes to Jupyter
(.ipynb
) notebooks uses Quarto software. It must
be available from within R, e.g.,
system2("quarto", "--version")
The user must determine if they want their vignettes converted or
rendered into Jupyter notebooks. The difference is that render
automatically executes R code blocks and embeds images, while
convert
will not.
Use of Python notedown for conversion is no longer supported.
CAUTION updating an existing workspace will replace existing content in a way that cannot be undone – you will lose content!
Workspace creation or update uses information from the DESCRIPTION file, CSV files in inst/tables, and from the YAML metadata at the top of vignettes. It is therefore worth-while to make sure this information is accurate.
In the DESCRIPTION file, the Title, Version, Date, Authors@R (preferred) or Author / Maintainer fields, Description, and License fields are used.
Tables in inst/tables must be CSV files. Individual entries in the CSV file may contain ‘whisker’ expressions for variable substitution, as follows:
{{ bucket }}
: the bucket location of the (possibly newly created) workspace, as returned byavbucket()
.
Tables are processed first with whisker.render()
for variable
substitution, and then readr::read_csv()
and avtable_import()
.
In vignettes, the title:
, author:
, and name:
fields are used. The
abstract is a good candidate for future inclusion.
The one-stop route is to create a workspace from the local package
source (e.g., GitHub checkout) directory using as_workspace()
.
AnVILPublish::as_workspace(
"path/to/package",
"bioconductor-rpci-anvil", # i.e., billing account
create = TRUE # use update = TRUE for an existing workspace
)
Use create = TRUE
to create a new workspace. Use update = TRUE
to
update (and potentially overwrite) an existing workspace. One of
create
and update
must be TRUE. The command illustrated above does
not specify the name =
argument, so creates or updates a workspace
"Bioconductor-Package-<pkgname>
, where <pkgname>
is the name of the
package read from the DESCRIPTION file; provide an explicit name to
create or update an arbitrary workspace. The option use_readme = TRUE
appends a README.md
file to the formatted content of the DESCRIPTION
file.
AnVILPublish::as_workspace()
invokes as_notebook()
so this step does
not need to be performed ‘by hand’.
See the command add_access()
, below, to make the workspace available
to a wider audience.
Some R resources, e.g., bookdown sites, are not in packages. These can be processed to workspaces with minor modifications.
-
Add a standard DESCRIPTION file (e.g.,
use_this::use_description()
) to the directory containing the.Rmd
files. -
Use the
Package:
field to provide a one-word identifier (e.g.,Package: Bioc2020CNV
) for your material. Add a key-value pairType: Workshop
or similar. ThePacakge:
andType:
fields will be used to create the workspace name as, in the example here,Bioconductor-Workshop-Bioc2020CNV
. -
Add a ‘yaml’ chunk to the top of each
.Rmd
file, if not already present, including the title and (optionally) name information, e.g.,--- title: "01. Introduction to the workshop" author: - name: Iman Author - name: Imanother Author ---
Publish the resources with
AnVILPublish::as_workspace(
"path/to/directory", # directory containing DESCRIPTION file
"bioconductor-rpci-anvil",
create = TRUE
)
These steps are performed automatically by as_workspace()
, but may be
useful when developing a new workspace or revising existing workspaces.
Transforming vignettes to notebooks may require several iterations, and
is available as a separate operation. Use update = FALSE
to create
local copies for preview.
AnVILPublish::as_notebook(
"paths/to/files.Rmd",
"bioconductor-rpci-anvil", # i.e., billing account
"Bioconductor-Package-Foo", # Workspace name
update = FALSE # make notebooks, but do not update workspace
)
The vignette transformation process has several limitations. Only .Rmd
vignettes are supported. Currently, the vignette is transformed first to
a markdown document using the rmarkdown
command
render(..., md_document())
. The markdown document is then translated
to Python notebook using quarto
.
It is likely that some of the limitations of vignette rendering can be reduced.
The "Bioconductor_User"
group can be added to the entities that can
see the workspace. AnVIL users wishing to view the workspace should be
added to the Bioconductor_User
group, rather than to the workspace
directly. To add the user group, use
AnVILPublish::add_access(
"bioconductor-rpci-anvil",
"Bioconductor-Package-Foo"
)
.Rmd
files need to be converted to jupyter notebooks. These ‘best
practices’ lead to results that are more likely to be satisfactory, as
outlined here.
-
For packages, make sure the DESCRIPTION file is complete. Use the
Authors@R
notation for fully specifying authors. Add aDate:
field indicating date of last modification. Follow other Bioconductor best practices, e.g., using and incrementing appropriate version numbers. -
For collections of vignettes not in a package (e.g., a bookdown folder), add a DESCRIPTION file at the top level. An example is
Package: BCC2020 Type: Workshop Title: R / Bioconductor in the AnVIL Cloud Version: 1.0.0 Authors@R: c(person( given = "Martin", family = "Morgan", role = c("aut", "cre"), email = "Martin.Morgan@RoswellPark.org", comment = c(ORCID = "0000-0002-5874-8148") ), person("Nitesh", "Turaga", role = "ctb"), person("Lori", "Shepherd", role = "ctb")) Description: This book contains material for a 2 1/2 hour course offered at the Bioinformatics Community Conference 2020. Bioconductor provides more than 1900 R packages for the analysis and comprehension of high-throughput genomic data. Most users install and run Bioconductor on a personal computer or perhaps use an academic cluster. Cloud-based solutions are increasing appealing, removing the headaches of local installation while providing access to (a) better, scalable computing resources; and (b) large-scale 'consortium' and other reference data sets. This session introduces the AnVIL cloud computing environment. We cover use of the cloud as a replacement to desktop-style computing; integrating workflows for 'upstream' processing of large data resources with interactive 'downstream' analysis and comprehension, using Human Cell Atlas single-cell datasets as an example; and querying cloud-based consortium data for integration with a users own data sets. License: CC-BY Date: 2020-07-17 Encoding: UTF-8 LazyData: true Roxygen: list(markdown = TRUE) RoxygenNote: 7.1.1
The
Type
andPackage
fields are used to construct the second and third elements of the workspace name (in this case,Bioconductor-Workshop-BCC2020
).Title
,Version
,Authors@R
,Description
,License
, andDate
fields are used to construct the DASHBOARD page. -
Start each vignette with ‘yaml’ containing essential metadata about the document – title and author(s). Include other information if desired, e.g., abstract, (static) date of last modification.
-
Use a file naming system AND a yaml
title
field that sorts files into the order in which the document content is to be presented, e.g., using file names01-Setup.Rmd
,02-...
and titles (in the yaml)title: "01 Setup"
, … Naming both files and titles in this way provides some chance that the Rmd files are presented, or can be made to be presented, sensibly across the Bioconductor package landing page and Workspace / NOTEBOOK interface. -
All code chunks, regardless of annotations such as
eval = FALSE
orecho = FALSE
are converted to visible, evaluated cells in jupyter notebooks. Replace code chunks that you do not wish the user to evaluate with HTML tags<pre></pre>
. -
Although both Rmarkdown and python notebooks support code chunks in multiple languages, there is no support for this in the conversion process – all cells are presented as R code.
Current best practice is to use quarto for conversion of .Rmd to ipynb. Quarto is available on the Bioconductor docker image, or easily installed on Linux, macOS, or Windows.
Support for conversion using the Python notedown module is no longer supported.
sessionInfo()
#> R version 4.4.1 Patched (2024-06-25 r86866)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_4.4.1 BiocManager_1.30.23 fastmap_1.2.0
#> [4] cli_3.6.3 tools_4.4.1 htmltools_0.5.8.1
#> [7] rstudioapi_0.16.0 yaml_2.3.9 codetools_0.2-20
#> [10] rmarkdown_2.27 knitr_1.48 xfun_0.45
#> [13] digest_0.6.36 rlang_1.1.4 evaluate_0.24.0