Bioconductor package containing annotations for SomaLogic's SomaScan assay

SomaScan.db

The SomaScan.db package is a platform-centric R package that provides extended biological annotations for analytes in the SomaScan assay menu, using resources provided by the Bioconductor project. The package exposes a single object, SomaScan.db, which is an SQLite database that can be queried to retrieve annotations for SomaScan analytes.

SomaScan.db is structured around a primary identifier, the SomaLogic sequence ID (SeqId), which is in the format 12345-67. In this package, the SeqId may also be referred to as the “PROBEID”. This identifier is the cornerstone of the SomaScan assay, and is used to uniquely identify SomaLogic analytes. For more information about SeqIds, please see the help page ?SomaDataIO::SeqId.
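As a rough illustration of the 12345-67 format, the sketch below checks whether strings are shaped like SeqIds. The helper name looks_like_seqid and the exact pattern (digits, a hyphen, digits) are assumptions made for this example and are not part of SomaScan.db; the SomaDataIO help page mentioned above remains the reference for SeqId handling.

```r
# Hypothetical helper: TRUE for strings shaped like "12345-67".
# The pattern is an assumption inferred from the example format above.
looks_like_seqid <- function(x) {
  grepl("^[0-9]+-[0-9]+$", x)
}

# A SeqId-shaped string, a gene symbol, and another SeqId-shaped string
looks_like_seqid(c("18342-2", "EGFR", "12345-67"))
# returns TRUE FALSE TRUE
```

A check like this only tests the shape of the string; it does not confirm that the identifier exists in the SomaScan menu.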

The SomaScan.db package enables mapping from SeqIds to other identifiers from popular public data repositories, many of which are gene-based, and vice versa. See below for installation instructions and usage examples.


Installation

The development version of the SomaScan.db package can be installed from GitHub:

remotes::install_github("SomaLogic/SomaScan.db")

The package can then be loaded using the usual syntax:

library(SomaScan.db)

Dependencies

The SomaScan.db package requires R >= 4.2.0, and depends on the following R packages:

  • methods
    • comes bundled with R installation
  • DBI
    • install from CRAN: install.packages("DBI")
  • AnnotationDbi (>= 1.56.2)
    • install from Bioconductor: BiocManager::install("AnnotationDbi")
  • org.Hs.eg.db (>= 3.14.0)
    • install from Bioconductor: BiocManager::install("org.Hs.eg.db")
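Before installing, it can be handy to see which of these dependencies are still missing or too old. The following is a minimal sketch using base R only; the helper name unmet_deps is hypothetical, and the minimum versions are copied from the list above.

```r
# Minimum versions are taken from the dependency list above;
# NA means no minimum version is stated.
deps <- c(methods = NA, DBI = NA,
          AnnotationDbi = "1.56.2", org.Hs.eg.db = "3.14.0")

# Hypothetical helper: returns the names of packages that are either
# not installed or older than the stated minimum version.
unmet_deps <- function(deps) {
  unmet <- vapply(names(deps), function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) return(TRUE)
    min_ver <- deps[[pkg]]
    !is.na(min_ver) && utils::packageVersion(pkg) < min_ver
  }, logical(1))
  names(deps)[unmet]
}

unmet_deps(deps)
```

Any package names returned can then be installed with the commands listed above. An empty result (character(0)) means all listed dependencies are already satisfied.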

You may also want to install another of SomaLogic’s R packages, SomaDataIO, which is designed for reading, writing, and manipulating ADATs. If you have not already used SomaDataIO to work with your SomaScan data, you will likely find it highly useful. SomaDataIO is available on CRAN.


Usage

The annotations in SomaScan.db can be queried using 5 methods that are common amongst Bioconductor annotation packages:

  1. keys returns all central identifiers in the package (the SomaScan analytes) for which annotations are available:
keys(SomaScan.db)
  2. keytypes lists the data types that can be used as keys to query the SQLite database:
keytypes(SomaScan.db)
  3. columns lists all available data types:
columns(SomaScan.db)
  4. mapIds retrieves annotation data from a single data type (column):
mapIds(SomaScan.db, keys = "18342-2", column = "SYMBOL", keytype = "PROBEID", multiVals = "first")
  5. select retrieves annotation data en masse (from multiple columns) using values from keys and columns:
select(SomaScan.db, keys = "18342-2", columns = c("ENTREZID", "SYMBOL", "UNIPROT"))

select can also be used to identify SeqIds associated with a gene or protein of interest (here, PROBEID refers to the SomaScan SeqIds):

select(SomaScan.db, keys = "EGFR", keytype = "SYMBOL", columns = "PROBEID")
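Because an identifier can map to more than one value, select may return several rows for the same key. One possible way to get a single row per SeqId is sketched below in base R. The helper name collapse_by_probe is hypothetical, and the toy data frame uses placeholder values rather than real annotations; it only mimics the shape of a select result that includes a PROBEID column.

```r
# Hypothetical helper: collapse a select()-style data frame so that each
# PROBEID appears once, with multiple distinct values joined by `sep`.
collapse_by_probe <- function(df, sep = "|") {
  value_cols <- setdiff(names(df), "PROBEID")
  aggregate(
    df[value_cols],
    by = list(PROBEID = df$PROBEID),
    FUN = function(v) {
      v <- unique(v[!is.na(v)])          # drop missing and repeated values
      if (length(v) == 0) NA_character_ else paste(v, collapse = sep)
    }
  )
}

# Toy input with placeholder values (not real annotations)
toy <- data.frame(
  PROBEID = c("12345-67", "12345-67", "89012-3"),
  SYMBOL  = c("GENE1", "GENE1", "GENE2"),
  UNIPROT = c("ID_A", "ID_B", "ID_C")
)

collapse_by_probe(toy)
```

With the package loaded, the same helper could be applied to the output of a select call such as the ones above, assuming the result contains a PROBEID column. Joining values into one string is convenient for display, but it loses the one-value-per-row structure that downstream joins usually expect.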

For more detailed usage examples, please see the SomaScan.db package vignettes, or the introductory vignette from Bioconductor’s AnnotationDbi:

vignette("IntroToAnnotationPackages", package = "AnnotationDbi")

SomaScan Menus

The SomaScan menu version will be referenced at various points throughout this package and its documentation. Please see the table below for information about each menu:

SomaScan version   Common name   Plex size [1]
v4.0               5k            5,284
v4.1               7k            7,596

MIT LICENSE

SomaScan.db is licensed under the MIT license and is intended for research use only (RUO). The code contained herein may not be used for diagnostic, clinical, therapeutic, or other commercial purposes.

Footnotes

  1. SomaScan.db contains annotations for human protein targets only. However, some targets in the SomaScan menu are associated with biological entities not represented in this package. As such, this value may differ from the exact number in your ADAT.