/ribiosBioMart

Functions to access local installations of the BioMart database

Primary LanguageR

About

ribiosBioMart is the 'sans Ensambl' middle ware to access the Ensembl BioMart data inside an plain MySQL/MariaDB database.

It aims to provide the same functions as the ws-based biomaRt R package

Installation

You need to install Bioconductor and the following R packages from the GitHub repository:

library(devtools)
install_github("Accio/ribios/ribiosBase")
install_github("Accio/ribios/ribiosBioMart")

Additionally, you need an operational MariaDB server instance populated with a copy of the latest Ensembl BioMart data.

Examples

Connect to database server & list available datasets

library(ribiosBioMart)
library(RMySQL)

# Your acces data to your database server
# listLocalDatasets, useLocalMart, and getLocalBM will use this to create a 
# single-use connection for each function call
conn <- EnsemblDBCredentials(host = "TODO",
                             port = 12345,
                             user = "TODO",
                             passwd = "TODO",
                             ensembl_version = 93)

# Alternatively: Use a direct connection for the entire session.
# You will need to manage / close it yourself
#conn <- dbConnect (MySQL (), 
#                   user="TODO", 
#                   password="TODO",
#                   dbname="ensembl_mart_93", 
#                   host="TODO", 
#                   port=12345)
ds <-listLocalDatasets(conn)
ds[1:5,]

Initialize mart object and retrieve available attributes & filters

localMart = useLocalMart(conn, dataset="hsapiens_gene_ensembl")

attrs = listLocalAttributes(localMart)
attrs[1:5,]

filters = listLocalFilters(localMart)
filters[1:5,]

Perform queries

Here: Annotate a set of Affymetrix identifiers with HUGO symbol and chromosomal locations of corresponding genes:

affyids=c("202763_at","209310_s_at","207500_at")
getLocalBM(attributes = c('affy_hg_u133_plus_2', 'hgnc_symbol', 'chromosome_name',
                         'start_position', 'end_position', 'band'),
           filters = 'affy_hg_u133_plus_2', 
           values = affyids, 
           mart = localMart)

The functionality of getLocalBM should be more or less the same as those of getBM in the biomaRt package

Limitations

The getLocalBM does not support filters / attributes that require access to a database schema other than ensembl_mart_92. This package will therefore abort its execution with an error if the user tries to utilize one of the following attributes / filters:

Filters:

  • "name_2" ("qtl_feature.name_2")
  • "qtl_region" ("qtl_feature.qtl_region")
  • "go_parent_term" ("go_clos")
  • "go_parent_name" ("go_name")
  • "so_consequence_name" ("so_mini_parent_name")

Attributes:

  • structure_gene_stable_id
  • structure_transcript_stable_id
  • structure_translation_stable_id
  • structure_canonical_transcript_id
  • structure_chrom_name
  • structure_gene_chrom_start
  • structure_gene_chrom_end
  • structure_transcript_chrom_start
  • structure_transcript_chrom_end
  • structure_transcription_start_site
  • structure_transcript_length
  • structure_transcript_chrom_strand
  • structure_external_gene_name
  • structure_external_source_name
  • structure_5_utr_start
  • structure_5_utr_end
  • structure_3_utr_start
  • structure_3_utr_end
  • structure_cds_length
  • structure_cdna_length
  • structure_peptide_length
  • struct_transcript_count
  • structure_translation_count
  • structure_description
  • structure_biotype
  • structure_ensembl_exon_id
  • structure_cds_start
  • structure_cds_end
  • homologs_ensembl_gene_id
  • homologs_ensembl_transcript_id
  • homologs_ensembl_peptide_id
  • homologs_canonical_transcript_id
  • homologs_chromosome_name
  • homologs_start_position
  • homologs_end_position
  • homologs_strand
  • homologs_band
  • homologs_external_gene_name
  • homologs_external_gene_source
  • homologs_ensembl_CDS_length
  • homologs_ensembl_cDNA_length
  • homologs_ensembl_peptide_length
  • homologs_transcript_count
  • homologs_percentage_gc_content
  • homologs_description
  • snp_ensembl_gene_id
  • snp_ensembl_transcript_id
  • snp_ensembl_peptide_id
  • sequence_canonical_transcript_id
  • snp_chromosome_name
  • snp_start_position
  • snp_end_position
  • snp_strand
  • snp_band
  • snp_external_gene_name
  • snp_external_gene_source
  • snp_ensembl_CDS_length
  • snp_ensembl_cDNA_length
  • snp_ensembl_peptide_length
  • snp_transcript_count
  • snp_percentage_gc_content
  • snp_description
  • transcript_exon_intron
  • gene_exon_intron
  • transcript_flank
  • gene_flank
  • coding_transcript_flank
  • coding_gene_flank
  • 5utr
  • 3utr
  • gene_exon
  • cdna
  • coding
  • peptide
  • upstream_flank
  • downstream_flank
  • sequence_gene_stable_id
  • sequence_description
  • sequence_external_gene_name
  • sequence_external_source_name
  • sequence_str_chrom_name
  • sequence_gene_chrom_start
  • sequence_gene_chrom_end
  • sequence_gene_biotype
  • sequence_family
  • sequence_upi
  • sequence_uniprot_swissprot_accession
  • sequence_uniprot_sptrembl
  • sequence_uniprot_sptrembl_predicted
  • sequence_transcript_stable_id
  • sequence_translation_stable_id
  • sequence_canonical_transcript_id
  • sequence_biotype
  • sequence_transcript_biotype
  • sequence_transcript_chrom_strand
  • sequence_transcript_chrom_start
  • sequence_transcript_chrom_end
  • sequence_transcription_start_site
  • sequence_peptide_length
  • sequence_transcript_length
  • sequence_cdna_length
  • sequence_exon_stable_id
  • sequence_type
  • sequence_exon_chrom_start
  • sequence_exon_chrom_end
  • sequence_exon_chrom_strand
  • sequence_rank
  • sequence_phase
  • sequence_end_phase
  • sequence_cdna_coding_start
  • sequence_cdna_coding_end
  • sequence_genomic_coding_start
  • sequence_genomic_coding_end
  • sequence_constitutive

Final Note

If you used a direct DBIConnection for the database access, don't forget to close your database connections

# dbDisconnect (conn)