Simple functions for retrieving large sets of summaries and sequences from the NCBI nucleotide database using the rentrez package. You will first need to register an account with NCBI and request an API key to allow rapid downloads; without a key, queries are sent faster than NCBI permits for unregistered users, which is a frequent cause of errors when fetching large numbers of records. This code depends on tidyverse, rentrez, and Biostrings.
Assuming the script is saved in your current working directory
source('./ez_rentrez.R')
You'll need to put your own key here
apikey <- 'never1gonna2give3you4up'
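If you would rather not hardcode the key in a script, one option is to read it from an environment variable. The following is a minimal sketch and not part of ez_rentrez.R: the helper name read_api_key is hypothetical, and the variable name ENTREZ_KEY is an assumption chosen to match the variable that rentrez itself uses.

```r
# Hypothetical helper (not part of ez_rentrez.R): read an API key from an
# environment variable instead of writing it into the script.
# The default variable name "ENTREZ_KEY" is an assumption.
read_api_key <- function(var = "ENTREZ_KEY") {
  key <- Sys.getenv(var, unset = "")
  # Return NA when the variable is missing or empty, so the caller can decide
  # how to handle it.
  if (!nzchar(key)) NA_character_ else key
}

apikey <- read_api_key()
```

If the variable is not set, apikey will be NA and you will need to assign the key manually as shown above.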
Use the NCBI website to refine your search; the results page has a window showing the search string, which you can paste in as searchexp below.
Do not include any of the following terms in your search expression: retstart, retmax, retmode
searchexp <- '18S AND apicomplexa[ORGN] AND 0:10000[SLEN] NOT (genome[TITL])'
apicomplexa.search <- get_ncbi_ids(searchexp)
apicomplexa.summary.df <- get_ESummary_df(searchexp, apikey)
apicomplexa.fasta.df <- get_Efasta(searchexp, apikey)
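The reason retstart and retmax must stay out of the search expression is presumably that the functions set them internally to page through large result sets. The internals of ez_rentrez.R are not shown here, so the following is only a sketch of that general paging technique under that assumption. The names batch_starts and fetch_in_batches and the batch size of 500 are hypothetical.

```r
# Hypothetical sketch of paging through a large result set.
# batch_starts() computes the zero-based retstart offset of each batch.
batch_starts <- function(total, batch_size = 500) {
  if (total <= 0) return(numeric(0))
  seq(0, total - 1, by = batch_size)
}

# fetch_in_batches() calls a user-supplied function once per batch.
# fetch_one must accept retstart and retmax arguments.
fetch_in_batches <- function(total, batch_size, fetch_one) {
  starts <- batch_starts(total, batch_size)
  lapply(starts, function(s) fetch_one(retstart = s, retmax = batch_size))
}

# Offline demonstration: record which offsets would be requested.
demo <- fetch_in_batches(1200, 500, function(retstart, retmax) {
  c(retstart = retstart, retmax = retmax)
})

# With rentrez, fetch_one could wrap a call such as the following
# (left commented out because it requires network access):
# search <- rentrez::entrez_search(db = "nuccore", term = searchexp,
#                                  use_history = TRUE)
# fetch_one <- function(retstart, retmax) {
#   rentrez::entrez_fetch(db = "nuccore", web_history = search$web_history,
#                         rettype = "fasta", retstart = retstart,
#                         retmax = retmax)
# }
```

Passing the fetch step in as a function keeps the paging logic testable without contacting NCBI.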