Arborist builds trees for the IEDB. The trees are used for the user interface on https://iedb.org and the IEDB curation interface, and also for validating IEDB data. They combine data from the IEDB with community ontologies such as the NCBI Taxonomy and open scientific databases such as UniProt and GenBank.
WARN: This version of Arborist is still a work in progress. It makes extensive use of Nanobot, which is also a work in progress.
The `Makefile` defines and documents all the specific steps for Arborist. Run `make help` to see the list of main tasks.

You can either run `make` directly or inside a Docker container. For Docker, run `./run_image.sh make` or `sudo -E ./run_image.sh make`. If you aren't using Docker, first install the required software by running `make deps`.
NOTE: Arborist currently supports only Linux on the x86_64 architecture.
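
A wrapper script could check the platform before starting a build. The following is only a sketch: `is_supported_platform` is a hypothetical helper, not part of Arborist, and it assumes that `uname -s` and `uname -m` report `Linux` and `x86_64` on supported machines.

```shell
# Hypothetical helper: succeed only on Linux x86_64.
# Optional arguments override the detected OS and architecture (useful for testing).
is_supported_platform() {
  os=${1:-$(uname -s)}
  arch=${2:-$(uname -m)}
  [ "$os" = "Linux" ] && [ "$arch" = "x86_64" ]
}

# Example use: warn early instead of failing later in the build.
if ! is_supported_platform; then
  echo "Arborist currently supports only Linux on x86_64" >&2
fi
```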
The suggested workflow is:

- Update the cache with the latest IEDB tables by running `src/iedb/update-cache`. This requires MySQL/MariaDB connection parameters to be set as `IEDB_MYSQL_*` environment variables: `IEDB_MYSQL_HOST`, `IEDB_MYSQL_PORT`, `IEDB_MYSQL_USER`, `IEDB_MYSQL_PASSWORD`, `IEDB_MYSQL_DATABASE`.
- Run `make all` to build all trees.
- Run `make serve` to start the web interface on http://localhost:3000.
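
The workflow above could be wrapped in a small script that checks the connection parameters first. This is a minimal sketch, not part of Arborist: `missing_iedb_vars` and `run_workflow` are hypothetical names, and the sketch assumes it is run from the repository root without Docker.

```shell
# Print the name of every required IEDB_MYSQL_* variable that is unset or empty.
missing_iedb_vars() {
  for name in IEDB_MYSQL_HOST IEDB_MYSQL_PORT IEDB_MYSQL_USER \
              IEDB_MYSQL_PASSWORD IEDB_MYSQL_DATABASE; do
    # Indirect lookup in POSIX sh; the names come from the fixed list above.
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

# Run the three workflow steps in order, stopping at the first failure.
run_workflow() {
  missing=$(missing_iedb_vars)
  if [ -n "$missing" ]; then
    # $missing is left unquoted so the names are joined on one line.
    echo "Missing environment variables:" $missing >&2
    return 1
  fi
  src/iedb/update-cache && make all && make serve
}
```

Calling `run_workflow` with any of the five variables unset returns a non-zero status before any build step starts.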
These are the key Make tasks for building trees, in their dependency order:

- `make iedb` load IEDB data: This runs the `src/iedb/update-cache` script
- `make ncbitaxon` build the NCBI Taxonomy
- `make organism` build the organism and subspecies trees: This also creates the list of "active species" used by IEDB, and the "active taxa" that fall under these species.
- `make proteome` select a proteome for each active species
- `make protein` build the protein tree
- `make all` build all trees
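
To run the build tasks one step at a time, for example to see which stage fails, a loop over the tasks in the order listed above is enough. This is a sketch under assumptions: `build_in_order` and the `MAKE_CMD` override are hypothetical and not defined by Arborist. `make all` remains the documented way to build everything.

```shell
# Run the key build tasks one at a time, in the documented dependency order.
# MAKE_CMD is an assumed override hook (defaults to "make"), handy for dry runs.
build_in_order() {
  for task in iedb ncbitaxon organism proteome protein; do
    echo "== $task ==" >&2
    "${MAKE_CMD:-make}" "$task" || { echo "failed at: $task" >&2; return 1; }
  done
}
```

Setting `MAKE_CMD=echo` before calling `build_in_order` prints the task names without building anything.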
TODO: build more trees: peptide, molecule, assay, disease, geolocation, ...
Here are some other important Make tasks:

- `make deps` install required software
- `make serve` run the web interface on http://localhost:3000
- `make clean` remove all build files
- `make clobber` remove all generated files
- `make help` print the list of main tasks
- `bin/` contains any required binaries that aren't already installed
- `build/` all sorts of generated files
  - `iedb/` selected tables from IEDB for use here
  - `arborist/` general build files
  - `<species_id>/` species-specific build files
- `cache/` compressed data from various sources
  - `iedb/` selected tables from IEDB
  - `ncbitaxon/` NCBI Taxonomy's `taxdmp.zip` files
- `current/` links to the cached data to use for builds
  - `iedb` links to a subdirectory of `cache/iedb/`
  - `taxdmp.zip` links to a file in `cache/ncbitaxon/`
- `result/` TODO date-stamped directories of results, and `latest` link
- `src/`
  - `iedb/` config and schemas for IEDB data
  - `arborist/` config and schemas for Arborist tables
  - `species/` config and schemas for species proteomes and protein trees
  - `organism/` scripts for building the organism tree
  - `proteome/` scripts for selecting proteomes
  - `util/` utility scripts for working with databases
  - `templates/` Nanobot HTML templates
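
Because `current/iedb` is a link into `cache/iedb/`, switching a build to a different cached copy amounts to repointing that link. The sketch below illustrates the idea only: `use_iedb_snapshot` is a hypothetical helper, and how subdirectories of `cache/iedb/` are named is an assumption, since Arborist's own scripts may manage these links differently.

```shell
# Hypothetical helper: point current/iedb at one subdirectory of cache/iedb/.
# $1: project root, $2: name of a subdirectory under cache/iedb/
use_iedb_snapshot() {
  root=$1
  snapshot=$2
  if [ ! -d "$root/cache/iedb/$snapshot" ]; then
    echo "no such snapshot: $snapshot" >&2
    return 1
  fi
  mkdir -p "$root/current"
  # A relative target keeps the link valid if the project directory moves;
  # -n replaces an existing link instead of descending into its target.
  ln -sfn "../cache/iedb/$snapshot" "$root/current/iedb"
}
```

For example, `use_iedb_snapshot . some-subdirectory` would repoint `current/iedb`, where `some-subdirectory` stands for any existing directory under `cache/iedb/`.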