SourceData is a platform built by EMBO in collaboration with Vital-IT to make papers and datasets discoverable based on the experiments shown in figures.
When referring to SourceData, please cite the following paper:
SourceData - a semantic platform for curating and searching figures. Liechti R, George N, Götz L, El-Gebali S, Chasapi A, Crespo I, Xenarios I, Lemberger T. Nature Methods (2017) 14:1021. doi:10.1038/nmeth.4471
This repository includes several tools currently under development. They let you generate the SourceData knowledge graph (sdg), upload MECA/JATS archives as graphs compatible with the SourceData graph (neojats), and run a server (neoflask) that implements pre-formed Cypher queries and exposes a RESTful interface for integration in web applications.
To make sure you start with a clean build, you can run:
docker-compose build --force-rm --no-cache
docker-compose down --volumes # to clean the content of the volumes
This can solve some issues, for example if you ran the build with a wrong config file.
Set up .env from .env.example with appropriate credentials, then build and start the services:
cp .env.example .env
docker-compose build
docker-compose up
Before you import any dump you need to make sure that Neo4j creates the layout for the databases. You can do that by running the commands above, or just this one (check your .env file for username/password):
docker-compose run --rm neo4j cypher-shell -a bolt://neo4j:7687 -u $NEO_USERNAME -p $NEO_PASSWORD
See the sections below on how to import a dump to load content into the Neo4j database.
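The cypher-shell commands in this README read the Neo4j credentials from your environment. A minimal sketch of the relevant .env entries (variable names taken from the commands in this README; values are placeholders, see .env.example for the full list):
NEO_USERNAME=neo4j
NEO_PASSWORD=<your-neo4j-password>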
Use docker-compose.arm64.yml together with docker-compose.yml to run this app on ARM-based computers (tested on an M1 MacBook) like so:
docker compose --file docker-compose.yml --file docker-compose.arm64.yml build
docker compose --file docker-compose.yml --file docker-compose.arm64.yml up
See https://docs.docker.com/compose/extends/#multiple-compose-files for more info on the concepts behind multiple compose files.
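If you do not want to repeat the --file flags on every invocation, Compose also honors the COMPOSE_FILE environment variable (colon-separated on Linux/macOS):
export COMPOSE_FILE=docker-compose.yml:docker-compose.arm64.yml
docker compose build
docker compose up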
If you want to run only the frontend locally and use the backend on a remote server, you can run:
# go from the root of the repo to the frontend module
cd frontend
# install pinned dependencies
npm ci
# run dev server
NODE_ENV="serverless" npm run serve
This uses the API of the eeb-dev.embo.org server for data.
For a full build and data import, you will normally need the following sequence:
docker-compose build
docker-compose up -d
cat sdg/SD-indices.cql | docker-compose run --rm neo4j cypher-shell -a bolt://neo4j:7687 -u neo4j -p <NEO4J_PASSWORD> # define indices
docker-compose run --rm flask python -m sdg.sdneo PUBLICSEARCH --api sdapi # import source data public data
# update meca archives; sync to a folder outside of the docker build scope (dry run first, then the actual sync)
docker run --rm -it -v ~/.aws:/root/.aws --mount type=bind,source=<volume>/biorxiv/Current_Content/July_2020,target=/root/Current_Content/July_2020 amazon/aws-cli s3 sync --request-payer requester --exclude "*" --include "*.meca" s3://biorxiv-src-monthly/Current_Content/July_2020 ./Current_Content/July_2020/ --dryrun
aws s3 sync --request-payer requester --exclude "*" --include "*.meca" s3://biorxiv-src-monthly/Current_Content/July_2020 <path-to-biorxiv-archive>/biorxiv/Current_Content/July_2020/
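# optional sanity check (a suggestion, not part of the pipeline): count the .meca archives fetched by the sync above
find <path-to-biorxiv-archive>/biorxiv/Current_Content/July_2020/ -name '*.meca' | wc -l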
cat sdg/update_open.cql | docker-compose run --rm neo4j cypher-shell -a bolt://neo4j:7687 -u neo4j -p <NEO4J_PASSWORD> # open the update cycle
cat neotools/purge_prelim.cql | docker-compose run --rm neo4j cypher-shell -a bolt://neo4j:7687 -u neo4j -p <NEO4J_PASSWORD> # remove prelim articles obtained from the CrossRef and bioRxiv APIs
docker-compose run --rm flask python -m neotools.rxiv2neo /app/biorxiv/<path_to_meca_archives> --type meca # import full text biorxiv preprints
docker-compose run --rm flask python -m peerreview.neohypo hypothesis # import peer reviews from hypothesis
docker-compose run --rm flask python -m peerreview.neohypo rrc19 # import peer reviews from rapid reviews: covid-19
docker-compose run --rm flask python -m peerreview.neohypo pci # import peer reviews from peer community in
docker-compose run --rm flask python -m peerreview.published # updates publication status
docker-compose run --rm flask python -m sdg.sdneo refereed-preprints --api eebapi # smarttag specified collection of preprints
docker-compose run --rm flask python -m sdg.sdneo subject-collections --api eebapi # smarttag all bioRxiv subject collections
cat sdg/SD-processing.cql | docker-compose run --rm neo4j cypher-shell -a bolt://neo4j:7687 -u neo4j -p <NEO4J_PASSWORD> # generate merged graph
cat sdg/SD-gds.cql | docker-compose run --rm neo4j cypher-shell -a bolt://neo4j:7687 -u neo4j -p <NEO4J_PASSWORD> # graph data science algo
docker-compose run --rm flask python -m sdg.algonet # finds named topics and entity highlights
cat sdg/SD-precompute.cql | docker-compose run --rm neo4j cypher-shell -a bolt://neo4j:7687 -u neo4j -p <NEO4J_PASSWORD> # precompute the graph used by front end
cat sdg/SD-prepare-docmap.cql | docker-compose run --rm neo4j cypher-shell -a bolt://neo4j:7687 -u neo4j -p <NEO4J_PASSWORD>
docker-compose run --rm flask python -m neoflask.cache_warm_up # warm up cache
docker-compose run --rm flask python -m twitter.update --limit-date 2020-07-01 # --GO_LIVE to go live with Twitter updates
cat sdg/audit.cql | docker-compose run --rm neo4j cypher-shell -a bolt://neo4j:7687 -u neo4j -p <NEO4J_PASSWORD> # audit the graph
cat sdg/update_close.cql | docker-compose run --rm neo4j cypher-shell -a bolt://neo4j:7687 -u neo4j -p <NEO4J_PASSWORD> # close the update cycle
# visit http://localhost:8080
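To run such an update unattended, the steps above can be wrapped in a small script. A sketch, assuming NEO_USERNAME and NEO_PASSWORD are defined in .env as plain KEY=value lines:
#!/usr/bin/env bash
set -euo pipefail
set -a; source .env; set +a  # export the .env variables for the commands below
cypher () {  # pipe a .cql file into cypher-shell inside the neo4j service
  docker-compose run --rm neo4j cypher-shell -a bolt://neo4j:7687 -u "$NEO_USERNAME" -p "$NEO_PASSWORD"
}
cypher < sdg/update_open.cql
# ... import, smarttagging and processing steps as listed above ...
cypher < sdg/update_close.cql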
# Make sure you don't have your neo4j running:
docker-compose down
# dump the contents of your database using a temporary container
docker run --rm --name neo4j-dump --env-file .env --mount type=bind,source=$PWD/data/neo4j-data,target=/data -it neo4j:4.4 bin/neo4j-admin dump --database=neo4j --to=/data/neo4j.db.dump.`date +%Y-%m-%d-%H.%M.%S`
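The dump then shows up in the bind-mounted folder on the host:
ls -lh data/neo4j-data/neo4j.db.dump.*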
You can then load the downloaded db dump with:
# Make sure you don't have your neo4j running:
docker-compose down
In development:
# load the contents of your database using a temporary container
docker run --rm --name neo4j-load --env-file .env --mount type=bind,source=$PWD/data/neo4j-data,target=/data --mount type=bind,source=$PWD/dumps,target=/dumps -it neo4j:4.4 bin/neo4j-admin load --database=neo4j --from=/dumps/<dump_filename>
# --force # ADDING --force WILL OVERWRITE THE EXISTING DB!
# if there is no pre-existing database, the --force option needs to be omitted to avoid "command failed: unable to load database: NoSuchFileException"
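A sketch that picks the right flag automatically, assuming the default Neo4j 4.x data layout under data/neo4j-data:
# pass --force only if a database already exists, avoiding both the overwrite trap and the NoSuchFileException above
if [ -d "$PWD/data/neo4j-data/databases/neo4j" ]; then FORCE="--force"; else FORCE=""; fi
docker run --rm --name neo4j-load --env-file .env --mount type=bind,source=$PWD/data/neo4j-data,target=/data --mount type=bind,source=$PWD/dumps,target=/dumps -it neo4j:4.4 bin/neo4j-admin load --database=neo4j --from=/dumps/<dump_filename> $FORCE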
Cache warm-up in development (local):
docker compose run --rm flask python -m neoflask.cache_warm_up http://flask:5000/api/v1/
Manually clearing the cache:
docker compose exec redis redis-cli FLUSHALL
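To check whether the warm-up actually populated the cache, DBSIZE reports the number of keys Redis holds:
docker compose exec redis redis-cli DBSIZE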
In production:
docker run --rm \
--name neo4j-dump \
--env-file .env \
--mount type=bind,source=$PWD/dumps,target=/dumps \
--mount type=volume,source=sd-graph_production_neo4j_data,target=/data \
-it neo4j:4.4 \
bin/neo4j-admin load --database=neo4j --from=/dumps/<dump_filename>
# Make sure you don't have your neo4j running:
docker-compose -f production.yml down
sudo mkdir dumps
sudo chown 7474:7474 dumps
# dump the contents of your database using a temporary container
docker run --rm \
--name neo4j-dump \
--env-file .env \
--mount type=bind,source=$PWD/dumps/,target=/dumps \
--mount type=volume,source=sd-graph_production_neo4j_data,target=/data \
-it neo4j:4.4 \
bin/neo4j-admin dump --to=/dumps/neo4j.`date +%Y-%m-%d-%H.%M.%S` --database=neo4j
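To copy a production dump to your local machine you can use scp. A sketch, using the eeb-1 alias configured below and assuming the repository was cloned into the remote home directory:
scp eeb-1:sd-graph/dumps/<dump_filename> ./dumps/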
Add something like this to your local ~/.ssh/config:
Host eeb-1 ec2-3-125-193-124.eu-central-1.compute.amazonaws.com
Hostname ec2-3-125-193-124.eu-central-1.compute.amazonaws.com
User ec2-user
IdentityFile ~/.ssh/id_rsa
# ssh into prod
ssh eeb-1
# clone the project
git clone git@github.com:source-data/sd-graph.git
cd sd-graph
# initial config
cp .env.example .env # and edit with your desired config; note: config for the hypothes.is or sourcedata APIs is not needed for production
# download a database dump
wget https://oc.embl.de/index.php/s/<token>/download
# build docker
docker-compose -f production.yml build
# force load the database with the dump
# THIS WILL OVERWRITE THE EXISTING DB
docker run --rm \
--name neo4j-load \
--env-file .env \
--mount type=bind,source=$PWD,target=/app \
--mount type=volume,source=sd-graph_production_neo4j_data,target=/data \
-it neo4j:4.4 \
bin/neo4j-admin load --from=/app/download --database=neo4j --force # WILL OVERWRITE!
# start the services
docker-compose -f production.yml up -d --remove-orphans
docker-compose -f production.yml run --rm flask python -m neoflask.cache_warm_up # warm up cache
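A quick check that the API answers after the warm-up. A sketch that hits the same base URL the warm-up script uses, from inside the compose network (the exact route may need adjusting to your deployment):
docker-compose -f production.yml run --rm flask python -c "import urllib.request as u; print(u.urlopen('http://flask:5000/api/v1/').status)"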
Something like this will generally be enough, but it really depends on your changes :)
git pull
docker-compose -f production.yml build
docker-compose -f production.yml up -d --remove-orphans
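If only the backend code changed, rebuilding and restarting a single service is usually faster (flask is the service name used throughout this README):
docker-compose -f production.yml up -d --no-deps --build flask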