Synthetic Genomics D-SPACE Public Library

Demo site

A public demo of the D-SPACE annotation and search capabilities is available at https://dspace.bio/.

Clone this repo.
Install your preferred version of TensorFlow, depending on whether you're running on a GPU-enabled machine: either pip install tensorflow or pip install tensorflow-gpu.
Install this package:

pip install -e .
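
As a quick sanity check after installation, the sketch below reports which TensorFlow distribution (tensorflow-gpu or tensorflow) pip has installed. It is not part of this package; the function name installed_tensorflow is hypothetical, and the check only reads package metadata, so it does not confirm that a GPU is actually usable.

```python
from importlib import metadata


def installed_tensorflow():
    """Return (distribution_name, version) for the first TensorFlow
    distribution found, or None if neither variant is installed."""
    for name in ("tensorflow-gpu", "tensorflow"):
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            # This variant is not installed; try the next one.
            continue
    return None


if __name__ == "__main__":
    found = installed_tensorflow()
    print(found if found else "TensorFlow is not installed")
```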

Obtain the .dat.gz files for Swiss-Prot (Sprot) and TrEMBL from UniProt (we used 02_2018).
Obtain UniRef100 data from UniProt and make a list of the UniRef100 representative cluster proteins (one ID per line). These are the non-redundant IDs to keep.
Run parse_split_uniprot.py with the .dat.gz files and the UniRef100 ID list from the two previous steps
Run shuffle_uniprot.py on each subset of the data (train, test, val)
Move the resulting shuffled JSON files to 3 separate directories (named "train", "test", "val") within a common directory
Run:

train_dspace
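
Two of the data preparation steps above (building the UniRef100 ID list and arranging the shuffled files) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the scripts shipped with this repo: it assumes the UniRef100 data is a FASTA file whose headers start with ">UniRef100_", that the ID wanted is the part after that prefix, and that the shuffled JSON file paths are known. The function names and any file names passed to them are hypothetical; check what parse_split_uniprot.py actually expects before relying on the ID format.

```python
import gzip
import shutil
from pathlib import Path


def uniref100_ids(fasta_path):
    """Yield the ID following 'UniRef100_' in each FASTA header line.

    Assumption: headers look like '>UniRef100_<ID> <description...>'.
    """
    opener = gzip.open if str(fasta_path).endswith(".gz") else open
    with opener(fasta_path, "rt") as handle:
        for line in handle:
            if line.startswith(">UniRef100_"):
                cluster_id = line[1:].split()[0]      # e.g. 'UniRef100_Q6GZX4'
                yield cluster_id.split("_", 1)[1]     # e.g. 'Q6GZX4'


def write_id_list(fasta_path, out_path):
    """Write one representative ID per line; return how many were written."""
    count = 0
    with open(out_path, "w") as out:
        for protein_id in uniref100_ids(fasta_path):
            out.write(protein_id + "\n")
            count += 1
    return count


def arrange_splits(shuffled_files, root):
    """Move shuffled JSON files into root/train, root/test and root/val.

    shuffled_files maps a split name to a list of file paths (hypothetical
    names; use whatever shuffle_uniprot.py produced).
    """
    root = Path(root)
    for split in ("train", "test", "val"):
        target = root / split
        target.mkdir(parents=True, exist_ok=True)
        for src in shuffled_files.get(split, []):
            shutil.move(str(src), str(target / Path(src).name))
    return root
```

The resulting common directory then contains exactly the three subdirectories "train", "test" and "val" described above.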