A public demo of the D-SPACE annotation and search capabilities is available at https://dspace.bio/.
-
Clone this repo.
-
Install your preferred version of TensorFlow, depending on whether you are running on a GPU-enabled machine. Either:
pip install tensorflow
or
pip install tensorflow-gpu
-
Install this package:
pip install -e .
-
Obtain the .dat.gz files for Swiss-Prot (Sprot) and TrEMBL from UniProt (we used release 02_2018).
-
Obtain UniRef100 data from UniProt and make a list of the UniRef100 representative cluster proteins (one ID per line). These are the non-redundant IDs to keep.
-
Run parse_split_uniprot.py with the data from the two preceding steps (the UniProt .dat.gz files and the UniRef100 ID list).
-
Run shuffle_uniprot.py on each subset of the data (train, test, val)
-
Move the resulting shuffled JSON files to 3 separate directories (named "train", "test", "val") within a common directory
-
Run train_dspace.
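
Two of the steps above are manual: building the UniRef100 ID list and arranging the shuffled files into "train", "test" and "val" directories. The following is a minimal Python sketch of both, not part of this package. It assumes the UniRef100 data is a FASTA file whose headers look like ">UniRef100_<ID> description", and that parse_split_uniprot.py expects the ID without the "UniRef100_" prefix; pass prefix="" to keep the full cluster ID if that assumption is wrong. The function names and all file names are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def write_representative_ids(fasta_path, out_path, prefix="UniRef100_"):
    """Write one representative ID per line, taken from FASTA headers.

    Assumption: headers have the form ">UniRef100_<ID> description ...".
    Returns the number of IDs written.
    """
    count = 0
    with open_text(fasta_path) as src, open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.startswith(">"):
                continue  # sequence line, not a header
            parts = line[1:].split()
            if not parts:
                continue  # empty header, nothing to record
            cluster_id = parts[0]
            if cluster_id.startswith(prefix):
                cluster_id = cluster_id[len(prefix):]
            dst.write(cluster_id + "\n")
            count += 1
    return count


def arrange_splits(shuffled_files, data_dir):
    """Move each shuffled JSON file into data_dir/<split>/.

    shuffled_files maps a split name ("train", "test", "val") to a file path.
    """
    data_dir = Path(data_dir)
    for split, src in shuffled_files.items():
        target_dir = data_dir / split
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(target_dir / Path(src).name))


if __name__ == "__main__":
    # Hypothetical file names; substitute your own.
    n = write_representative_ids("uniref100.fasta.gz", "uniref100_ids.txt")
    print(f"wrote {n} IDs")
    arrange_splits(
        {
            "train": "train_shuffled.json",
            "test": "test_shuffled.json",
            "val": "val_shuffled.json",
        },
        "data",
    )
```

The common directory created here ("data" in the example) is the one containing the three split directories described above. Check parse_split_uniprot.py and train_dspace for the arguments they actually accept.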