Resources for "Sequence Processing with Quantum Tensor Networks".
Install the requirements with `pip install -r requirements.txt`.
The tensor network contraction is performed using a combination of discopy
and jax.
First, follow these instructions to install JAX with the relevant accelerator support.
Documentation of discopy can be found here.
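As an informal illustration of the kind of work discopy and JAX do here, the sketch below contracts a simple chain of tensors with `jax.numpy.einsum`. This is not the repository's code, just a minimal example of accelerated tensor contraction:

```python
import jax.numpy as jnp

def contract_chain(tensors):
    """Contract a list of (bond, bond) matrices left to right."""
    result = tensors[0]
    for t in tensors[1:]:
        result = jnp.einsum("ij,jk->ik", result, t)
    return result

# Three scaled identity matrices: their product is 6 * identity.
mats = [jnp.eye(2) * (i + 1) for i in range(3)]
out = contract_chain(mats)
```

Real tensor-network contractions are higher rank and use optimized contraction orders, but the einsum-based pattern is the same.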
The imdb dataset is not included in the repository. It can be downloaded from the link above, then preprocessed with the preprocessing scripts.
The preprocessing scripts are:

- `BIO_preprocess.py` preprocesses the protein-binding dataset. It takes the genetic strings and translates them to index representation. For TTN and CTN it also pads to appropriate powers of 2.
- `TTN_preprocess.py` takes in sequences and acquires the appropriate offsets for fast contraction in `TTN_train.py`.
- `CTN_slide_preprocess.py` takes in sequences and saves them as lists of subsequences of the desired window size.
- `NLP_parsing.py` parses language data into a syntactic tree using Lambeq's CCG parser.
- `NLP_preprocess.py` translates trees into sequential instructions for the syntactic models (STN, SCTN). For the other tree models (CTN, TTN) it also pads sequences in groups of the nearest power of two. There is also the option to cut the data, keeping the X most common syntactic structures while maintaining dataset balance; this is necessary for SCTN.
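The power-of-two padding used by the TTN/CTN preprocessing can be sketched as follows. This is an illustrative example, not the repository's actual implementation; the pad index `0` is an assumption:

```python
def next_power_of_two(n):
    """Smallest power of 2 greater than or equal to n."""
    p = 1
    while p < n:
        p *= 2
    return p

def pad_sequence(seq, pad_idx=0):
    """Pad an index-encoded sequence to the next power-of-2 length."""
    target = next_power_of_two(len(seq))
    return seq + [pad_idx] * (target - len(seq))

padded = pad_sequence([3, 1, 4, 1, 5])  # length 5 -> padded to length 8
```

Padding to a power of two keeps the binary tree contractions of TTN and CTN balanced.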
The training scripts are:

- `TTN_train.py` covers all scalable models (PTN, STN, TTN). Example datasets included: protein-binding, rotten-tomatoes, clickbait.
- `CTN_train.py` is the CTN model. Example datasets included: protein-binding and reduced clickbait.
- `SCTN_train.py` is the SCTN model. Example datasets included: reduced clickbait.
- `CTN_slide.py` is the sliding-window option, or the CTNs model. Example datasets included: protein-binding, rotten-tomatoes.
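The sliding-window split behind the CTNs model can be illustrated as below. This is a hypothetical sketch of the windowing step, not the code in `CTN_slide_preprocess.py`; the `stride` parameter is an assumption:

```python
def sliding_windows(seq, window, stride=1):
    """Cut a sequence into overlapping subsequences of a fixed window size."""
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, stride)]

windows = sliding_windows([1, 2, 3, 4, 5], window=3)
# Each window is then processed by the CTN independently.
```

Here a length-5 sequence with window size 3 yields three overlapping subsequences.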
JIT compilation for `CTN_train.py` and `SCTN_train.py` takes time at the start but speeds up the training process: the first epoch will be slow, while subsequent epochs run faster. JIT compilation can be disabled in the config file.
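The slow-first-epoch behavior is standard JAX JIT semantics, sketched below. The function `step` is illustrative, not from the repository: the first call traces and compiles it, and later calls with the same input shapes reuse the cached compiled code:

```python
import jax
import jax.numpy as jnp

@jax.jit
def step(x):
    # A stand-in for one training step's computation.
    return jnp.tanh(x @ x.T).sum()

x = jnp.ones((4, 4))
first = step(x)   # triggers tracing + compilation (slow)
second = step(x)  # runs the cached compiled version (fast)
```

Disabling JIT trades away this speedup for faster startup and easier debugging.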