docker build -t epfl_solution .
docker run -it epfl_solution
Samples are pre-processed by first applying a FCGR mapping, followed by a 2D DCTII [Lichtblau2019]. The top left h x h matrix of the DCTII (lowest frequencies) is extracted and set at the hash of the genome.
$ go run model/main.go will process the samples of data/Challenge.fa and output the processed samples in model/X.binary model/Y.binary (X being the processed samples and Y the labels).
The Python script model/training.py will use model/X.binary model/Y.binary that can then be used to train the model.
The script will output the weights both in .npy and .binary as well as a .png image of the weights/features with gradient color coding.
$ make debug NBGENOMES=2000 will compile and run DebugTest.go which will process, encrypt, predict, decrypt the first 2000 samples located in data/Challenge.fa.
$ make key: generates the secret-key and stores it inkey/.$ make pro NBGENOMES=2000: processes the first 2000 samples located iindata/Challenge.fa. Returns the result intemps/.$ make enc: Encrypts the processed samples. Returns the encrypted processed samples intemp/.$ make pred: unmarshals the encrypted samples intemp/, evaluates the homomorphic prediction and marshals back the result intemp/.$ make dec: unmarshals the encrypted prediction intemp/, decrypts and outputs the result inresults/prediction.csv.
Processing and crypto parameters are located in lib/params.
$ make clean: clean all files inkeys/,temps/,results/and all compiled binary files. Does not clean files inmodel/.
The HE evaluation security is based on the R-LWE hardness. The used parameters are log(N)=10, log(Q)=29. Both the secret and the Gaussian error are sampled from a truncated discrete Gaussian distribution with standard deviation 3.19 and bound 19. The security is estimated to 128-bit according to https://homomorphicencryption.org/.
[Lichtblau2019] : Lichtblau Daniel. “Alignment-free genomic sequence comparison using FCGRand signal processing”, 2019.