intel/neuro-vectorizer

Missing scripts for reproducing with pretrained code2vec model


In the NeuroVectorizer paper (the version submitted to CGO 2020), in the Artifact Description appendix "Experiment Workflow" section, you mention two options for running the framework: one that takes weeks, and one that takes a day or less (skipping the code2vec training). The instructions for the first option are included in the README in this repo, but the instructions for the second option (and any other associated Python scripts) are not included in this repo.

Would you be able to provide scripts and/or instructions for 1) training the RL model without retraining code2vec and/or 2) running the experiment scripts mentioned in the artifact description?

I think the current instructions in the README already do not retrain code2vec. If you look at

def init_RL_env(self):

you can see that it loads a pretrained code2vec model.
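
For reference, a minimal sketch of what that init step amounts to, assuming a pickled embedding table; the file name code2vec_encodings.pkl, the attribute names, and the action-space size are placeholders I'm using for illustration, not the repo's exact code:

import pickle
import numpy as np
from gym import spaces

def init_rl_env_sketch(env, embeddings_path="code2vec_encodings.pkl"):
    # Load *pretrained* code2vec embeddings (hypothetical pickle format:
    # {loop source file -> embedding vector}); code2vec itself is never
    # retrained in this flow.
    with open(embeddings_path, "rb") as f:
        env.code_embeddings = pickle.load(f)

    emb_dim = len(next(iter(env.code_embeddings.values())))

    # One discrete action per (VF, IF) pragma pair, e.g. power-of-two
    # VF in {1..64} and IF in {1..16} -> 7 * 5 = 35 actions.
    env.action_space = spaces.Discrete(7 * 5)

    # Observation: the fixed-size code2vec embedding of the current loop nest.
    env.observation_space = spaces.Box(
        low=-np.inf, high=np.inf, shape=(emb_dim,), dtype=np.float32
    )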

Using the scripts as they are, how close should the results be to those in the paper, and how much should that depend on hardware? Training just the RL model with the provided code2vec embeddings, we get the reward plot in the image below: it quickly (~20,000 timesteps) reaches a stable point, oscillating around 0.2. Generating our own -O3 runtimes and then running inference with the final training checkpoint, we get a geomean speedup of 1.29x on the tests/ set. These numbers differ from the paper's reward of about 0.3, geomean speedup of 2.67, and stabilization at 200k+ training steps.

I'm running in Windows Subsystem for Linux on a laptop with an Intel i7 (AVX2 support). I also tried reducing the VF space to [1, 8], since AVX2 only supports 256-bit = 32-byte = 8-int-wide vector registers, and got similar results (but ~0.4 reward). Edit: brute force showed some optimal VFs > 8, so I assume the compiler isn't limited to using one vector register.
[Image: vfif_training — RL reward vs. training timesteps]
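
For what it's worth, the geomean speedup above is computed the obvious way; roughly the sketch below (the dict format and example numbers are mine, not from the repo):

import numpy as np

def geomean_speedup(o3_runtimes, rl_runtimes):
    # o3_runtimes / rl_runtimes: dicts mapping benchmark name -> runtime
    # in seconds, measured on the same machine.
    speedups = [o3_runtimes[name] / rl_runtimes[name] for name in o3_runtimes]
    # Geometric mean = exp(mean(log(speedups))).
    return float(np.exp(np.mean(np.log(speedups))))

# Example with made-up runtimes:
# geomean_speedup({"s1.c": 1.0, "s2.c": 2.0}, {"s1.c": 0.5, "s2.c": 1.6})
# -> ~1.58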

It's worth noting that running the brute-force VF/IF search function on tests/, compared against their -O3 runtimes on my machine, gives a geomean speedup of about 1.33x, so it makes sense that the RL speedup is below that. I'm still unsure why the reward stabilizes so early in training, and what factors lead to the original paper's geomean speedup of >2.
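
(For context, the brute-force baseline I mean just sweeps every vectorization-factor/interleave-factor pair; a simplified sketch is below. The "// PRAGMA_HERE" marker convention, temp file names, and timing via a wall-clock wrapper are my simplifications, not the repo's exact implementation.)

import itertools
import subprocess
import time

VF_CANDIDATES = [1, 2, 4, 8, 16, 32, 64]
IF_CANDIDATES = [1, 2, 4, 8, 16]

PRAGMA = ("#pragma clang loop vectorize_width({vf}) "
          "interleave_count({intf})\n")

def bruteforce_best(src_path):
    # Try every (VF, IF) pair on one kernel and keep the fastest runtime.
    with open(src_path) as f:
        template = f.read()  # assumed to contain a "// PRAGMA_HERE" marker
                             # on the line directly above the target loop
    best_pair, best_time = None, float("inf")
    for vf, intf in itertools.product(VF_CANDIDATES, IF_CANDIDATES):
        src = template.replace("// PRAGMA_HERE",
                               PRAGMA.format(vf=vf, intf=intf))
        with open("tmp.c", "w") as f:
            f.write(src)
        subprocess.run(["clang", "-O3", "tmp.c", "-o", "tmp.bin"], check=True)
        start = time.perf_counter()
        subprocess.run(["./tmp.bin"], check=True)
        runtime = time.perf_counter() - start
        if runtime < best_time:
            best_pair, best_time = (vf, intf), runtime
    return best_pair, best_time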

Great question. Yes, it depends heavily on the hardware you are running on, and on whether you separate the hardware the training runs on from the hardware the compilation and benchmarking run on. Note that the geomean reported in the paper also depends on the applications selected.