Repository of code for applying machine learning methods to CMS events, with the hope of improving tau decay mode resolution.
- install_script.sh installs the necessary packages
- dataframe_init.sh loads the dataframes from ROOT files, selects the relevant columns and saves them as a .pkl file. It does not create new variables.
- dataframe_mod.sh unpacks the .pkl dataframe (df_ordered), creates new variables and saves the dataframe. It also creates a separate dataframe (imvar_df) holding the variables used to create images, which is saved as imvar_df.sav (using joblib to avoid memory errors).
- image_generator.sh runs imgen.py to generate and save numpy arrays of images in batches of 100,000 events. Images are compressed by storing them as uint8 rather than float32.
- dataframe_split.sh splits the data for the high-level (HL) variables, large images, small images and y values into training and test sets, and saves them in tf.data.Dataset format, which can be efficiently cycled in and out of RAM to reduce memory usage.
- train_model.sh loads tensors from /vols/cms/fjo18/Masters2021/Tensors and trains a model on them, before saving the model back into /vols/. Note: the model is only saved if the 'save_model' parameter is set to 'True' in the 'train_NN.py' file.
- train_model_HL.sh trains only on the HL variables (using the train_HLNN.py file). It has the same parameter options as the full model.
- Update: We now have various files which train networks, depending on whether a CPU or GPU is needed. cpu_train_NN.sh and gpu_train_NN.sh are the main ones. These run train_NN_pd.py, which is now the main file in which models are built.

Current to-dos and issues can be seen under the 'Projects' tab.
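
As an illustration of the uint8 image compression and the 100,000-event batching described for image_generator.sh, here is a minimal sketch. It is not the code in imgen.py: the function names `quantize_to_uint8` and `batch_slices`, the `max_value` scaling parameter and the clipping behaviour are all assumptions made for the example.

```python
import numpy as np


def quantize_to_uint8(images, max_value):
    """Scale float images into [0, 255] and store them as uint8.

    Hypothetical helper: `max_value` is an assumed normalisation constant;
    anything above it is clipped to 255.
    """
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    scaled = np.clip(np.asarray(images) / max_value, 0.0, 1.0) * 255.0
    return np.rint(scaled).astype(np.uint8)


def batch_slices(n_events, batch_size=100_000):
    """Return slices that cover n_events in consecutive batches."""
    return [
        slice(start, min(start + batch_size, n_events))
        for start in range(0, n_events, batch_size)
    ]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for a batch of float32 images (shape is an assumption).
    images = rng.random((4, 21, 21), dtype=np.float32)
    compressed = quantize_to_uint8(images, max_value=1.0)
    print(images.nbytes, compressed.nbytes)  # uint8 uses a quarter of the bytes
    print(batch_slices(250_000))
```

Storing one byte per pixel instead of four cuts the size of each saved batch by a factor of four, at the cost of limiting each pixel to 256 intensity levels.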