LHCb FlavorTagging Trainer
Prototypes for FlavourTagging reoptimization scripts
Usage
Currently, only the crossval_training.py
and xgboost_training.py
are
actively developed. Each script uses configuration data from the configs/
directory.
Cut Selection and Bootstrapped Crossvalidation
Given a set of cut-parameters (defined in a tagger-specific config), the hyperparameters need to be validated to prevent overtraining.
This can be done with the crossval_training.py
script, e.g. like
./crossval_training.py -c configs/someconfig.json -p roc_curve_plot.pdf
which will read the given configuration, print out average mistag power values and plot the average roc curve, obtained in the bootstrapping step.
To speed up the read-in and selection step, the script is able to either write the selected tuple to disk (and only printout average tagging power values) via
./crossval_training.py -c config.json -o selected_tuple.root
or read a preselected file via
./crossval_training.py -c config.json -i selected_tuple.root -p plot.pdf
Training XGBoost for production
After the hyperparameters have been verified, a XGBoost model can be trained
with the xgboost_training.py
script. It will read files obtained in the
previous step (i.e. selected_tuple.root), train a XGBoost classifier with the
hyperparameters given in the configuration, calculate the per-event tagging
power and write out the trained model.
./xgboost_training.py -c config.json -i selected_tuple.root -o predicted_tuple.root -s xboost_model.model
Calibration
The best choice is to use the EPM here.
Running on lxplus
I strongly recommend setting up an isolated, clean python environment using Anaconda (miniconda), therefore download conda via
wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
and install it with
bash Miniconda2-latest-Linux-x86_64.sh
this will create a virtual environment with its own python version in
~/miniconda2
.
To provide a recent version of gcc
and ROOT which will work with the new
environment run
SetupProject ROOT 6.06.06
and finally install the python dependencies via
~/miniconda2/bin/pip install numpy
~/miniconda2/bin/pip install pandas root-numpy root-pandas matplotlib sklearn xgboost tqdm
Once, all the dependencies are installed, you can start jupyter
jupyter notebook --port 61337 --no-browser
where --port
is any free port. If the port is already in use, jupyter will
automatically use the next free one.
To access the notebook from your desktop, just forward a local port to the
remote port on lxplus. You need to make sure that you are referencing the same
lxplus instance (e.g. define Host lxplus042.cern.ch
for your lxplus part of
~/.ssh/config
).
ssh -N -f -L "8888:[::1]:61337" lxplus
will then forward your local port 8888 to the remote notebook. Just open the browser and visit localhost:8888.