A demonstration of scikit-learn using the Higgs Boson Machine Learning Challenge dataset.
1. Download the data file:
python download_data.py
2. Start up the Jupyter notebook server:
jupyter notebook
3. Click on Tutorial.ipynb in the browser to walk through the tutorial.
A set of scripts for training and analysis is in the python directory. From the project directory, run
python python/BDT200/train.py
to train the TMVA-like model. Then run
python python/BDT200/analysis.py
to calculate the expected discovery significance.
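For orientation, the figure of merit defined by the Higgs Boson Machine Learning Challenge is the approximate median significance (AMS), computed from the weighted signal count s and background count b that pass the selection. The sketch below implements that published formula with the challenge's regularisation term of 10. Whether analysis.py computes exactly this quantity is an assumption, and the function name ams is only illustrative.

```python
import math


def ams(s, b, b_reg=10.0):
    """Approximate median significance for weighted counts s and b.

    s     -- weighted number of selected signal events
    b     -- weighted number of selected background events
    b_reg -- regularisation term (10 in the challenge definition)
    """
    if s < 0 or b < 0:
        raise ValueError("s and b must be non-negative")
    bb = b + b_reg
    if bb <= 0:
        raise ValueError("b + b_reg must be positive")
    # log1p(x) evaluates log(1 + x) accurately when s is small compared to b.
    return math.sqrt(2.0 * ((s + bb) * math.log1p(s / bb) - s))


if __name__ == "__main__":
    # Illustrative counts only, not results from this project.
    print(ams(100.0, 1000.0))
```

With b_reg=0 the expression reduces to the usual Asimov estimate of the expected discovery significance, and for s much smaller than b it approaches s / sqrt(b + b_reg).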
You may wish to run on a more powerful, air-conditioned machine over an SSH connection. In this case, use port forwarding to access the Jupyter notebooks. In Step 2 above, start the notebook server with
jupyter notebook --no-browser
There will be a line similar to
[I 17:22:38.962 NotebookApp] The Jupyter Notebook is running at: http://localhost:XXXX/
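in the output, where XXXX
is the port number (default is 8888). To add the forwarding to the SSH session that is already open, use the SSH escape sequence: press Enter, then type ~C (tilde followed by capital C; the escape character is ~ unless it has been reconfigured) to reach the ssh> prompt. OpenSSH 9.2 and later disable this prompt by default; it can be turned on with the EnableEscapeCommandline client option. At the prompt, enter -L XXXX:localhost:XXXX
and point the web browser on your local machine to http://localhost:XXXX/.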
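If the escape prompt is not available, the same forwarding can be requested when the SSH connection is opened, using ssh's -L option. The following is a small sketch, not part of this project, that pulls the port out of the notebook server's log line and builds the matching ssh command. The function name ssh_forward_command and the host user@remote-host are placeholders, and the sample log line uses the default port 8888.

```python
import re


def ssh_forward_command(log_line, host):
    """Build an ssh command that forwards the notebook port to this machine.

    log_line -- the "running at" line printed by the notebook server
    host     -- the remote machine, e.g. "user@remote-host" (placeholder)
    """
    # Accept both spellings of the loopback address that may appear in the URL.
    match = re.search(r"https?://(?:localhost|127\.0\.0\.1):(\d+)/", log_line)
    if match is None:
        raise ValueError("no notebook URL found in log line")
    port = int(match.group(1))
    # -L local_port:localhost:remote_port forwards the local port to the server.
    return ["ssh", "-L", f"{port}:localhost:{port}", host]


if __name__ == "__main__":
    line = ("[I 17:22:38.962 NotebookApp] The Jupyter Notebook is running at: "
            "http://localhost:8888/")
    print(" ".join(ssh_forward_command(line, "user@remote-host")))
    # prints: ssh -L 8888:localhost:8888 user@remote-host
```

Run the printed command from a terminal on the local machine, then open http://localhost:8888/ (or whichever port was reported) in the local browser.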