As part of GSoC 2024, I contributed in Machine Learning for Science (ML4Sci). It is an open-source organization that brings together modern machine learning techniques and applies them to cutting edge problems in Science, Technology, Engineering, and Math (STEM). Over the summer, I worked on Quantum Machine Learning applied on High Energy Physics data (QMLHEP) to contrastively train models to output embeddings that can be used for other downstream tasks like classification.
- Code on QMLHEP GitHub Repository: ML4SCI/QMLHEP/tree/main/Quantum_SSL_for_HEP_Sanya_Nanda
- Code on my GitHub Repository: SanyaNanda/ML4Sci_QuantumContrastiveLearning (For latest updates post GSoC)
- Project Documentation: Technical Documentation/Final blog
- Implemented multiple trainable embedding functions to encode HEP data onto contrastive learning models.
- Developed numerous computer vision, graph-based and quantum hybrid models for contrastive learning framework.
- Experimented with different approaches for embedding functions and contrastive losses for training.
- Demonstrated an effort to prove quantum advantage using Quantum ML-based hybrid model.
- Benchmarked the trained embeddings of classical and quantum models.
Following are important documents pertaining to the project:
- Proposal submitted for GSoC 2024: Proposal
- Iest Tasks solved for GSoC 2024: Test Tasks
- GSoC Abstract: GSoC Abstract
- Mid Term Blog elucidating the work accomplished by the mid-term evaluation: Blog
- Mid-term lightning talk given to an audience of all the contributors and mentors of ML4Sci: Mid-term Lightning Talk Presentation
- Complete technical documentation or final blog, describing the whole project in depth, submitted for final evaluation: Blog
- Final-term lightning talk given to an audience of all the contributors and mentors of ML4Sci as part of final evaluation: Final Lightning Talk Presentation
- GSoC Lightning Talk, presented to all the contributors of GSoC 2024 on 25th September: GSoC Presentation
- Weights and Biases Reports on the experiments conducted: Benchmarking section of the project documentation
To begin with, the proposal I submitted ML4Sci GSoC Project can be found here. Following is the code designed and developed during the course of the project:
- Code Repository: https://github.com/ML4SCI/QMLHEP/tree/main/Quantum_SSL_for_HEP_Sanya_Nanda
- Project Documentation: https://sanyananda.github.io/ML4Sci_QuantumContrastiveLearning/
During the program, I worked on couple of blogs to document and communicate the project in a clear, concise and compact form, following are some relevant work products:
- Midterm Blog: https://medium.com/@sanya.nanda/quantum-contrastive-learning-on-lhc-hep-dataset-1b3084a0b141
- Final Evaluation Blog (Technical Documentation): https://sanyananda.github.io/ML4Sci_QuantumContrastiveLearning/
- A blog on the whole program from the selection process to the very end (coming soon)
At ML4Sci, presenting our work and ideas is just as important as working towards developing novel solutions, if not more. All the contributors and mentors of the organization met in a weekly call to present our findings and discuss our work. These calls were full of great learnings and amazing experiences. During the mid-term evaluation, we internally presented a 3 min lightning talk on our milestones in the first half of the program. Similarly, we presented at the end of the program as well to capture the essence of the whole project and our accomplishments. I am also selected to give a lightning talk at GSoC level to share my experience. Following are the links for the same:
- Midterm Lightning Talk: slides/ML4Sci-MidTerm.pdf
- Final Evaluation Lightning Talk: slides/ML4Sci-FinalEvaluation.pdf
- GSoC Lightning Talk: slides/GSoC2024-LightningTalk-Contributor-SanyaNanda.pdf
A comprehensive order to go through the jupyter notebooks:
It is advisable to walthrough these notebooks along with the project documentation.
-
CNN on quark-gluon: Experiments on quark-gluon
qssl/
: Main package directory.__init__.py
: Makesqssl
a Python package.data/
: Directory for data-related scripts consisting of dataloaders, pair-creation, preprocessing, augmentation and visualization.loss/
: Directory for loss function scripts consisting of contrastive pair loss, infoNCE, NT-Xent and fidelity loss functions.models/
: Directory for model definitions consisting of cnn, resnet18 and gnn encoders along with quantum circuits and hybrids.training/
: Directory for training scripts consisting of trainers for the modelsevaluation/
: Directory for evaluation scripts consisting of helper functions like confusion matrix, auc, lct etcutils/
: Directory for utility scripts consisiting of functions for extracting and plotting embeddingsconfig.py
: Configuration file for the project.
scripts/
: Directory for scripts to run training and evaluation.notebooks/
: Directory for Jupyter notebooks. Comprehensive Guide to navigate notebooksslides/
: Directory for presentation slides, lightning talks, proposal etcdocs/
: Directory for html and assets for the project documentation github.io pagerequirements.txt
: File listing the dependencies for the project.setup.py
: Setup script for the package.README.md
: This file.LICENSE
: License for the project..gitignore
: Git ignore file.
Experiments in a tabular format can be found in the Benchmarking section of the project documentation. Wandb reports can be found in the same section.
- Create a virtualenv
- Pip install the libraries mentioned in requirements.txt
- Try out the jupyter notebooks for better understanding
- To train the a model for example qcnn, use this command "python scripts/run_training.py"