Code repository for the paper:
Agniva Chowdhury and Pradeep Ramuhalli. A Provably Accurate Randomized Sampling Algorithm for Logistic Regression. In Proceedings of the 38th AAAI Conference on Artificial Intelligence, 2024.
The technical appendix of the paper can be found in TechnicalAppendix.pdf.
- Cardiovascular disease dataset (cardio): cardio_train.csv (sourced from here)
- Bank customer churn prediction dataset (churn): Bank Customer Churn Prediction.csv (sourced from here)
- Default of credit card clients dataset (default): default of credit card clients.csv (sourced from here)
- To compute row leverage scores of a matrix: leverage_scores.py
- To perform leverage score, l2s, or uniform sampling: row_sampling.py
The code for l2s sampling has been sourced from here.
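For orientation, here is a minimal sketch of what row leverage scores and leverage score sampling look like in NumPy. It is not the code in leverage_scores.py or row_sampling.py: the function names `leverage_scores` and `sample_rows`, the random test matrix, and the `1/sqrt(s * p_i)` rescaling (the usual choice in randomized linear algebra) are assumptions made for illustration.

```python
import numpy as np


def leverage_scores(A):
    """Row leverage scores of A (hypothetical helper, not the repo's API).

    The i-th score is the squared Euclidean norm of the i-th row of U,
    where U holds the left singular vectors of A.
    """
    U, sing, _ = np.linalg.svd(A, full_matrices=False)
    # Drop directions with negligible singular values so that
    # rank-deficient inputs are handled.
    tol = sing.max() * max(A.shape) * np.finfo(sing.dtype).eps
    U = U[:, sing > tol]
    return np.sum(U**2, axis=1)


def sample_rows(A, probs, num_samples, rng):
    """Sample rows i.i.d. with replacement and rescale them.

    Each sampled row i is scaled by 1/sqrt(num_samples * probs[i]),
    which makes the sketched Gram matrix an unbiased estimate of A.T @ A.
    """
    idx = rng.choice(A.shape[0], size=num_samples, replace=True, p=probs)
    scale = 1.0 / np.sqrt(num_samples * probs[idx])
    return idx, A[idx] * scale[:, None]


rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 5))  # synthetic data, not one of the datasets

lev = leverage_scores(A)
lev_probs = lev / lev.sum()                      # leverage score sampling
unif_probs = np.full(A.shape[0], 1.0 / A.shape[0])  # uniform sampling

idx, SA = sample_rows(A, lev_probs, 200, rng)
idx_u, SA_u = sample_rows(A, unif_probs, 200, rng)
```

The leverage scores of a matrix sum to its rank, so normalizing them by their sum gives a valid sampling distribution. Uniform sampling uses the same routine with equal probabilities.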
To reproduce the experiments in the paper, run the following Jupyter Notebooks:
- For Cardiovascular disease dataset: cardio_train.ipynb
- For Bank customer churn prediction dataset: Bank_Customer_Churn_Prediction.ipynb
- For Default of credit card clients dataset: default_of_credit_card_clients.ipynb
Please contact Agniva Chowdhury for questions or comments.