sevketsayin/SubsampledLogisticRegression

Repository of the AAAI paper

Jupyter Notebook, MIT License

A Provably Accurate Randomized Sampling Algorithm for Logistic Regression

Code repository for the paper:

Agniva Chowdhury and Pradeep Ramuhalli. A Provably Accurate Randomized Sampling Algorithm for Logistic Regression. In Proceedings of the 38th AAAI Conference on Artificial Intelligence, 2024.

Technical Appendix

The technical appendix of the paper is in TechnicalAppendix.pdf.

Datasets

Cardiovascular disease dataset (cardio): cardio_train.csv (sourced from here)
Bank customer churn prediction dataset (churn): Bank Customer Churn Prediction.csv (sourced from here)
Default of credit card clients dataset (default): default of credit card clients.csv (sourced from here)

Code

To compute row leverage scores of a matrix: leverage_scores.py
To perform leverage score, l2s, or uniform sampling: row_sampling.py

The code for l2s sampling has been sourced from here.
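
As a rough illustration of what these two steps involve, the sketch below computes row leverage scores through a thin SVD and then draws rows with replacement, returning importance weights. It is a generic NumPy sketch, not the contents of leverage_scores.py or row_sampling.py. The function names, signatures, and the 1/(s * p_i) weighting are assumptions made for this example, and l2s sampling is not covered.

```python
import numpy as np


def row_leverage_scores(A):
    """Row leverage scores: squared row norms of an orthonormal basis
    for the column space of A (hypothetical helper, not the repo's API)."""
    # Thin SVD: the columns of U are orthonormal and span range(A).
    U, sing, _ = np.linalg.svd(A, full_matrices=False)
    # Discard directions whose singular values are numerically zero,
    # so rank-deficient matrices are handled correctly.
    tol = sing.max() * max(A.shape) * np.finfo(sing.dtype).eps
    U = U[:, sing > tol]
    return np.sum(U ** 2, axis=1)


def sample_rows(probs, s, seed=None):
    """Draw s row indices with replacement according to probs and return
    them with importance weights 1 / (s * p_i) (assumed weighting)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(probs), size=s, replace=True, p=probs)
    weights = 1.0 / (s * probs[idx])
    return idx, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((1000, 5))   # synthetic data, for illustration only
    lev = row_leverage_scores(A)
    p_lev = lev / lev.sum()              # leverage score sampling probabilities
    p_unif = np.full(A.shape[0], 1.0 / A.shape[0])  # uniform sampling
    idx, w = sample_rows(p_lev, s=100, seed=0)
    print(A[idx].shape, w.shape)
```

The returned weights could be passed as per-sample weights to a logistic regression solver fitted on the sampled rows. Whether the notebooks do this, or rescale the rows directly, should be checked against the repository code.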

Notebooks

To reproduce the experiments in the paper, run the following Jupyter Notebooks:

For the Cardiovascular disease dataset: cardio_train.ipynb
For the Bank customer churn prediction dataset: Bank_Customer_Churn_Prediction.ipynb
For the Default of credit card clients dataset: default_of_credit_card_clients.ipynb

Please contact Agniva Chowdhury for questions or comments.