Course project for a big data class.
- Design a deep learning model to predict the credit level of each customer.
- Split the whole dataset across 5 parties so that each party's data is independent and identically distributed (IID), then adapt a federated learning framework for model training and aggregation. Show the advantage of collaborative training over individual local training.
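The IID split described above can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `iid_split` and the fixed seed are assumptions, and it only partitions row indices rather than loading the CSV.

```python
import numpy as np

def iid_split(num_samples, num_parties=5, seed=0):
    """Shuffle the row indices and cut them into near-equal shards.

    Because every shard is a uniform random sample of the rows, each
    party sees approximately the same label distribution (IID).
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(num_samples)
    return [np.sort(part) for part in np.array_split(shuffled, num_parties)]

# 9000 rows, as in BankChurners.csv, shared among 5 parties.
parts = iid_split(9000, num_parties=5)
print([len(p) for p in parts])  # [1800, 1800, 1800, 1800, 1800]
```

Each returned array can then be used to select one party's rows from the full dataset.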
BankChurners.csv contains basic information on 9,000 bank customers. The target variable is the credit level, from 1 (bad) to 10 (excellent):
- CustomerId – unique Ids for bank customer identification.
- Geography – the country the customer belongs to.
- Tenure – number of years for which the customer has been with the bank.
- Balance – bank balance of the customer.
- NumOfProducts – number of bank products the customer is utilizing.
- HasCrCard – binary flag for whether the customer holds a credit card or not.
- IsActiveMember – binary flag for whether the customer is an active member or not.
- EstimatedSalary – estimated salary of the customer in dollars.
- Exited – binary flag: 1 if the customer closed an account with the bank, 0 if the customer is retained.
- CreditLevel – credit level of the customer.
New_BankChurners.csv contains basic information on 1,000 new bank customers whose credit level is unknown.
- data_processing.ipynb: data preprocessing.
- deep_learning.py: training with a single DNN model; accuracy about 20%.
- deep_learning_subclass.py: one subclass model with three classifiers; accuracy about 21.6%.
- ml_with_nn.py: deep learning combined with machine learning; accuracy 37.22%.
- federated_learning.py: federated learning module, including IID and non-IID partitioning, FedAvg, and so on.
- federated_learning_noiid: improvements to FL on non-IID data.
Data processing and analytics are divided into three parts: exploratory data analysis, data preprocessing, and feature engineering. Details are in data_processing.ipynb.
The deep learning implementation is in deep_learnig_full.py.
The subclass-model implementation is in deep_learning_subclass.py.
The combined deep learning and machine learning implementation is in ml_with_nn.py.
Federated Learning process
- divide the dataset into IID or non-IID partitions
- for each round:
  - train the model on each client and save its weights
  - update the model with avg_weig (the averaged client weights)
- predict
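The round structure above can be sketched as a small FedAvg loop. This is an illustration under stated assumptions, not the code in federated_learning.py: `fed_avg` and `local_train` are hypothetical names, the local model is a linear least-squares stand-in rather than the project's DNN, and the client data is synthetic.

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Average each layer across clients, weighted by client sample count."""
    total = float(sum(client_sizes))
    num_layers = len(client_weights[0])
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(num_layers)
    ]

def local_train(weights, X, y, lr=0.1, epochs=5):
    """Hypothetical local update: a few gradient steps on a linear model."""
    w, b = weights[0].copy(), weights[1].copy()
    for _ in range(epochs):
        err = X @ w + b - y               # residuals on this client's data
        w -= lr * X.T @ err / len(y)      # gradient step for the weights
        b -= lr * err.mean()              # gradient step for the bias
    return [w, b]

# Synthetic data for 5 clients sharing one underlying relationship.
rng = np.random.default_rng(0)
true_w, true_b = np.array([2.0, -1.0]), 0.5
clients = []
for _ in range(5):
    X = rng.normal(size=(200, 2))
    clients.append((X, X @ true_w + true_b))

global_weights = [np.zeros(2), np.zeros(1)]
for round_idx in range(30):
    # Every client starts from the current global model and trains locally.
    updates = [local_train(global_weights, X, y) for X, y in clients]
    # The server replaces the global model with the weighted average.
    global_weights = fed_avg(updates, [len(y) for _, y in clients])

print(global_weights)  # expected to be close to [2, -1] and [0.5]
```

Weighting by sample count matters mainly when clients hold different amounts of data, as in the non-IID setting; with equal-sized clients it reduces to a plain mean.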
Steps to generate the result:
- uncomment the draw_distribution function in federated_learning.py
- get IID: use the bank_iid function, or set α to 1000 and use dirichlet_partition to get the dataset
- get non-IID: set α to 0.1 and use dirichlet_partition to get the dataset
- run federated_learning.py
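To show why a large α gives a nearly IID split and a small α gives a skewed one, here is a minimal sketch of Dirichlet-based partitioning. It is an assumption about how such a partition is commonly done, not the repository's dirichlet_partition: the name `dirichlet_partition_sketch`, its signature, and the random labels are all hypothetical.

```python
import numpy as np

def dirichlet_partition_sketch(labels, num_clients=5, alpha=0.1, seed=0):
    """Split sample indices among clients, class by class.

    For each class, client shares are drawn from Dirichlet(alpha, ..., alpha).
    A large alpha gives nearly equal shares (close to IID); a small alpha
    concentrates each class on a few clients (non-IID).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        shares = rng.dirichlet(alpha * np.ones(num_clients))
        # Turn cumulative shares into cut points within this class.
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for client, shard in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(shard.tolist())
    return [np.array(sorted(ix), dtype=int) for ix in client_indices]

# Random stand-in labels in 1..10, mirroring the CreditLevel range.
demo_labels = np.random.default_rng(1).integers(1, 11, size=9000)

iid_parts = dirichlet_partition_sketch(demo_labels, alpha=1000)
non_iid_parts = dirichlet_partition_sketch(demo_labels, alpha=0.1)

print("alpha=1000 sizes:", [len(p) for p in iid_parts])
print("alpha=0.1  sizes:", [len(p) for p in non_iid_parts])
```

With α = 1000 the client sizes come out close to 1800 each, while α = 0.1 typically leaves some clients with most of a class and others with almost none.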