banking_solution

Helps customers estimate future debits and credits on an account and advises how much to transfer to avoid an overdraft.

There are two models here:

Forecasting, to estimate the next period's spending and income
Deposit estimation, to avoid an account overdraft

Forecasting

The model uses a transformer-decoder architecture. The last output item is the prediction for the next interval (next week or next day). I used the transformer code from BERT. I have also written my own transformer encoder-decoder, the adversarial sparse transformer (my ast project), and it would be better to use that project to forecast expenditures.

Training steps:

Prepare data

The data used here comes from fraud-detection-handbook. Please download it and generate the data. About two years (730 days) should be generated, because the data serves both forecasting and the second task, estimating the deposit needed to avoid overdrafts. The second model needs more time steps: one year is only about 52 weeks, so two years seems more reasonable. When generating data with fraud-detection-handbook, fraud should be disabled.

The command below creates train and test files for both forecasting and deposit estimation, plus a file suitable for autoregression:

python prepare_data.py --lookback_history=12 --scaler=Custom --aggregate=WEEK --train_file4=data/simulated_without_fraud_730.txt
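As a rough illustration of what a lookback of 12 weekly values means for the forecasting samples, here is a minimal sketch. It assumes each sample pairs 12 consecutive weekly totals with the following week's total. The function name make_windows and the toy values are hypothetical and are not taken from prepare_data.py.

```python
from typing import List, Tuple


def make_windows(series: List[float], lookback: int = 12) -> List[Tuple[List[float], float]]:
    """Pair each run of `lookback` values with the value that follows it."""
    samples = []
    for start in range(len(series) - lookback):
        history = series[start:start + lookback]  # model input
        target = series[start + lookback]         # next-interval value to predict
        samples.append((history, target))
    return samples


# Two years of weekly totals is about 104 points (made-up values here).
weekly_totals = [float(100 + week) for week in range(104)]
samples = make_windows(weekly_totals, lookback=12)
print(len(samples))   # 92 windows from 104 weeks
print(samples[0][1])  # 112.0, the total for week index 12
```

With only one year of data (about 52 weeks) the same lookback would leave roughly 40 windows per account, which is one way to see why two years is the more comfortable choice.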

Training

Loss is RMSE.
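For reference, RMSE is the square root of the mean squared difference between actual and predicted values. The snippet below is a plain-Python sketch of that definition only; the function name rmse is hypothetical and this is not the loss code used in training.py.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared_error / len(actual))


# Errors of 3 and 4 give sqrt((9 + 16) / 2) = sqrt(12.5).
print(rmse([3.0, 5.0], [0.0, 1.0]))
```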

Command:

python training.py --action=TRAIN --output_dir=checkpoints --hidden_size=32 --train_epochs=300 --save_batches=1000 --training_set_size=422746 --batch_size=32 --lookback_history=12 --dropout_prob=0.1 --num_features=1 --learning_rate=1e-4 --scaler=Custom --num_hidden_layers=2 --num_attention_heads=2

Evaluate

This creates an output.csv file with metrics.

python training.py --action=EVALUATE --output_dir=checkpoints --hidden_size=32 --batch_size=32 --lookback_history=12 --num_features=1 --scaler=Custom --num_hidden_layers=2 --num_attention_heads=2

Predict

This creates an output.csv file with predictions as actual/estimate pairs.

python training.py --action=PREDICT --output_dir=checkpoints --hidden_size=32 --batch_size=32 --lookback_history=12 --num_features=1 --scaler=Custom --num_hidden_layers=2 --num_attention_heads=2 --predict_file=test.tfrecords

Estimation of deposit

This model is based on the code of the inventory management project, which addresses a similar problem: if there is an outflow of funds, what amount should be brought in? This is a Markov decision process (MDP) implemented with the DDPG algorithm. I found it difficult to use a stochastic algorithm. The action space is positive, but a continuous Gaussian distribution, even with its mean restricted to positive values, will always produce some negative samples. Using a truncated Gaussian, or setting negative samples to zero, did not work well in my experiments. Another option is a stochastic discrete action space. Please also review inventory_management; it has references to papers explaining what it is based on and how it is implemented.
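The point about negative samples can be checked numerically. The sketch below is only an illustration with made-up parameters (mean 1.0, standard deviation 1.0). It is not part of the project code, and negative_fraction is a hypothetical helper.

```python
import random
from statistics import NormalDist


def negative_fraction(mean: float, std: float, n_samples: int = 100_000, seed: int = 0) -> float:
    """Fraction of Gaussian samples that fall below zero."""
    rng = random.Random(seed)
    negatives = sum(1 for _ in range(n_samples) if rng.gauss(mean, std) < 0.0)
    return negatives / n_samples


mean, std = 1.0, 1.0  # positive mean, illustrative values only
empirical = negative_fraction(mean, std)
theoretical = NormalDist(mean, std).cdf(0.0)  # P(sample < 0), about 0.1587 here
print(empirical, theoretical)
```

Even with a positive mean, a noticeable share of sampled actions is negative. Setting those samples to zero piles that probability mass onto a single action, which may be one reason the clipping approach behaved poorly.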

Training steps:

Prepare data

Estimates are produced in the forecasting part. They are just packaged differently for this problem: the records form a time sequence, and each record holds estimated and actual values structured as (accounts, estimated_value(s)). For example, (4997, 1), where 1 means only debit is predicted. If credit is predicted as well, it will be 2.

For both training and prediction, customer balances are random. For real predictions, actual account balances would obviously have to be provided.

prepare_data.py already generated the history files when the forecasting data was prepared:

balance_train.tfrecords
balance_test.tfrecords

The model does not estimate on the fly; estimates are prepared from history. From the transformer-decoder folder, run these commands to create the estimate files:

time python training.py --action=PREDICT_FOR_BALANCE --output_dir=checkpoints --hidden_size=32 --lookback_history=12 --num_features=1 --scaler=Custom --num_hidden_layers=2 --num_attention_heads=2 --predict_file=balance_train.tfrecords --output_file=balance_train_estimate.tfrecords --num_accounts=4997
time python training.py --action=PREDICT_FOR_BALANCE --output_dir=checkpoints --hidden_size=32 --lookback_history=12 --num_features=1 --scaler=Custom --num_hidden_layers=2 --num_attention_heads=2 --predict_file=balance_test.tfrecords --output_file=balance_test_estimate.tfrecords --num_accounts=4997

Rewards formula:

r(s) = 1 - k1*z - k2*critical - k3*balance

z - 1 if overdraft
critical - 1 if breached some critical level

k1 - tuning coefficient for overdraft
k2 - tuning coefficient for critical balance
k3 - tuning coefficient for balance punishment

Increasing k1, k2, and k3 should, it seems, produce a policy that keeps the balance confined to a narrow band. I found that the balance stays around $3,000 on average across all customers. The average deposit trends toward the average expenditure, about $700. Setting the right k1, k2, and k3 is actually quite tricky!
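A minimal sketch of this reward follows. Several details are assumptions for illustration and are not taken from the project code: the balance is treated as normalized, an overdraft is taken to mean a negative balance, the balance penalty is applied only to positive balances, and the coefficient values are arbitrary.

```python
def reward(balance: float, critical_balance: float,
           k1: float = 0.6, k2: float = 0.6, k3: float = 0.1) -> float:
    """r(s) = 1 - k1*z - k2*critical - k3*balance, with indicator flags."""
    z = 1.0 if balance < 0.0 else 0.0                      # assumed overdraft test
    critical = 1.0 if balance < critical_balance else 0.0  # below the critical level
    idle = max(balance, 0.0)                               # assumed: penalize only positive balances
    return 1.0 - k1 * z - k2 * critical - k3 * idle


print(reward(balance=0.5, critical_balance=0.005))    # healthy balance, small idle-money penalty
print(reward(balance=0.001, critical_balance=0.005))  # below the critical level
print(reward(balance=-0.1, critical_balance=0.005))   # overdraft, both penalties apply
```

The three terms pull in opposite directions: k1 and k2 push the balance up and away from zero, while k3 pushes it down, which is why the coefficients are hard to tune.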

Training

During training, watch the convergence of the critic loss, the actor loss, and the reward. Also, by the end, the average account funding transfer should be close to the average spending. Otherwise, either the balance will drop to zero and there will be overdrafts, or the balance will climb to its maximum and there will be a lot of money sitting in the account.
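The reason average transfers should match average spending can be seen by rolling a balance forward. The sketch below uses the approximate figures mentioned in this README ($3,000 balance, $700 spending per step) with constant deposits. It is a toy illustration, and roll_balance is a hypothetical helper, not project code.

```python
from typing import List


def roll_balance(start_balance: float, deposits: List[float], spendings: List[float]) -> List[float]:
    """Balance after each step: previous balance + deposit - spending."""
    balances = []
    balance = start_balance
    for deposit, spending in zip(deposits, spendings):
        balance += deposit - spending
        balances.append(balance)
    return balances


spendings = [700.0] * 10
under = roll_balance(3000.0, [300.0] * 10, spendings)    # deposits below spending
matched = roll_balance(3000.0, [700.0] * 10, spendings)  # deposits equal to spending

first_overdraft = next(i for i, b in enumerate(under) if b < 0.0)
print(first_overdraft)  # 7: the balance goes negative on the eighth step
print(matched[-1])      # 3000.0: the balance holds steady
```

Deposits persistently above spending would show the opposite drift, with the balance growing without bound.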

Command sample:

python training.py --action=TRAIN --train_episodes=6000 --output_dir=checkpoints --num_accounts=4997 --train_file=data/balance_train_estimate.tfrecords --batch_size=2 --waste=18.0 --hidden_size=96 --actor_learning_rate=1e-5 --critic_learning_rate=1e-5 --decay_steps=100000 --use_actual --zero_weight=0.6 --critical_weight=0.6 --critical_balance=0.005

Training plots: rewards, waste, overdraft, critical balance, balance, action, actual spending, estimated spending, critic loss, actor loss.

Prediction

Outputs:

As trained model:

balance
action
overdraft

Figure: Model actions (sample for one account)

As heuristic algorithm:

balance
action
overdraft

Figure: Heuristic (sample for one account)

Debit data:

debit estimate
actual debit

Sample command:

python training.py --action=PREDICT --train_episodes=6000 --output_dir=checkpoints-clean --num_accounts=4997 --train_file=data/balance_train_estimate.tfrecords --batch_size=2 --waste=20.0 --hidden_size=96 --actor_learning_rate=1e-5 --critic_learning_rate=1e-5 --decay_steps=100000 --use_actual --zero_weight=0.6 --critical_weight=0.6 --critical_balance=0.005 --predict_file=data/balance_test_estimate.tfrecords

This produces output.csv with the metrics listed above for each timestep, as illustrated for one account.

mangushev/banking_solution
