This Python script uses Pytorch to construct a Neural Network for predicting Home Credit Default Risk.
- Imports: Required Python libraries for data processing, machine learning, and Pytorch are imported.
- Device Setup: The script automatically detects whether a CUDA-capable GPU is available and sets the device accordingly.
- Hyperparameters: The hyperparameters such as the test set size, number of epochs, batch size, learning rate and others are set.
- Data Loading: The Home Credit Default Risk data is loaded from CSV files, specifically application_train.csv, application_test.csv and previous_application.csv.
- Data Preprocessing: The data is preprocessed to be suitable for a neural network. This includes creating new custom features, handling missing values, standardising numerical features, and encoding categorical features.
- Data Splitting: The processed data is split into training and validation datasets using a specified test size.
- Tensor Creation: The training, validation, and test datasets are converted to Pytorch Tensors.
- Neural Network Model: The neural network model is defined and instantiated. This includes an embedding layer for categorical variables, followed by multiple fully connected layers.
- Training: The model is trained on the training data using the Adam optimizer and binary cross entropy loss for a specified number of epochs. The AUC and loss for each epoch are printed out.
- Validation: The trained model is validated on the validation data. The AUC and loss are again printed out.