Using machine learning and neural networks to create a binary classifier capable of predicting whether applicants will be successful if funded by Alphabet Soup.
- Applications\Software: Jupyter Notebook 6.1.4
- Languages\Libraries: Python 3.8.5, pandas, scikit-learn, tensorflow
- Data: Charity Data
- IS_SUCCESSFUL will be the target variable for this model
- APPLICATION_TYPE, AFFILIATION, CLASSIFICATION, USE_CASE, ORGANIZATION, INCOME_AMT, SPECIAL_CONSIDERATIONS will be the features for this model
- Non-beneficial variables of NAME and EIN will be removed from this model
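
The target, feature, and removed-column choices above can be sketched with pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the small `df` is a made-up sample with only a few of the feature columns, and one-hot encoding via `pd.get_dummies` is an assumption about how the categorical features are made numeric.

```python
import pandas as pd

# Made-up sample standing in for the charity data; values are illustrative only.
df = pd.DataFrame({
    "EIN": [10520599, 10531628, 10547893, 10553066],
    "NAME": ["ORG A", "ORG B", "ORG C", "ORG D"],
    "APPLICATION_TYPE": ["T3", "T3", "T5", "T10"],
    "AFFILIATION": ["Independent", "CompanySponsored", "Independent", "Independent"],
    "INCOME_AMT": ["0", "1-9999", "0", "10000-24999"],
    "IS_SUCCESSFUL": [1, 0, 1, 0],
})

# Remove the identifier columns, which are not useful as model inputs.
df = df.drop(columns=["EIN", "NAME"])

# Separate the target variable from the features.
y = df["IS_SUCCESSFUL"]
X = df.drop(columns=["IS_SUCCESSFUL"])

# Assumption: categorical features are one-hot encoded before training.
X = pd.get_dummies(X, dtype=int)

print(X.shape)
```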
- Hidden Layers: 2
- Layer 1 Nodes/Activation Function: 90/ReLU
- Layer 2 Nodes/Activation Function: 30/ReLU
- Output Layer Activation Function: Sigmoid
- Epochs = 100
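
A network with this shape could be defined in Keras along these lines. This is a sketch under assumptions: `n_features` is a placeholder for the number of input columns after encoding, and the loss and optimizer are common defaults for binary classification, not settings confirmed by this README.

```python
import tensorflow as tf

n_features = 40  # assumption: number of input columns after encoding

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(90, activation="relu"),    # hidden layer 1
    tf.keras.layers.Dense(30, activation="relu"),    # hidden layer 2
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output layer
])

# Assumption: binary cross-entropy loss and the Adam optimizer.
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# Training would then look like this, given prepared arrays X_train and y_train:
# model.fit(X_train, y_train, epochs=100)
```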
- The original model did not reach the target accuracy of 75%.
The following attempts were made to increase model performance:
- Change activation functions for hidden layers, decrease node count, decrease epochs:
- Hidden Layers: 2
- Layer 1 Nodes/Activation Function: 40/Sigmoid
- Layer 2 Nodes/Activation Function: 20/Sigmoid
- Output Layer Activation Function: Sigmoid
- Epochs = 10
- Add hidden layer, decrease node count, decrease epochs:
- Hidden Layers: 3
- Layer 1 Nodes/Activation Function: 40/ReLU
- Layer 2 Nodes/Activation Function: 20/ReLU
- Layer 3 Nodes/Activation Function: 10/ReLU
- Output Layer Activation Function: Sigmoid
- Epochs = 15
- Add two hidden layers, increase node count, decrease epochs:
- Hidden Layers: 4
- Layer 1 Nodes/Activation Function: 100/ReLU
- Layer 2 Nodes/Activation Function: 80/ReLU
- Layer 3 Nodes/Activation Function: 60/ReLU
- Layer 4 Nodes/Activation Function: 10/ReLU
- Output Layer Activation Function: Sigmoid
- Epochs = 20
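
To compare the size of the original model and the three attempts, the trainable parameters of each stack of fully connected layers can be counted by hand. The sketch below uses plain Python; the helper `dense_param_count` and the input width `n_features = 40` are illustrative assumptions, so the absolute numbers will differ for the real encoded dataset.

```python
def dense_param_count(n_inputs, layer_units):
    """Count weights and biases for a stack of fully connected layers."""
    total = 0
    prev = n_inputs
    for units in layer_units:
        total += prev * units + units  # weights plus one bias per node
        prev = units
    return total


n_features = 40  # assumption: number of input columns after encoding

# Node counts per layer, with the single-node output layer last.
configs = {
    "Original": [90, 30, 1],
    "Attempt 1": [40, 20, 1],
    "Attempt 2": [40, 20, 10, 1],
    "Attempt 3": [100, 80, 60, 10, 1],
}

param_counts = {
    name: dense_param_count(n_features, units) for name, units in configs.items()
}

for name, count in param_counts.items():
    print(f"{name}: {count} trainable parameters")
```

Whatever the input width, Attempts 1 and 2 come out smaller than the original network, while Attempt 3 comes out larger.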
- The original model achieved 70% accuracy, which does not meet the target accuracy of 75%.
- After three optimization attempts, the model still did not reach the target accuracy.
- The closest result was 73% on Attempt 2, which added a third hidden layer, reduced the node count in each hidden layer, and reduced the epochs to 15.
- Adjust the source data to remove outliers or variables that are confusing the model.
- Create additional bins for rare occurrences in columns.
- Increase or decrease the cutoff value used for each bin.
- Remove additional unnecessary columns.
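
As one possible way to bin rare occurrences in a column, categories that appear fewer than a chosen number of times can be grouped under a single label. The helper `bin_rare_categories`, the `min_count` cutoff, the `"Other"` label, and the sample values are all hypothetical and only meant to illustrate the idea.

```python
import pandas as pd


def bin_rare_categories(series, min_count, other_label="Other"):
    """Replace categories seen fewer than min_count times with other_label."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    # Keep values that are not rare; replace the rest with the shared label.
    return series.where(~series.isin(rare), other_label)


# Made-up values standing in for a column such as APPLICATION_TYPE.
sample = pd.Series(["T3", "T3", "T3", "T4", "T4", "T9", "T2"])
binned = bin_rare_categories(sample, min_count=2)

print(binned.value_counts())
```

Raising or lowering `min_count` changes how many categories are merged, which is one way to experiment with the bin cutoffs mentioned above.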