Source Data: https://github.com/xnotynot/deep-learning-challenge/blob/main/Resources/charity_data.csv
Primary Project Notebook: https://github.com/xnotynot/deep-learning-challenge/blob/main/adv_ml_fund_analysis.ipynb
Optimization Notebook: https://github.com/xnotynot/deep-learning-challenge/blob/main/adv_ml_fund_analysis_optimization.ipynb
Saved Models: https://github.com/xnotynot/deep-learning-challenge/tree/main/Results
This project was designed to help create a predictive model to determine whether applicants for aid from the non-profit group Alphabet Soup will successfully use the funding they request.
With a sample size of approximately 34K historical records, it uses a deep-learning neural network to make a binary prediction of funding success.
TensorFlow's Keras API was used to build and compile the neural network model.
Because the initial model's accuracy fell slightly below 75%, further attempts were made to optimize the model and increase its accuracy.
EIN and NAME—Identification columns
APPLICATION_TYPE—Alphabet Soup application type
AFFILIATION—Affiliated sector of industry
CLASSIFICATION—Government organization classification
USE_CASE—Use case for funding
ORGANIZATION—Organization type
STATUS—Active status
INCOME_AMT—Income classification
SPECIAL_CONSIDERATIONS—Special consideration for application
ASK_AMT—Funding amount requested
IS_SUCCESSFUL—Was the money used effectively
- Preprocessing
- The only target variable in the dataset is `IS_SUCCESSFUL`.
- The features which contribute to the analysis include: `APPLICATION_TYPE`, `AFFILIATION`, `CLASSIFICATION`, `USE_CASE`, `ORGANIZATION`, `STATUS`, `INCOME_AMT`, `SPECIAL_CONSIDERATIONS`, and `ASK_AMT`.
- `EIN` and `NAME` are both identifiers for the specific businesses that received funding in the past. As such, they do not contribute directly to the success of the funding; they are neither targets nor features.
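The preprocessing steps above can be sketched as follows. This is a hypothetical illustration, not the notebook's exact code: a tiny inline DataFrame with a subset of the columns stands in for `charity_data.csv`, and the split/scaling parameters are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Tiny stand-in for charity_data.csv (illustrative values only).
df = pd.DataFrame({
    "EIN": [1, 2, 3, 4],
    "NAME": ["A", "B", "C", "D"],
    "APPLICATION_TYPE": ["T3", "T3", "T4", "T3"],
    "ASK_AMT": [5000, 10000, 5000, 25000],
    "IS_SUCCESSFUL": [1, 0, 1, 0],
})

# Drop the identification columns -- they are neither targets nor features.
df = df.drop(columns=["EIN", "NAME"])

# One-hot encode the categorical features.
dummies = pd.get_dummies(df)

# Separate the target from the features, then split into train/test sets.
X = dummies.drop(columns=["IS_SUCCESSFUL"])
y = dummies["IS_SUCCESSFUL"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Scale the features, fitting the scaler on the training set only.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

In the notebook, rare `APPLICATION_TYPE` and `CLASSIFICATION` categories would also typically be binned into an "Other" bucket before encoding; that step is omitted here for brevity.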
- Compiling, Training, and Evaluating the Model
- Since we have a high number of inputs, choosing 3 hidden layers seemed optimal.
- It was later changed to 4 layers, which had little impact on the model's accuracy.
- For the number of neurons, I followed the rule of thumb that the count should be less than twice the size of the input layer, which led to 80 neurons for the first layer.
- For the second hidden layer, I used 30 neurons, which is fewer than the number of inputs.
- I used ReLU as the activation function for both hidden layers and sigmoid for the output layer, as learned in the sessions (I need to do more research to understand the significance of this choice).
- None of the models could reach the target accuracy of 75%; the peak value was close to 73%.
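The architecture described above (80-neuron and 30-neuron ReLU hidden layers, sigmoid output) can be sketched in Keras like this. The input width of 43 is an assumption standing in for the number of encoded feature columns, which depends on the preprocessing.

```python
import tensorflow as tf

number_input_features = 43  # assumed width of the scaled feature matrix

# Two ReLU hidden layers and a sigmoid output for the binary prediction.
nn = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(number_input_features,)),
    tf.keras.layers.Dense(80, activation="relu"),
    tf.keras.layers.Dense(30, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Binary cross-entropy is the standard loss for a sigmoid binary classifier.
nn.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
```

Training would then be a call such as `nn.fit(X_train_scaled, y_train, epochs=100)`, with `nn.evaluate(X_test_scaled, y_test)` reporting the test accuracy.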
- Three different methods were attempted to increase the performance of the model:
  - Dropping the `STATUS` and `SPECIAL_CONSIDERATIONS` variables to see if they were reducing the effectiveness of the analysis.
  - Adding another hidden layer with 60 neurons between the two original hidden layers.
  - Doubling the number of neurons in each hidden layer.
- None of these methods yielded positive results.
Overall, the models never reached the target accuracy of 75%; the best iteration yielded an accuracy close to 73%.
It is possible that eliminating other features could have improved model accuracy. Determining which features are important could be explored through feature analysis, and a confusion matrix could show where the model's predictions go wrong.
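A confusion matrix, as mentioned above, breaks the binary predictions down by error type. This sketch uses illustrative stand-in labels rather than the model's actual output:

```python
from sklearn.metrics import confusion_matrix

# Stand-ins for the test labels and the model's thresholded predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
# Rows are actual classes, columns are predicted classes:
# cm[0][0] true negatives,  cm[0][1] false positives,
# cm[1][0] false negatives, cm[1][1] true positives.
print(cm)
```

Whether false positives (funding unsuccessful applicants) or false negatives (rejecting successful ones) dominate would suggest where to focus further tuning.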