Under development
Part 1: Exploratory Data Analysis (EDA)
The objective of this project is to answer five valuable business questions related to life expectancy. Subsequently, the project will be extended to create an AI model that can predict life expectancy based on the provided data.
- What are the main factors that influence life expectancy in different countries?
- How has life expectancy changed over time in countries with different socioeconomic statuses?
- What is the relationship between healthcare expenditure and life expectancy?
- Is there a correlation between immunization coverage (Hepatitis B, Polio, Diphtheria) and life expectancy?
- How does alcohol consumption affect life expectancy in different regions?
- Import data from CSV.
- Clean data (handling missing values, duplicate data, etc.).
- Descriptive statistics.
- Data visualization to identify patterns and trends.
- Correlation analysis between variables.
- Use Python techniques to identify factors influencing life expectancy.
- Perform temporal analysis to observe changes in life expectancy over the years.
- Conduct correlation analysis to investigate the relationship between healthcare expenditure and life expectancy.
- Analyze the correlation between immunizations and life expectancy.
- Perform regional analysis on the impact of alcohol consumption on life expectancy.
- Export the file in .csv format for use in the Model Training Notebook.
- Python
Libraries:
- Pandas
- Matplotlib
- Seaborn
- Scikit-learn
- SciPy
- Statsmodels
- Requests
- functools
- concurrent.futures
- os
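The Part 1 steps listed above (import, cleaning, descriptive statistics, correlation, temporal analysis, and export) could be sketched roughly as follows. This is only an illustration: the inline sample data and the column names (`country`, `year`, `status`, `life_expectancy`, `health_expenditure`, `alcohol`) are assumptions standing in for the real CSV, and the median fill is just one simple way to handle missing values.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the project's CSV file. The column
# names and values are assumptions, not the real dataset.
RAW_CSV = """country,year,status,life_expectancy,health_expenditure,alcohol
A,2000,Developing,60.0,4.0,1.0
A,2001,Developing,61.0,4.5,1.2
A,2001,Developing,61.0,4.5,1.2
B,2000,Developed,78.0,8.0,9.0
B,2001,Developed,,8.5,9.5
C,2000,Developing,55.0,3.0,0.5
C,2001,Developing,56.0,,0.6
"""


def load_and_clean(source) -> pd.DataFrame:
    """Read the CSV, drop duplicate rows, and handle missing values."""
    df = pd.read_csv(source)
    df = df.drop_duplicates()
    # Rows without the target variable cannot be analysed, so drop them.
    df = df.dropna(subset=["life_expectancy"])
    # Fill remaining numeric gaps with each column's median (one simple choice).
    df = df.fillna(df.median(numeric_only=True))
    return df.reset_index(drop=True)


df = load_and_clean(io.StringIO(RAW_CSV))

# Descriptive statistics for the numeric columns.
summary = df.describe()

# Pearson correlation of every numeric variable with life expectancy.
correlations = (
    df.select_dtypes("number")
    .corr()["life_expectancy"]
    .sort_values(ascending=False)
)

# Temporal analysis: mean life expectancy per socioeconomic status and year.
trend = df.groupby(["status", "year"])["life_expectancy"].mean().unstack("year")

# Export the cleaned data as CSV. A buffer is used here to avoid touching
# the disk; in practice a file path would be passed instead.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

print(summary)
print(correlations)
print(trend)
```

With the real dataset, `io.StringIO(RAW_CSV)` would be replaced by the path to the CSV file, and the column names adjusted to match it.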
Part 2: Project Extension - AI Model
Develop a predictive model to estimate life expectancy based on the provided data.
- Select relevant features.
- Normalize and transform data.
- Split data into training and testing sets.
- Test various algorithms (Linear Regression, Random Forest, Gradient Boosting, etc.).
- Evaluate model performance using appropriate metrics (MAE, RMSE, R²).
- Select the best performing model.
- Cross-validation.
- Hyperparameter tuning using GridSearchCV or RandomizedSearchCV.
- Train the final model with the best parameters.
- Save the trained model for future use.
- Python
- Libraries: Pandas, NumPy, Scikit-learn, XGBoost, Joblib
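As a rough sketch of the Part 2 workflow listed above (scaling, train/test split, comparing algorithms, cross-validation, tuning with GridSearchCV, evaluation, and saving with Joblib), something along these lines could work. The data here is synthetic, and the parameter grids, fold count, and file name are illustrative assumptions rather than the project's actual settings. XGBoost is left out to keep the example dependent on Scikit-learn only.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned dataset (assumption: three numeric features).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 70 + 5 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Candidate models, each wrapped in a pipeline that normalizes the features.
candidates = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "random_forest": make_pipeline(
        StandardScaler(), RandomForestRegressor(n_estimators=50, random_state=42)
    ),
    "gradient_boosting": make_pipeline(
        StandardScaler(), GradientBoostingRegressor(random_state=42)
    ),
}

# Small illustrative grids; make_pipeline names each step after its class.
param_grids = {
    "linear": {"linearregression__fit_intercept": [True, False]},
    "random_forest": {"randomforestregressor__max_depth": [3, None]},
    "gradient_boosting": {"gradientboostingregressor__learning_rate": [0.05, 0.1]},
}

# Compare the candidates with 5-fold cross-validated R² on the training set.
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)

# Tune the best candidate; GridSearchCV refits it on the full training set.
search = GridSearchCV(
    candidates[best_name],
    param_grids[best_name],
    cv=5,
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)
final_model = search.best_estimator_

# Evaluate on the held-out test set.
predictions = final_model.predict(X_test)
mae = mean_absolute_error(y_test, predictions)
rmse = float(np.sqrt(mean_squared_error(y_test, predictions)))
r2 = r2_score(y_test, predictions)
print(f"{best_name}: MAE={mae:.2f} RMSE={rmse:.2f} R2={r2:.3f}")

# Save the trained model and reload it (hypothetical file name, temp directory).
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "life_expectancy_model.joblib")
    joblib.dump(final_model, model_path)
    reloaded = joblib.load(model_path)
    reloaded_predictions = reloaded.predict(X_test)
```

Keeping the scaler inside the pipeline means it is fitted only on the training folds during cross-validation, which avoids leaking information from the validation data.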
Integrate the predictive model into a web application where users can input data and obtain life expectancy predictions.
More information will be added as the project develops.