Author: Wycliffe Ndiba
This project aims to analyze the Titanic dataset to understand the factors influencing passenger survival. It involves data cleaning, exploratory data analysis, feature engineering, and the creation of a predictive model. The results are visualized through interactive plots and a Streamlit app.
This notebook has a more recent visualization of the analysis: titanic-analysis-1.
- Data Cleaning and Preprocessing: Removed unnecessary columns, converted categorical variables to numerical, and handled missing values.
- Exploratory Data Analysis (EDA): Explored survival distribution, correlations, and age distribution.
- Feature Engineering and Selection: Created a new feature 'FamilySize' and selected relevant features.
- Model Building and Evaluation: Built a Logistic Regression model, trained, and evaluated its performance.
- Data Visualization and Dashboard Creation: Used Plotly Express for interactive visualizations and created a Streamlit app for user interaction.
- Readme File Generation: Provided a concise and informative readme file.
- The survival distribution indicates a significant number of passengers did not survive.
- Correlation analysis helps identify potential influential features.
- Age distribution shows a diverse range of passengers.
- The Logistic Regression model achieved an accuracy of [accuracy].
Work in Progress
Link to Interactive Visualization
Work in Progress
- Include additional machine learning models for comparison.
- Enhance the Streamlit app with more features and insights.