This is a collection of Python scripts that explore the basics of exploratory data analysis (EDA) in Python, along with ways to make EDA more efficient, such as using auto EDA libraries.
This script examines the basics of EDA in Python using the familiar Titanic dataset. It covers the following fundamental topics:
- Understanding the dataset structure
- Basic statistical summaries
- Data cleaning
- Visualization
- Correlation analysis
- Handling categorical data
- Feature engineering
- Preliminary insights
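The topics above can be sketched in a few lines of pandas. This is a minimal illustration, not the repository's script: the DataFrame is a small hypothetical sample that only mimics a few Titanic column names, and the `family_size` feature is an assumed example of feature engineering. Plotting is left out to keep the sketch dependency-light.

```python
import numpy as np
import pandas as pd

# Hypothetical toy sample that mimics a few Titanic columns (not the real data).
df = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0, 1, 0],
    "pclass":   [3, 1, 3, 3, 1, 2, 3, 2],
    "sex":      ["male", "female", "female", "male",
                 "female", "male", "male", "female"],
    "age":      [22.0, 38.0, 26.0, np.nan, 35.0, 54.0, np.nan, 27.0],
    "sibsp":    [1, 1, 0, 0, 1, 0, 3, 0],
    "parch":    [0, 0, 0, 0, 0, 0, 1, 2],
    "fare":     [7.25, 71.28, 7.92, 8.05, 53.10, 51.86, 21.08, 11.13],
})

# Dataset structure and basic statistical summaries.
print(df.shape)
print(df.dtypes)
print(df.describe())

# Data cleaning: count missing values, then fill missing ages with the median.
missing = df.isna().sum()
df["age"] = df["age"].fillna(df["age"].median())

# Handling categorical data: one-hot encode `sex` as integer columns.
encoded = pd.get_dummies(df, columns=["sex"], drop_first=True, dtype=int)

# Feature engineering (assumed example): passenger plus relatives aboard.
encoded["family_size"] = encoded["sibsp"] + encoded["parch"] + 1

# Correlation analysis on the now fully numeric frame.
corr = encoded.corr()
print(corr["survived"].sort_values(ascending=False))

# Preliminary insight: survival rate by sex.
survival_by_sex = df.groupby("sex")["survived"].mean()
print(survival_by_sex)
```

On the real dataset the same steps apply, though the choice of imputation and encoding would depend on what the analysis is for.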
The AutoEDA script covers a collection of auto EDA libraries in Python, showing how to run each one and noting its strengths and weaknesses.
The decision to use auto EDA depends on:
- Specific needs of the project
- Size and complexity of the dataset
- Expertise of the user
Auto EDA tools offer a quick and efficient way to get a broad overview of the data.
Manual EDA allows for more nuanced and detailed analysis.
Packages include:
- YData Profiling, https://pypi.org/project/ydata-profiling/
- Sweetviz, https://pypi.org/project/sweetviz/
- DataPrep, https://pypi.org/project/dataprep/
- D-Tale, https://pypi.org/project/dtale/
- AutoViz, https://pypi.org/project/autoviz/
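As a rough idea of what running one of these tools looks like, the sketch below wraps two of the packages in small helper functions. The helper names, report titles, and output paths are hypothetical choices for illustration. The imports sit inside the functions so the file can be loaded even when a package is not installed, and each helper assumes the corresponding package has been installed from PyPI.

```python
import pandas as pd


def build_ydata_report(df: pd.DataFrame, out_path: str = "ydata_report.html") -> str:
    """Write a YData Profiling HTML report and return its path."""
    # Lazy import: requires `pip install ydata-profiling`.
    from ydata_profiling import ProfileReport

    # minimal=True skips the more expensive computations.
    report = ProfileReport(df, title="Titanic profile", minimal=True)
    report.to_file(out_path)
    return out_path


def build_sweetviz_report(df: pd.DataFrame, out_path: str = "sweetviz_report.html") -> str:
    """Write a Sweetviz HTML report and return its path."""
    # Lazy import: requires `pip install sweetviz`.
    import sweetviz as sv

    report = sv.analyze(df)
    report.show_html(filepath=out_path, open_browser=False)
    return out_path


# Example usage (uncomment once the packages are installed):
# df = pd.read_csv("titanic.csv")  # hypothetical local file
# build_ydata_report(df)
# build_sweetviz_report(df)
```

Both helpers produce a standalone HTML file, which is convenient for sharing a first overview before any manual analysis.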
This script examines basic statistical tests using the Titanic and Iris datasets. It covers how to choose among them based on data type, distribution, and population size. Basic tests include:
- Chi-Square
- t-Test
- ANOVA
- Kruskal-Wallis
- Pearson Correlation
- Linear Regression
- Logistic Regression
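Most of the tests listed above are available in `scipy.stats`. The sketch below runs them on small synthetic data generated with a fixed seed. The counts and group means are hypothetical stand-ins, not values taken from the Titanic or Iris datasets. Logistic regression is omitted because it needs an additional package such as statsmodels or scikit-learn.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Chi-square: association between two categorical variables.
# Hypothetical 2x2 table of counts (rows: group, columns: outcome).
table = np.array([[20, 80],
                  [70, 30]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

# t-test: compare the means of two independent groups
# (Welch's variant, which does not assume equal variances).
group_a = rng.normal(loc=30.0, scale=10.0, size=40)
group_b = rng.normal(loc=36.0, scale=12.0, size=40)
t_res = stats.ttest_ind(group_a, group_b, equal_var=False)

# ANOVA: compare the means of three or more groups (assumes normality).
g1 = rng.normal(loc=1.5, scale=0.3, size=30)
g2 = rng.normal(loc=4.3, scale=0.3, size=30)
g3 = rng.normal(loc=5.6, scale=0.3, size=30)
anova = stats.f_oneway(g1, g2, g3)

# Kruskal-Wallis: rank-based alternative when normality is doubtful.
kruskal = stats.kruskal(g1, g2, g3)

# Pearson correlation: linear association between two continuous variables.
x = rng.uniform(0.0, 10.0, size=50)
y = 2.0 * x + 1.0 + rng.normal(loc=0.0, scale=1.0, size=50)
r, p_r = stats.pearsonr(x, y)

# Simple linear regression: fit y = slope * x + intercept.
fit = stats.linregress(x, y)

print(f"chi-square p={p_chi2:.4g}, dof={dof}")
print(f"t-test p={t_res.pvalue:.4g}")
print(f"ANOVA p={anova.pvalue:.4g}, Kruskal-Wallis p={kruskal.pvalue:.4g}")
print(f"Pearson r={r:.3f}, slope={fit.slope:.3f}, intercept={fit.intercept:.3f}")
```

As a rule of thumb, the chi-square test suits two categorical variables, the t-test and ANOVA suit a continuous outcome across groups when the data are roughly normal, and Kruskal-Wallis is the fallback when they are not.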