Automated the exploratory data analysis (EDA) process, allowing users to perform EDA on any dataset of their choice through a user-friendly, code-free platform. The web application is built with Streamlit. More details below.
The web application lets any user perform EDA on a dataset of their choice: the user simply uploads the dataset and ticks the offerings they wish to apply. The offerings are -
- General EDA, which includes -
  - checking dtypes,
  - viewing columns,
  - viewing missing data,
  - aggregation tabulation,
  - viewing numerical & categorical variables,
  - dropping null values,
  - cross tabulation,
  - Pearson correlation, &
  - Spearman correlation.
- Univariate Analysis, which includes creating histograms, displots & countplots.
- Bivariate Analysis, which includes building scatter plots, bar plots & violin plots.
- Multivariate Analysis, which includes creating histograms, heatmaps, pairplots & word clouds.
- EDA For Linear Models, which includes -
  - generating Q-Q plots,
  - viewing outliers,
  - creating distplots, &
  - performing the chi-square test.
- Machine Learning Model Building for Classification Problems, which includes -
  - selecting the feature variables along with the target variable,
  - splitting the data into train and test sets, &
  - the option to build Logistic Regression, Decision Tree, Random Forest, Naive Bayes & XGBoost Classifier baseline models.
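Most of the General EDA offerings map onto standard pandas calls. The sketch below is not the app's own code: the helper name `general_eda`, its `group_col` / `cross_cols` parameters and the sample columns (`age`, `income`, `city`, `bought`) are hypothetical, chosen only to show one way each check could be computed.

```python
import numpy as np
import pandas as pd


def general_eda(df, group_col, cross_cols):
    """Hypothetical helper: compute the General EDA summaries for a DataFrame.

    group_col  -- categorical column used for the aggregation table
    cross_cols -- pair of categorical columns used for the cross tabulation
    """
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
    row_col, col_col = cross_cols
    return {
        "dtypes": df.dtypes,                   # data type of every column
        "columns": df.columns.tolist(),        # column names
        "missing": df.isnull().sum(),          # count of missing values per column
        "numeric_cols": numeric_cols,          # numerical variables
        "categorical_cols": categorical_cols,  # categorical (non-numeric) variables
        "without_nulls": df.dropna(),          # only the rows that contain no null values
        # aggregation table: mean and non-null count of each numeric column per group
        "aggregation": df.groupby(group_col)[numeric_cols].agg(["mean", "count"]),
        # cross tabulation: frequency of each (row_col, col_col) combination
        "crosstab": pd.crosstab(df[row_col], df[col_col]),
        # correlations between numeric columns; pandas skips NaNs pairwise
        "pearson": df[numeric_cols].corr(method="pearson"),
        "spearman": df[numeric_cols].corr(method="spearman"),
    }


# Small made-up dataset, for illustration only.
df = pd.DataFrame({
    "age": [25, 32, 47, 51, np.nan, 38],
    "income": [40_000, 52_000, 81_000, 90_000, 61_000, np.nan],
    "city": ["A", "B", "A", "B", "A", "B"],
    "bought": ["yes", "no", "yes", "yes", "no", "no"],
})

summary = general_eda(df, group_col="city", cross_cols=("city", "bought"))
print(summary["missing"])
print(summary["crosstab"])
print(summary["spearman"])
```

Returning everything in one dictionary keeps the computation separate from display, so a UI layer could show only the summaries whose checkboxes are ticked.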
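The univariate, bivariate and multivariate plots can be approximated with plain matplotlib. This is a minimal sketch under assumptions: the helper `eda_plots`, its column arguments and the random demo data are hypothetical, and the app may draw its charts differently (seaborn, for instance, offers `countplot`, `violinplot` and `displot` directly). Pairplots and word clouds are left out because they usually rely on extra libraries (seaborn's `pairplot`, the `wordcloud` package) that are not among the tools listed in this README.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def eda_plots(df, num_col, num_col2, cat_col):
    """Hypothetical helper: one figure with univariate, bivariate and multivariate panels."""
    fig, axes = plt.subplots(1, 5, figsize=(24, 4))

    # Univariate: histogram of one numeric column
    axes[0].hist(df[num_col].dropna(), bins=15)
    axes[0].set_title(f"Histogram of {num_col}")

    # Univariate: count plot, i.e. a bar chart of category frequencies
    counts = df[cat_col].value_counts()
    axes[1].bar([str(k) for k in counts.index], counts.to_numpy())
    axes[1].set_title(f"Count of {cat_col}")

    # Bivariate: scatter plot of two numeric columns (rows with NaNs dropped)
    pair = df[[num_col, num_col2]].dropna()
    axes[2].scatter(pair[num_col], pair[num_col2], s=10)
    axes[2].set_title(f"{num_col} vs {num_col2}")

    # Bivariate: violin plot of a numeric column split by category
    grouped = [(str(k), g[num_col].dropna().to_numpy()) for k, g in df.groupby(cat_col)]
    labels = [name for name, _ in grouped]
    axes[3].violinplot([values for _, values in grouped])
    axes[3].set_xticks(range(1, len(labels) + 1))
    axes[3].set_xticklabels(labels)
    axes[3].set_title(f"{num_col} by {cat_col}")

    # Multivariate: heatmap of the correlation matrix of all numeric columns
    corr = df.select_dtypes(include="number").corr()
    im = axes[4].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
    axes[4].set_xticks(range(len(corr.columns)))
    axes[4].set_xticklabels(corr.columns, rotation=45)
    axes[4].set_yticks(range(len(corr.columns)))
    axes[4].set_yticklabels(corr.columns)
    axes[4].set_title("Correlation heatmap")
    fig.colorbar(im, ax=axes[4])

    fig.tight_layout()
    return fig


# Random demo data, for illustration only.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "height": rng.normal(170, 10, 200),
    "weight": rng.normal(70, 8, 200),
    "group": rng.choice(["a", "b", "c"], 200),
})
fig = eda_plots(demo, "height", "weight", "group")
```

Calling `fig.savefig("eda_plots.png")` writes the figure to disk; inside a Streamlit app, `st.pyplot(fig)` would display it instead.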
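The checks under EDA For Linear Models can be computed with SciPy and pandas. The sketch below rests on assumptions: the helper names are hypothetical, outliers are flagged with Tukey's IQR fences (one common rule; the app may use another), and the Q-Q helper returns the plot's coordinates rather than drawing it.

```python
import numpy as np
import pandas as pd
from scipy import stats


def qq_points(series):
    """Hypothetical helper: theoretical vs ordered sample quantiles against a normal distribution.

    r is the correlation of the Q-Q points; values near 1 suggest roughly normal data.
    """
    (theoretical, ordered), (_slope, _intercept, r) = stats.probplot(series.dropna(), dist="norm")
    return theoretical, ordered, r


def iqr_outliers(series, k=1.5):
    """Hypothetical helper: values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return series[(series < q1 - k * iqr) | (series > q3 + k * iqr)]


def chi_square_test(df, col_a, col_b):
    """Hypothetical helper: chi-square test of independence between two categorical columns."""
    table = pd.crosstab(df[col_a], df[col_b])
    chi2, p_value, dof, _expected = stats.chi2_contingency(table)
    return chi2, p_value, dof


rng = np.random.default_rng(1)
_, _, r = qq_points(pd.Series(rng.normal(size=200)))
print(f"Q-Q correlation: {r:.3f}")

print(iqr_outliers(pd.Series([10, 11, 12, 11, 10, 12, 11, 100])))  # flags only 100

# Made-up survey data with a strong association between the two columns.
survey = pd.DataFrame({
    "group": ["x"] * 20 + ["y"] * 20,
    "answer": ["p"] * 18 + ["q"] * 2 + ["p"] * 2 + ["q"] * 18,
})
chi2, p_value, dof = chi_square_test(survey, "group", "answer")
print(chi2, p_value, dof)
```

For a 2x2 table, `chi2_contingency` applies Yates' continuity correction by default, so its statistic is slightly smaller than the uncorrected Pearson chi-square.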
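The classification workflow (choose features and target, split, fit baseline models) can be sketched with scikit-learn. The helper `baseline_models`, the 75/25 stratified split, the accuracy metric and the synthetic data are assumptions for illustration, not necessarily the app's choices. XGBoost is omitted so the sketch depends only on scikit-learn; `xgboost.XGBClassifier` follows the same fit/predict interface and could be added to the `models` dictionary if the `xgboost` package is installed.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def baseline_models(df, feature_cols, target_col, test_size=0.25, random_state=42):
    """Hypothetical helper: fit baseline classifiers and return test accuracy per model."""
    X = df[feature_cols]
    y = df[target_col]
    # Stratify so the train and test sets keep the same class proportions
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "Random Forest": RandomForestClassifier(random_state=random_state),
        "Naive Bayes": GaussianNB(),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic binary-classification data, for illustration only.
X, y = make_classification(n_samples=300, n_features=5, n_informative=3, random_state=0)
features = [f"f{i}" for i in range(5)]
data = pd.DataFrame(X, columns=features)
data["target"] = y

scores = baseline_models(data, features, "target")
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Default hyperparameters are used on purpose: a baseline is meant to give a quick reference score before any tuning.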
Built with -
- Python 3.x
- scikit-learn
- pandas & numpy
- matplotlib
- streamlit
Web App Link: https://share.streamlit.io/purnima99/autoeda/app.py
To run the app locally -
- Clone the repo and move into its directory by running -
  git clone https://github.com/purnima99/AutoEDA.git
  cd AutoEDA
- Use the command prompt to set up a virtual environment.
- Install all dependencies and requirements with the following command, which installs every library the project needs -
  python -m pip install -r requirements.txt
- Run the Streamlit app on your local machine -
  streamlit run app.py
Project Status: Completed