Jupyter Notebook is a web-based environment for interactive data analysis and visualization. This project provides a notebook that brings together effective tools for data exploration. The first part of the project presents the visualization tools used in the notebook for Exploratory Data Analysis (EDA). With these tools, one can visually analyze the numerical and categorical attributes of a query in order to understand the data and the relationships between the attributes. The second part of the project covers DataComPy, a library that can be used to compare two queries in order to find similarities and overlaps.
The database used in this project to evaluate the notebook is nba_salary.sqlite.
To clone the repo:
git clone https://github.com/DimitrisReppas/A-notebook-for-complete-Exploratory-Data-Analysis-with-SQL.git
To install the requirements:
pip install -r /path/to/A-notebook-for-complete-Exploratory-Data-Analysis-with-SQL/requirements.txt
A_Notebook_for_complete_EDA.ipynb is a Jupyter Notebook containing the code for the exploratory data analysis. In this notebook, one can find:
Connection to the database (SQLite)
Primary information retrieval from the database
Posing queries in the SQL language
Analysis of numerical attributes
Analysis of categorical attributes
Analysis of relationships between numerical variables
Similarities between two queries
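As a rough illustration of the steps listed above, the following sketch walks through connecting to SQLite, listing tables, posing queries, summarizing a numerical and a categorical attribute, and finding the rows shared by two queries. It uses only the Python standard library and an in-memory database. The table players, its columns, and its rows are hypothetical stand-ins, not the actual schema of nba_salary.sqlite, and the helper functions are illustrative names rather than code taken from the notebook. The overlap step uses a plain set intersection to show the idea; it is not DataComPy, which the notebook uses for this purpose.

```python
import sqlite3
import statistics
from collections import Counter

# Hypothetical stand-in data; the real nba_salary.sqlite schema may differ.
ROWS = [
    ("Player A", "GSW", 30.0),
    ("Player B", "GSW", 12.5),
    ("Player C", "LAL", 25.0),
    ("Player D", "BOS", 8.0),
]


def build_demo_db():
    """Create an in-memory SQLite database with one illustrative table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE players (name TEXT, team TEXT, salary_musd REAL)")
    conn.executemany("INSERT INTO players VALUES (?, ?, ?)", ROWS)
    conn.commit()
    return conn


def list_tables(conn):
    """Primary information retrieval: names of the tables in the database."""
    cur = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in cur]


def run_query(conn, sql, params=()):
    """Run a SQL query and return its rows as a list of tuples."""
    return conn.execute(sql, params).fetchall()


def numeric_summary(values):
    """Basic descriptive statistics for a numerical attribute."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def category_counts(values):
    """Frequency of each value of a categorical attribute."""
    return Counter(values)


def query_overlap(rows_a, rows_b):
    """Rows returned by both queries (exact tuple match)."""
    return set(rows_a) & set(rows_b)


conn = build_demo_db()
tables = list_tables(conn)

# Two queries over the same columns, so their rows can be compared directly.
query_a = run_query(
    conn,
    "SELECT name, team, salary_musd FROM players WHERE salary_musd >= ?",
    (10,),
)
query_b = run_query(
    conn,
    "SELECT name, team, salary_musd FROM players WHERE team = ?",
    ("GSW",),
)

summary = numeric_summary([row[2] for row in query_a])  # numerical attribute
teams = category_counts([row[1] for row in query_a])    # categorical attribute
shared = query_overlap(query_a, query_b)                # rows in both results
conn.close()
```

To try the same helpers against the real database, one would pass the path of nba_salary.sqlite to sqlite3.connect instead of building the demo database, and replace the table and column names with those reported by list_tables.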