Jupyter Notebook is a web-based environment for interactive data analysis and visualization. This project provides a notebook that brings together effective tools for data exploration. The first part of the project presents the visualization tools used in the notebook for Exploratory Data Analysis (EDA). With these tools, one can visually analyze the numerical and categorical attributes of a query in order to understand the data and the relationships between the attributes. The second part of the project covers DataComPy, a library that can be used to compare two queries in order to find similarities and overlaps.
The database used in this project to evaluate the notebook is nba_salary.sqlite.
To clone the repo:
git clone https://github.com/DimitrisReppas/A-notebook-for-complete-Exploratory-Data-Analysis-with-SQL.git
To install the requirements:
pip install -r /path/to/A-notebook-for-complete-Exploratory-Data-Analysis-with-SQL/requirements.txt
A_Notebook_for_complete_EDA.ipynb is a Jupyter Notebook containing the code for the exploratory data analysis. In this notebook, one can find:
- Connection to the database (SQLite)
- Primary information retrieval from the database
- Posing queries in the SQL language
- Analysis of numerical attributes
- Analysis of categorical attributes
- Analysis of relationships between numerical variables
- Similarities between two queries
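
As a rough illustration of the first few steps listed above (connecting to SQLite, retrieving primary information, posing a query, and summarizing a numerical and a categorical attribute), here is a minimal sketch that uses only the Python standard library. It is not the notebook's own code. The helpers `list_tables` and `run_query`, the table `demo_players`, and its rows are hypothetical and exist only to make the example runnable; the actual schema of nba_salary.sqlite is not assumed here.

```python
import sqlite3
import statistics
from collections import Counter


def list_tables(conn):
    """Return the names of all tables in the connected SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def run_query(conn, sql, params=()):
    """Run a SQL query and return (column_names, rows)."""
    cursor = conn.execute(sql, params)
    columns = [desc[0] for desc in cursor.description]
    return columns, cursor.fetchall()


# In-memory stand-in; replace ":memory:" with the path to nba_salary.sqlite.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, for illustration only.
conn.execute("CREATE TABLE demo_players (name TEXT, position TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO demo_players VALUES (?, ?, ?)",
    [("A", "G", 100.0), ("B", "F", 200.0), ("C", "G", 300.0)],
)

# Primary information retrieval: which tables exist?
tables = list_tables(conn)

# Posing a query in SQL.
columns, rows = run_query(conn, "SELECT position, salary FROM demo_players")

# Numerical attribute: basic summary statistics.
salaries = [salary for _, salary in rows]
salary_summary = {
    "min": min(salaries),
    "max": max(salaries),
    "mean": statistics.mean(salaries),
    "median": statistics.median(salaries),
}

# Categorical attribute: frequency of each value.
position_counts = Counter(position for position, _ in rows)

conn.close()
```

With the real database, `list_tables(conn)` is a reasonable first call, since its output shows which table names can be used in later queries.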