In this project, a Jupyter Notebook was developed for exploratory data analysis and for formulating SQL queries. This web-based environment is well suited to interactive data analysis and visualization, and it can be connected to database management systems.

A notebook for complete exploratory data analysis with SQL

Because Jupyter Notebook is a web-based environment that enables interactive data analysis and visualization, this project develops a notebook that brings together effective tools for data exploration. The first part of the project presents the visualization tools used in the notebook for Exploratory Data Analysis (EDA). With these tools, one can visually analyze the numerical and categorical attributes of a query result in order to understand the data and the relationships between the attributes. The second part of the project covers DataComPy, a library that can be used to compare the results of two queries in order to find similarities and overlaps.

Dataset

The database used in this project to evaluate the notebook is nba_salary.sqlite.
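
As a rough illustration of how a SQLite file such as this one can be inspected before any queries are written, the sketch below lists the tables and columns of a database using only Python's standard sqlite3 module. The schema of nba_salary.sqlite is not described here, so the `players` table and its columns are hypothetical stand-ins created in an in-memory database; `list_tables` and `table_columns` are illustrative helper names, not functions from the notebook.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the connected SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def table_columns(conn, table):
    """Return (column name, declared type) pairs for one table."""
    # PRAGMA arguments cannot be bound parameters, so the table name is
    # quoted as an identifier instead.
    quoted = '"' + table.replace('"', '""') + '"'
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({quoted})")]


# Stand-in for sqlite3.connect("nba_salary.sqlite").
# The table below is hypothetical and only makes the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, salary REAL)")

tables = list_tables(conn)
columns = table_columns(conn, "players")
print(tables)
print(columns)
```

To try this against the real file, replace `":memory:"` with the path to nba_salary.sqlite and drop the `CREATE TABLE` line.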

Run locally

To clone the repo:

  git clone https://github.com/DimitrisReppas/A-notebook-for-complete-Exploratory-Data-Analysis-with-SQL.git

Requirements

To install the requirements:

  pip install -r /path/to/A-notebook-for-complete-Exploratory-Data-Analysis-with-SQL/requirements.txt

Exploratory data analysis

A_Notebook_for_complete_EDA.ipynb is a Jupyter Notebook containing the code for the exploratory data analysis. In this notebook, one can find:

  • Connection to the database (SQLite)

  • Primary information retrieval from the database

  • Posing queries in the SQL language

  • Analysis of numerical attributes

  • Analysis of categorical attributes 

  • Analysis of relationships between numerical variables 

  • Similarities between two queries
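
The steps listed above can be sketched in a few lines of plain Python and SQL. This is a minimal illustration under stated assumptions, not the notebook's actual code: the `players` table, its rows, and the helper names `numeric_summary`, `category_counts`, and `compare_queries` are all hypothetical. The notebook relies on visualization tools and on DataComPy for the comparison step; the set-based comparison below is a simpler stand-in that only conveys the idea of finding overlapping rows.

```python
import sqlite3

# Hypothetical data in an in-memory database, standing in for the real file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("A", "X", 10.0), ("B", "X", 20.0), ("C", "Y", 30.0), ("D", "Y", 40.0)],
)


def numeric_summary(conn, query, column):
    """Count, min, max and mean of one numerical column of a query result.

    `query` must be a SELECT statement without a trailing semicolon,
    because it is embedded as a subquery.
    """
    sql = (
        f'SELECT COUNT("{column}"), MIN("{column}"), '
        f'MAX("{column}"), AVG("{column}") FROM ({query})'
    )
    count, lowest, highest, mean = conn.execute(sql).fetchone()
    return {"count": count, "min": lowest, "max": highest, "mean": mean}


def category_counts(conn, query, column):
    """Frequency of each value of one categorical column of a query result."""
    sql = (
        f'SELECT "{column}", COUNT(*) FROM ({query}) '
        f'GROUP BY "{column}" ORDER BY "{column}"'
    )
    return dict(conn.execute(sql).fetchall())


def compare_queries(conn, query_a, query_b):
    """Split the rows of two query results into shared and unique rows."""
    rows_a = set(conn.execute(query_a).fetchall())
    rows_b = set(conn.execute(query_b).fetchall())
    return {
        "both": rows_a & rows_b,
        "only_a": rows_a - rows_b,
        "only_b": rows_b - rows_a,
    }


all_players = "SELECT name, team, salary FROM players"
summary = numeric_summary(conn, all_players, "salary")
teams = category_counts(conn, all_players, "team")

high_salary = "SELECT name, team, salary FROM players WHERE salary >= 20"
team_y = "SELECT name, team, salary FROM players WHERE team = 'Y'"
overlap = compare_queries(conn, high_salary, team_y)

print(summary)
print(teams)
print(overlap)
```

Pushing the aggregation into SQL keeps the sketch free of third-party dependencies. Note that the set comparison ignores duplicate rows and row order, so it is coarser than a column-by-column comparison.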