Welcome to my data analysis project! Inspired by a research paper I read, I've developed a framework that leverages the power of natural language processing (NLP) for data analysis. The core feature of this project is its ability to convert natural language queries into SQL statements, execute these statements to extract relevant data, and create visualizations for a more intuitive understanding of the results.
https://abdulqg.pythonanywhere.com/index/
-
Natural Language Data Analysis: This project is designed to accept natural language queries. With the help of the GPT-3.5-Turbo API, these queries are transformed into executable SQL statements.
-
Data Extraction: The SQL queries are executed on a connected database, with results extracted and stored in a CSV file (
extracted-data.csv
). -
Data Visualization: The framework also offers the capability to visualize the extracted data using Python (plotly), providing a clear, comprehensible depiction of the results.
The following technologies and languages have been used in this project:
-
Python: The primary language for scripting and development.
-
SQLite: Essential for data extraction and manipulation.
-
GPT API: Used for the conversion of natural language queries into SQL statements.
-
Django: Web framework for Python.
-
PythonAnywhere: Hosting service.
The main deliverable of this project is the extracted-data.csv
file, containing data extracted through SQL queries. Additionally, if visualizations are opted for, a set of Python-generated visuals will be produced.
Youtube tutorial https://youtu.be/12gOkmMs8qg
To use this project, kindly follow these steps:
- Set up your environment variables. Save your
OPENAI_API_KEY
in .env - Connect to your SQL database. I've used
chinook.db
, a sample database, in this project. - Enter your database table and column names into
database-info.txt
in same format as example - Run
gpt-sql.py
to initiate the data extraction process. This script will convert your natural language input into SQL queries, execute them on the connected database, and save the results as a CSV calledextracted-data.csv
. - If you wish to visualize the data, run
gpt-python.py
. This script will create visualizations using the data extracted in the previous step.
Please note that you may need to install specific Python libraries if they aren't already present in your environment.
Standard Libraries:
os
,subprocess
,sqlite3
,csv
: Included with Python.
Third-Party Libraries:
dotenv
: Environment variables management.openai
: OpenAI API access.pandas
: Data analysis.
Install the third-party libraries using:
pip install python-dotenv openai pandas
Ensure you have the required libraries before running the project.
This project draws inspiration from a research paper I read. I have carried out the implementation and development of this idea into a functioning project. https://arxiv.org/abs/2305.15038