/siads699


Data Analysis from Generative NLP Algorithms


This project uses Large Language Models from OpenAI to analyze data. A user asks a question in natural language, the model translates that question into a SQL query, and the query is then run against the database to produce the final output.
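The question-to-SQL-to-result flow can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the `sales` table, `build_prompt`, `generate_sql`, and `answer` are hypothetical names, the model call is replaced by a stub that returns a fixed query, and an in-memory SQLite database stands in for the hosted database so the sketch runs without credentials.

```python
import sqlite3

# Hypothetical schema used only for this sketch.
SCHEMA = (
    "CREATE TABLE sales "
    "(customer TEXT, city TEXT, quantity INTEGER, unit_price REAL)"
)


def build_prompt(question: str, schema: str) -> str:
    """Combine the table schema and the user's question into one prompt."""
    return (
        "Given this table:\n" + schema + "\n"
        "Write one SQL query that answers: " + question + "\n"
        "Return only the SQL."
    )


def generate_sql(prompt: str) -> str:
    """Hypothetical stand-in for the language-model call.

    A real version would send `prompt` to the model and return its reply;
    here a fixed query is returned so the example is self-contained.
    """
    return (
        "SELECT city, SUM(quantity * unit_price) AS revenue "
        "FROM sales GROUP BY city ORDER BY revenue DESC"
    )


def answer(question: str, conn: sqlite3.Connection) -> list:
    """Turn a question into SQL, then run that SQL against the database."""
    sql = generate_sql(build_prompt(question, SCHEMA))
    # Assumed safeguard: refuse anything that is not a read-only SELECT.
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("refusing to run non-SELECT SQL")
    return conn.execute(sql).fetchall()


# Build a tiny in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("Ada", "Austin", 2, 10.0), ("Bo", "Boston", 1, 5.0), ("Cy", "Austin", 3, 1.0)],
)

rows = answer("Which city has the highest revenue?", conn)
print(rows)
```

Including the schema in the prompt gives the model the table and column names it needs, and checking the generated statement before executing it is one simple way to limit what model-written SQL can do.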

The demo of this code is located here.

The web app and PostgreSQL database are currently hosted on PythonAnywhere.com.

Read the full report here.

Getting Started

Clone the repo via git clone https://github.com/Wenjun-Mao/siads699.git

Install python package dependencies via pip install -r requirements.txt

The folder example3 contains the working code to run the web application.

These two files are used to replicate the data QA and visuals in the project:

01-Python-QA-File.py
02-Python-Visuals.ipynb

Data Access

Data supporting this research is available at https://archive.ics.uci.edu/ml/datasets/online+retail

Citation: Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

We adjusted the columns for usability and readability. Specifically, we shifted the dates so they are recent, renamed the countries to cities in the US, and replaced the 5-digit customer code with hypothetical customer names.
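As a rough illustration of these three adjustments, the sketch below transforms a single row. It is not the script used in the project: the lookup tables, the ten-year date shift, the date string format, and the `adjust_row` helper are all assumptions made for the example. Only the column names `InvoiceDate`, `Country`, `CustomerID`, and `Quantity` come from the UCI dataset.

```python
from datetime import datetime, timedelta

# Hypothetical lookup tables; the project's real mappings are not shown here.
CITY_FOR_COUNTRY = {"United Kingdom": "Chicago", "France": "Denver"}
NAME_FOR_CUSTOMER = {"17850": "Alex Rivera", "12583": "Jordan Lee"}

# Assumed offset that moves the original invoice dates closer to the present.
DATE_SHIFT = timedelta(days=3650)

# Assumed date format for this sketch.
DATE_FORMAT = "%Y-%m-%d %H:%M"


def adjust_row(row: dict) -> dict:
    """Apply the three adjustments to one row of the dataset."""
    invoice_date = datetime.strptime(row["InvoiceDate"], DATE_FORMAT)
    return {
        # 1. Shift the date forward by a fixed offset.
        "InvoiceDate": (invoice_date + DATE_SHIFT).strftime(DATE_FORMAT),
        # 2. Replace the country with a US city (keep it if unmapped).
        "City": CITY_FOR_COUNTRY.get(row["Country"], row["Country"]),
        # 3. Replace the 5-digit customer code with a made-up name.
        "CustomerName": NAME_FOR_CUSTOMER.get(row["CustomerID"], "Unknown"),
        "Quantity": row["Quantity"],
    }


raw = {
    "InvoiceDate": "2010-12-01 08:26",
    "Country": "United Kingdom",
    "CustomerID": "17850",
    "Quantity": 6,
}
adjusted = adjust_row(raw)
print(adjusted)
```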