This is an AI app that finds real-time discounts, deals, and sale prices from various online markets around the world. The project exposes an HTTP REST endpoint that answers user queries about current sales, such as Amazon deals in a specific location, or about deals in any input file you provide (CSV, Jsonlines, PDF, Markdown, or plain text). It uses Pathway's LLM App features to build a real-time, LLM (Large Language Model)-enabled data pipeline in Python that joins data from multiple input sources, and it leverages the OpenAI Embeddings and Chat Completion API endpoints to generate AI assistant responses.
Currently, the project supports two types of data sources, and you can add more by writing custom input connectors:
- Jsonlines: the data source expects a doc object on each line, so convert your input data to Jsonlines first. See the sample data in discounts.jsonl.
- Rainforest Product API: provides real-time deals for Amazon products.
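As a minimal sketch of that conversion, assuming each CSV row should become one JSON object whose `doc` field holds the row rendered as text, the standard library is enough. The `csv_to_jsonlines` helper and the "column: value" layout of `doc` are assumptions for illustration; the exact `doc` format in discounts.jsonl may differ.

```python
import csv
import io
import json


def csv_to_jsonlines(csv_text: str) -> str:
    """Convert CSV text (header row first) into Jsonlines with one `doc` per row.

    Hypothetical helper: the "column: value" layout of `doc` is an assumption,
    not necessarily the format the project uses.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        # Render every column of the row as "name: value" and join them into one string
        doc = ", ".join(f"{key}: {value}" for key, value in row.items())
        lines.append(json.dumps({"doc": doc}))
    # One JSON object per line; an input with only a header yields an empty string
    return "\n".join(lines) + "\n" if lines else ""


sample = (
    "brand,product_name,actual_price,discount_price\n"
    "Nike,Formal Shoes,130.67,117.60\n"
)
print(csv_to_jsonlines(sample), end="")
```

Keeping the whole row in a single text field matches the description above, where each line carries one `doc` object that is later embedded as a unit.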
The app provides the following features:
- Retrieves the latest deals from various sources.
- Provides an API interface to explore these deals.
- Offers a user-friendly UI built with Streamlit.
- Filters and presents deals based on user queries or chosen data sources.
- Reuses data and code for offline evaluation: users can choose between local (cached) and real data.
- Extensible data sources through Pathway's built-in connectors for JSONLines, CSV, Kafka, Redpanda, Debezium, streaming APIs, and more.
There is more you can do with the app. Upcoming features include:
- Incorporate additional data from external APIs, various files (such as Jsonlines, PDF, Doc, HTML, or text), databases like PostgreSQL or MySQL, and streaming data from platforms like Kafka, Redpanda, or Debezium.
- Merge data from these sources instantly.
- Convert any data to jsonlines.
- Maintain data snapshots to observe how sale prices change over time, using Pathway's built-in feature for computing the differences between two versions of the data.
- Beyond making data accessible via API or UI, the LLM App allows you to relay processed data to other downstream connectors, such as BI and analytics tools. For instance, set it up to receive alerts upon detecting price shifts.
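To illustrate the idea behind the snapshot and alert features, here is a plain-Python sketch (it does not use Pathway's built-in change tracking) that compares two snapshots keyed by product_id and reports price changes. The `detect_price_shifts` name, the snapshot shape, the 5% threshold, and the message format are all assumptions.

```python
from typing import Dict, List


def detect_price_shifts(
    old: Dict[str, float], new: Dict[str, float], min_change_pct: float = 5.0
) -> List[str]:
    """Return alert messages for products whose price moved by at least min_change_pct.

    Hypothetical helper: `old` and `new` map product_id -> discount_price.
    Products that disappear from the new snapshot are not reported.
    """
    alerts = []
    for product_id, new_price in sorted(new.items()):
        old_price = old.get(product_id)
        if old_price is None:
            # A product that was not in the previous snapshot is a new deal
            alerts.append(f"{product_id}: new deal at {new_price:.2f}")
            continue
        if old_price == 0:
            continue  # skip malformed rows to avoid dividing by zero
        change_pct = (new_price - old_price) / old_price * 100
        if abs(change_pct) >= min_change_pct:
            alerts.append(f"{product_id}: {old_price:.2f} -> {new_price:.2f} ({change_pct:+.1f}%)")
    return alerts


yesterday = {"7849": 130.67}
today = {"7849": 117.60, "9001": 59.99}
for alert in detect_price_shifts(yesterday, today):
    print(alert)
```

In the real app, the alert list would be sent to a downstream connector instead of printed.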
If you use the Rainforest API as a data source, the project provides real-time deals for Amazon products. Suppose the user sends the following query in the API request:
Can you find me discounts this week for Adidas men's shoes?
The response lists discounts currently available on the Amazon marketplace.
By contrast, the ChatGPT interface offers only general advice on finding discounts and does not say where they are or what kind they are, among other details.
It takes only a few lines of code to build a real-time, AI-enabled data pipeline:
import os

import pathway as pw

# QueryInputSchema, DataInputSchema, embeddings, index_embeddings and prompt
# are defined in this project's own modules (not shown here).

# Host and port, assumed here to be read from the .env settings (HOST, PORT)
host = os.environ.get("HOST", "0.0.0.0")
port = int(os.environ.get("PORT", "8080"))

# Given a user question as a query from your API
query, response_writer = pw.io.http.rest_connector(
    host=host,
    port=port,
    schema=QueryInputSchema,
    autocommit_duration_ms=50,
)
# Real-time data coming from external data sources such as a Jsonlines file
sales_data = pw.io.jsonlines.read(
    "./examples/data",
    schema=DataInputSchema,
    mode="streaming",
)
# Compute embeddings for each document using the OpenAI Embeddings API
embedded_data = embeddings(context=sales_data, data_to_embed=sales_data.doc)
# Construct an index on the generated embeddings in real time
index = index_embeddings(embedded_data)
# Generate embeddings for the query from the OpenAI Embeddings API
embedded_query = embeddings(context=query, data_to_embed=pw.this.query)
# Build the prompt from the indexed data and obtain ChatGPT's generated answer
responses = prompt(index, embedded_query, pw.this.query)
# Send the answers back to the API caller
response_writer(responses)
# Run the pipeline
pw.run()
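Once the pipeline is running, a client sends the user question to the REST endpoint. The sketch below only builds the request with Python's standard library. It assumes the connector listens on the HOST and PORT from the .env file, on the default root route, and accepts a JSON body with a `query` field; the project's QueryInputSchema may require additional fields. The `build_query_request` helper is hypothetical.

```python
import json
import urllib.request


def build_query_request(
    question: str, host: str = "localhost", port: int = 8080
) -> urllib.request.Request:
    """Build a POST request carrying the user question as JSON.

    Hypothetical helper: the root route ("/") and the {"query": ...} body are
    assumptions about how the REST connector is configured in this project.
    """
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request("Can you find me discounts this week for Adidas men's shoes?")
# With the app running, send the request and print the generated answer:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The client connects to localhost because 0.0.0.0 in the .env file is the address the server binds to, not an address to connect to.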
OpenAI GPT excels at answering questions, but only on topics it remembers from its training data. It knows little about unfamiliar topics such as:
- Recent events after Sep 2021.
- Your non-public documents.
- Information from past conversations.
- Real-time data.
- Discount information.
The model might not answer queries on such topics properly, because it is not aware of the context or historical data, or because it needs additional details. In this case, you can use LLM App to give context to the search or answer process. See how LLM App works.
For example, without context, a typical response from the OpenAI Chat Completion endpoint or the ChatGPT UI looks like this:
User: Find discounts in the USA
Assistant: Sure! Here are some ways to find discounts in the USA:

1. Coupon Websites: Websites like RetailMeNot, Coupons.com and Groupon offer a wide range of discounts and coupon codes for various products and services.

2.
As you can see, GPT only suggests ways to find discounts; it is not specific and does not say exactly where the discounts are, what kind they are, and so on.
To help the model, we give it discount information from any reliable data source (a JSON document, an API, or a data stream in Kafka) so that it can give a more accurate answer. Assume that there is a discounts.csv file with the following columns: discount_until, country, city, state, postal_code, region, product_id, category, sub_category, brand, product_name, currency, actual_price, discount_price, discount_percentage, address.
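The discount_percentage column follows from the two price columns. As an illustration (not the actual discounts-data-generator.py code), the sketch below appends one row with these columns to a CSV file and derives the percentage from actual_price and discount_price. The `append_discount` and `discount_percentage` helpers and the rounding to a whole percent are assumptions.

```python
import csv
import os

# Column order as listed for discounts.csv
COLUMNS = [
    "discount_until", "country", "city", "state", "postal_code", "region",
    "product_id", "category", "sub_category", "brand", "product_name",
    "currency", "actual_price", "discount_price", "discount_percentage", "address",
]


def discount_percentage(actual_price: float, discount_price: float) -> int:
    """Percentage saved relative to the actual price, rounded to a whole number."""
    return round((actual_price - discount_price) / actual_price * 100)


def append_discount(path: str, row: dict) -> None:
    """Append one discount row to a CSV file, writing the header if the file is new or empty.

    Hypothetical helper: discount_percentage is recomputed from the two prices.
    """
    row = dict(row)
    row["discount_percentage"] = discount_percentage(
        float(row["actual_price"]), float(row["discount_price"])
    )
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        if write_header:
            writer.writeheader()
        writer.writerow(row)


print(discount_percentage(130.67, 117.60))  # prints 10
```

For the sample CSV entry later in this README, 130.67 discounted to 117.60 gives 10%, which matches its discount_percentage column. Because the pipeline reads its sources in streaming mode, appended rows like these are what the app picks up as changes.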
After we give GPT this knowledge through the UI (by choosing a data source), its reply becomes specific to the supplied discount data.
The app takes the Rainforest API data, the discounts.csv file, and other indexed documents into account when processing queries. The app is always aware of changes in the discounts: if you add another CSV file or data source, the LLM App automatically updates the index and the AI model's responses.
The sample project performs the following steps to produce the output above:
- Prepare search data:
  - Generate: discounts-data-generator.py simulates real-time data coming from external data sources and generates or updates the existing discounts.csv file with random data. A cron job (Crontab) also runs every minute to fetch the latest data from the Rainforest API.
  - Collect: You choose a data source or upload a CSV file through the UI file uploader, and the app maps each row into a Jsonlines schema, which makes large data sets easier to manage.
  - Chunk: Documents are split into short, mostly self-contained sections to be embedded.
  - Embed: Each section is embedded with the OpenAI API, which returns the embedding.
  - Index: An index is constructed on the generated embeddings.
- Search (once per query):
  - Given a user question, generate an embedding for the query with the OpenAI API.
  - Using the embeddings, query the vector index to retrieve the sections most relevant to the query.
- Ask (once per query):
  - Insert the question and the most relevant sections into a message to GPT.
  - Return GPT's answer.
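To make the Search and Ask steps concrete, here is a self-contained sketch of the retrieval logic. It uses toy 3-dimensional vectors and made-up documents in place of OpenAI embeddings, and a brute-force cosine-similarity search in place of the project's real-time index. The function names and prompt wording are hypothetical; they are not the project's `index_embeddings` or `prompt` helpers.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    indexed_docs: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[str]:
    """Search: return the k documents whose embeddings are most similar to the query."""
    ranked = sorted(
        indexed_docs,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc for doc, _ in ranked[:k]]


def build_prompt(question: str, context_docs: List[str]) -> str:
    """Ask: insert the question and the most relevant sections into one message for GPT."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer the question using only the discount data below.\n"
        f"Data:\n{context}\n"
        f"Question: {question}"
    )


# Made-up documents with toy embeddings; real text-embedding-ada-002 vectors
# have 1536 dimensions (EMBEDDING_DIMENSION in the .env file).
indexed = [
    ("Nike Formal Shoes, Men Shoes, 117.60 USD (10% off)", [0.9, 0.1, 0.0]),
    ("Adidas Running Shoes, Men Shoes, 80.00 USD (20% off)", [0.7, 0.3, 0.1]),
    ("Office chair, Furniture, 150.00 USD (5% off)", [0.0, 0.2, 0.9]),
]
query_embedding = [0.8, 0.2, 0.0]
print(build_prompt("Discounts for Nike men's shoes?", top_k(query_embedding, indexed, k=2)))
```

A brute-force scan is fine for a handful of documents; an index structure becomes worthwhile as the number of embedded sections grows.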
The example only supports Unix-like systems (such as Linux, macOS, and BSD). If you are a Windows user, we highly recommend using Windows Subsystem for Linux (WSL) or Dockerizing the app to run it as a container.
- Set the environment variables in a .env file, as described in the setup steps below.
- From the project root folder, open your terminal and run:
docker compose up
- When the Docker setup succeeds, navigate to localhost:8501 in your browser.
- Make sure that Python 3.10 or above is installed on your machine.
- Download and install pip to manage the project packages.
- Create an OpenAI account and generate a new API Key: To access the OpenAI API, you will need to create an API Key. You can do this by logging into the OpenAI website and navigating to the API Key management page.
- (Optional) If you use the Rainforest API as a data source, create a Rainforest account and get a new API key. Refer to the Rainforest API documentation.
Then follow these steps to install and start using the sample app.
First, clone the repository using the git clone command followed by the repository URL:
git clone https://github.com/Boburmirzo/chatgpt-api-python-sales.git
Next, navigate to the project folder:
cd chatgpt-api-python-sales
Create a .env file in the root directory of the project, copy and paste the config below, and replace the {OPENAI_API_KEY} value with your key.
OPENAI_API_TOKEN={OPENAI_API_KEY}
HOST=0.0.0.0
PORT=8080
EMBEDDER_LOCATOR=text-embedding-ada-002
EMBEDDING_DIMENSION=1536
MODEL_LOCATOR=gpt-3.5-turbo
MAX_TOKENS=200
TEMPERATURE=0.0
Optionally, you can change the other values. By default, the app uses a mock API response to simulate the Rainforest API. If you need actual data, also specify {RAINFOREST_BASE_URL} and {RAINFOREST_API_KEY}:
RAINFOREST_BASE_URL={RAINFOREST_BASE_URL}
RAINFOREST_API_KEY={RAINFOREST_API_KEY}
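The app reads these values from the environment. As a rough sketch of that step, the code below parses KEY=VALUE lines and converts the numeric settings to their types. The `parse_env` and `load_settings` helpers are hypothetical; the project may load the file with a library such as python-dotenv instead.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "="
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def load_settings(text: str) -> dict:
    """Convert raw .env strings into typed settings (hypothetical helper)."""
    env = parse_env(text)
    return {
        "api_key": env["OPENAI_API_TOKEN"],  # required: raises KeyError if missing
        "host": env.get("HOST", "0.0.0.0"),
        "port": int(env.get("PORT", "8080")),
        "embedder": env.get("EMBEDDER_LOCATOR", "text-embedding-ada-002"),
        "embedding_dimension": int(env.get("EMBEDDING_DIMENSION", "1536")),
        "model": env.get("MODEL_LOCATOR", "gpt-3.5-turbo"),
        "max_tokens": int(env.get("MAX_TOKENS", "200")),
        "temperature": float(env.get("TEMPERATURE", "0.0")),
        # Real Rainforest data needs both values; otherwise the mock response is used
        "use_rainforest": bool(env.get("RAINFOREST_BASE_URL") and env.get("RAINFOREST_API_KEY")),
    }


sample_env = """
OPENAI_API_TOKEN=your-key-here
HOST=0.0.0.0
PORT=8080
MAX_TOKENS=200
TEMPERATURE=0.0
"""
settings = load_settings(sample_env)
print(settings["port"], settings["max_tokens"], settings["use_rainforest"])
```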
Create a new virtual environment in the same folder and activate it:
python -m venv pw-env && source pw-env/bin/activate
Install the required packages:
pip install --upgrade -r requirements.txt
Start the application by navigating to the llm_app folder and running main.py:
python main.py
When the application runs successfully, its startup output appears in the terminal.
You can run the UI separately by navigating to the examples/ui folder and starting the Streamlit app:
cd examples/ui
streamlit run app.py
The UI connects to the Discounts backend API automatically, and the frontend runs at http://localhost:8501/ in your browser.
Suppose you choose CSV as the data source and the CSV file contains this entry (any CSV file works, as long as its first row holds the column names separated by commas):
discount_until | country | city | state | postal_code | region | product_id | category | sub_category | brand | product_name | currency | actual_price | discount_price | discount_percentage | address |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2024-08-09 | USA | Los Angeles | IL | 22658 | Central | 7849 | Footwear | Men Shoes | Nike | Formal Shoes | USD | 130.67 | 117.60 | 10 | 321 Oak St |
When the user uploads this file through the file uploader and asks:
Can you find me discounts this month for Nikes men shoes?
The UI shows the expected response:
Based on the given data, there is one discount available this month for Nike's men shoes. Here are the details:
Discounts this week for Nike's men shoes:
City: Los Angeles
Ship Mode: Second Class
Postal Code: 22658
Category: Footwear
Sub-category: Men Shoes
Brand: Nike
Product Name: Formal Shoes
Actual Price: $130.67
Discounted Price: $117.60
Discount Percentage: 10%
Ship Date: 2024-08-09