Product Q&A ChatBot

A Q&A tool for extracting meaningful information from Google Store reviews.

Installation

Prerequisite:

Steps:

Create and activate the virtualenv:
```
virtualenv .venv -p /usr/bin/python3.10
source .venv/bin/activate
```

Steps to run the pipeline:

Set the environment variables that hold your OpenAI and Pinecone credentials. The required variables are listed in the .env.example file.
Fill in your credentials in .env.example and rename it to .env.
Load the variables with the export command:
```
export $(grep -v '^#' .env | xargs)
```
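The one-liner above drops lines that start with `#` and exports each remaining `KEY=VALUE` pair into the current shell. If you would rather load the file from Python (for example inside a notebook), the sketch below approximates the same behaviour with the standard library only. The function names are hypothetical and not part of this repo. The variable names used in the test (`OPENAI_API_KEY`, `PINECONE_API_KEY`) are assumptions; check .env.example for the real ones.

```python
import os
from typing import Iterable


def parse_env_lines(lines: Iterable[str]) -> dict[str, str]:
    """Approximate `grep -v '^#' .env | xargs`: skip comments and blanks, split KEY=VALUE.

    Values are kept verbatim after the first '='; quotes are not interpreted.
    """
    parsed: dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comment or blank line
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that are not KEY=VALUE pairs
            parsed[key.strip()] = value.strip()
    return parsed


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Read a .env file and copy its variables into os.environ (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        parsed = parse_env_lines(f)
    os.environ.update(parsed)
    return parsed
```

Unlike the shell version, this sketch does not break values that contain spaces, because it never word-splits the line.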
Run the pipeline by executing main.py:
```
python main.py
```
All configuration params are stored in config/config.yml.
The pipeline runs the following stages in order:
- Data ingestion (automatically downloads the datasets).
  - Key params:
    - force_ingest: set to True to replace the current datasets.
- Document upsert (selects the useful features (columns), loads the CSV as documents, chunks and embeds the documents, and upserts them to the Pinecone database).
  - Key params:
    - data_length: the number of dataset rows to upsert as documents. Set it to -1 to upsert all rows to Pinecone.
    - force_upsert: set to True to replace the current documents in Pinecone.
- Evaluation (evaluates the LLM's performance). This stage is already implemented in notebook/10_evaluation.ipynb but is not yet part of the pipeline.
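The stage order and the three params above can be summarised as a small planning function. This is a sketch under assumptions: the `PipelineConfig` class and the `select_rows` and `plan_pipeline` helpers are hypothetical, and so is the rule that a stage is skipped when its output already exists and its `force_*` flag is False. That rule is one reading of this README, not code taken from main.py. Evaluation is left out because it is not yet part of the pipeline.

```python
from dataclasses import dataclass


@dataclass
class PipelineConfig:
    # Param names follow config/config.yml as described in this README;
    # the default values here are hypothetical.
    force_ingest: bool = False
    data_length: int = -1
    force_upsert: bool = False


def select_rows(rows: list, data_length: int) -> list:
    """-1 keeps every row; any other non-negative value keeps the first N rows."""
    if data_length == -1:
        return rows
    if data_length < 0:
        raise ValueError("data_length must be -1 or a non-negative integer")
    return rows[:data_length]


def plan_pipeline(cfg: PipelineConfig, dataset_exists: bool,
                  index_has_docs: bool, rows: list) -> list:
    """Return the stages that would run, in order, as (stage, detail) tuples."""
    steps = []
    # Assumption: ingestion only runs if forced or if no dataset is present yet.
    if cfg.force_ingest or not dataset_exists:
        steps.append(("ingest", None))
    # Assumption: upsert only runs if forced or if the Pinecone index is empty.
    if cfg.force_upsert or not index_has_docs:
        steps.append(("upsert", len(select_rows(rows, cfg.data_length))))
    return steps
```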

Steps to run the Streamlit app:

Set the environment variables that hold your OpenAI and Pinecone credentials. The required variables are listed in the .env.example file.
Fill in your credentials in .env.example and rename it to .env.
Load the variables with the export command:
```
export $(grep -v '^#' .env | xargs)
```
Run the Streamlit app with app.py:
```
streamlit run app.py
```

TODO:
- Convert notebook/10_evaluation.ipynb into a pipeline stage.
- Try more prompting techniques, such as summarization and self-querying, to improve the chatbot. Because the source data is a CSV, its columns can serve as metadata for self-querying.
- Split the document upsert stage into more fine-grained steps:
  - Select the useful features (columns)
  - Load the CSV as documents
  - Chunk the documents
  - Embed the documents
  - Upsert to the Pinecone database
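One possible shape for the first three of those finer-grained steps is sketched below using only the standard library. It is illustrative only. The `Document` dataclass is a hypothetical stand-in loosely modelled on the (text, metadata) pairs many LLM frameworks use. The column names (`review`, `rating`, `device`) and the sample rows are invented. The fixed-size character chunker with overlap is one simple choice among many. Embedding and upserting to Pinecone need the OpenAI and Pinecone clients and are left out. Carrying the non-text columns as metadata on every chunk is what the self-querying idea above would rely on.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical stand-in for a framework Document: text plus metadata.
    page_content: str
    metadata: dict = field(default_factory=dict)


def select_columns(rows: list, text_column: str, metadata_columns: list) -> list:
    """Step 1: keep only the useful columns of each CSV row."""
    keep = [text_column, *metadata_columns]
    return [{k: row[k] for k in keep} for row in rows]


def rows_to_documents(rows: list, text_column: str) -> list:
    """Step 2: one Document per row; the remaining columns become metadata."""
    return [
        Document(row[text_column], {k: v for k, v in row.items() if k != text_column})
        for row in rows
    ]


def chunk_document(doc: Document, chunk_size: int, overlap: int) -> list:
    """Step 3: fixed-size character chunks with overlap; each keeps the parent's metadata."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("require 0 <= overlap < chunk_size")
    text, step, chunks = doc.page_content, chunk_size - overlap, []
    for start in range(0, max(len(text), 1), step):
        chunks.append(Document(text[start:start + chunk_size], dict(doc.metadata)))
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


# Invented sample data with hypothetical column names.
SAMPLE_CSV = "review,rating,device\nGreat battery life,5,Pixel\nScreen cracked,1,Nest\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
docs = rows_to_documents(select_columns(rows, "review", ["rating"]), "review")
```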

chat_bot_demo.mp4