RTX Chat WebUI Data Refresher

This project keeps the dataset used by the NVIDIA RTX Chat WebUI application up to date. It scrapes the visible text from a predefined list of websites, saves it to the dataset, and then repeats the scrape every hour so the dataset reflects the latest content.

Installation

Prerequisites

Python 3.6 or later
Google Chrome browser installed

Setup

Clone this repository or download the source code.
Install the required Python packages by running:

pip install selenium schedule

Usage

Open the Scrape.py file and edit the websites and file_names lists. The websites list holds the URLs to scrape, and the file_names list holds the name of the text file each URL's text is saved to, in the same order.
Run the app_launch.bat file. This will set up the required environment, verify the installation, and start the scraping and refreshing processes.
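
As an illustration of the first step, the two lists in Scrape.py might look like the sketch below. The URLs and file names are placeholder values, and the pair_targets helper is hypothetical (it is not part of the project); it only shows one way to catch lists of different lengths before scraping starts.

```python
# Placeholder values for illustration; replace them with your own targets.
websites = [
    "https://example.com/news",
    "https://example.org/docs",
]
file_names = [
    "example_news.txt",
    "example_docs.txt",
]


def pair_targets(urls, names):
    """Pair each URL with its output file name (hypothetical helper).

    Raises ValueError if the lists differ in length, since a URL
    without a file name (or the reverse) cannot be saved correctly.
    """
    if len(urls) != len(names):
        raise ValueError(
            f"{len(urls)} URLs but {len(names)} file names; the lists must match"
        )
    return list(zip(urls, names))


targets = pair_targets(websites, file_names)
```

Keeping the lists the same length and in the same order matters because each URL is matched to a file name by position.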

On startup, the script scrapes the visible text from each specified website and saves it to its own text file in the AppData\Local\NVIDIA\ChatWithRTX\RAG\trt-llm-rag-windows-main\dataset directory. After this initial scrape, it refreshes the dataset every hour by scraping the websites again and updating the corresponding text files.
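
The scrape-then-refresh behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of Scrape.py: the function names (fetch_visible_text, refresh_dataset, run_hourly) are hypothetical, reading the page's body element is an assumed way of getting "visible text", and DATASET_DIR assumes the AppData path sits under the current user's home directory.

```python
import time
from pathlib import Path

# Assumption: the dataset path from this README lives under the user's home folder.
DATASET_DIR = (
    Path.home() / "AppData" / "Local" / "NVIDIA" / "ChatWithRTX"
    / "RAG" / "trt-llm-rag-windows-main" / "dataset"
)


def fetch_visible_text(url):
    """Return the rendered text of the page's <body> using headless Chrome."""
    # Imported lazily so refresh_dataset can be exercised without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.find_element(By.TAG_NAME, "body").text
    finally:
        driver.quit()  # always release the browser, even on errors


def refresh_dataset(websites, file_names, dataset_dir, fetch=fetch_visible_text):
    """Scrape every URL and overwrite its text file in dataset_dir."""
    dataset_dir = Path(dataset_dir)
    dataset_dir.mkdir(parents=True, exist_ok=True)
    for url, name in zip(websites, file_names):
        try:
            text = fetch(url)
        except Exception as exc:
            # One failing site should not stop the others from refreshing.
            print(f"Skipping {url}: {exc}")
            continue
        (dataset_dir / name).write_text(text, encoding="utf-8")


def run_hourly(websites, file_names, dataset_dir=DATASET_DIR):
    """Scrape once immediately, then again every hour until interrupted."""
    import schedule

    def job():
        refresh_dataset(websites, file_names, dataset_dir)

    job()  # initial scrape
    schedule.every().hour.do(job)
    while True:
        schedule.run_pending()
        time.sleep(60)  # check for due jobs once a minute
```

Calling run_hourly(websites, file_names) would start the loop. Passing the fetch function as a parameter is a design choice made for this sketch so the file-writing logic can be tried with a stand-in function instead of a real browser.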

File Structure