ds50-project

A prototype chatbot for recording health metrics for patients; project for the DS50 ("data science") class

Primary language: Python. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

Running the code

To run the code, you will need the following tools installed:

  • git
  • git-lfs (optional, used to download the larger files)
  • python >= 3.10 (earlier versions might work but have not been tested)
  • pip
  • nodejs >= 14 (tested with nodejs 20.2.0)
  • npm
  • python-pytorch (optional: your system's package manager may offer a PyTorch build with AVX or CUDA optimizations, which you should prefer if your hardware supports them)
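The list above can be checked with a small script. This is a minimal sketch that is not part of the repository; it assumes the Node.js and pip executables are called node and pip on your PATH (some systems name them nodejs or pip3). Only the Python and Node.js minimum versions come from the list; the other tools are only checked for presence.

```python
# Hypothetical helper, not part of the ds50-project repository:
# checks the prerequisites listed in "Running the code".
from __future__ import annotations

import re
import shutil
import subprocess
import sys

# Minimum versions from the README; the other tools only need to be present.
MIN_PYTHON = (3, 10)
MIN_NODE = (14,)
# Assumed executable names; some systems use "nodejs" or "pip3" instead.
REQUIRED_TOOLS = ("git", "pip", "node", "npm")


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted version number from a string such as 'v20.2.0'."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: tuple[int, ...], minimum: tuple[int, ...]) -> bool:
    """Tuples compare element by element, so (20, 2, 0) >= (14,) holds."""
    return version >= minimum


def check_tools() -> list[str]:
    """Return a list of problems found; an empty list means everything looks fine."""
    problems = []
    if not meets_minimum(tuple(sys.version_info[:2]), MIN_PYTHON):
        problems.append(f"python {sys.version.split()[0]} is older than 3.10")
    for tool in REQUIRED_TOOLS:
        if shutil.which(tool) is None:
            problems.append(f"{tool} was not found on PATH")
    node = shutil.which("node")
    if node is not None:
        output = subprocess.run(
            [node, "--version"], capture_output=True, text=True, check=False
        ).stdout
        if not meets_minimum(parse_version(output), MIN_NODE):
            problems.append(f"node {output.strip()} is older than 14")
    # git-lfs is optional, so its absence is only reported as a note.
    if shutil.which("git-lfs") is None:
        print("note: git-lfs not found; the larger files will not be downloaded")
    return problems


if __name__ == "__main__":
    for problem in check_tools():
        print("warning:", problem)
```

Comparing plain integer tuples keeps the check free of third-party version-parsing libraries.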

Downloading the code

git clone https://github.com/adri326/ds50-project
cd ds50-project

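# Only needs to be run once per user: sets up Git LFS, which is used to download the larger files
git lfs install

# Fetch the larger files (needed when the repository was cloned before Git LFS was set up)
git lfs pull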
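If the larger files were not fetched (for example because Git LFS was set up after cloning), the affected files still contain a short Git LFS pointer instead of their real content. The sketch below, which is not part of the repository, detects such files by the first line that every pointer file starts with, per the Git LFS pointer format. Which files the project actually stores with LFS is not stated in this README, so scanning the dataset directory is only an example default. Running git lfs pull inside the repository downloads any files it reports.

```python
# Hypothetical helper, not part of the ds50-project repository:
# lists files that are still Git LFS pointers instead of real content.
from __future__ import annotations

import sys
from pathlib import Path

# Every Git LFS pointer file starts with this line (pointer format spec v1).
LFS_POINTER_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file at `path` begins with the LFS pointer header."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_HEADER)) == LFS_POINTER_HEADER


def find_pointers(root: Path) -> list[Path]:
    """Recursively collect the files under `root` that have not been downloaded yet."""
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file() and is_lfs_pointer(p))


if __name__ == "__main__":
    # Example only: which directories hold LFS files is an assumption.
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("dataset")
    pending = find_pointers(root)
    for path in pending:
        print("not downloaded:", path)
    if pending:
        print("run 'git lfs pull' inside the repository to fetch them")
```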

Backend

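# Create an isolated environment with your preferred Python package manager, for example conda:
conda env create
# (then activate the newly created environment before installing the dependencies)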

# Install all the dependencies:
pip install -r requirements.txt

# Run the server (the first run takes a few minutes, as the models need to be downloaded from Hugging Face)
python src/server.py

Frontend

# Navigate to the frontend directory
cd chatbot-frontend

# Install all the dependencies with yarn (npm works too, although it is a bit slower)
npx yarn

# Run the frontend application
npx yarn dev

Project organization

  • src/: the backend part of the application, implemented in Python
    • src/sentiment.py: entry point for analyzing the sentiment/intent of messages, in order to extract data from them
    • src/sentiment_huggingface.py: an alternative implementation of sentiment analysis, using an off-the-shelf model from Hugging Face
    • src/chatgpt.py: calls the OpenAI API to get answers from GPT-3
    • src/similarity.py: computes a semantic vector for each question in the dataset, augments these questions using the acronym dictionary, and compares the user's question against them using cosine similarity
    • src/acronym.py: loads a dictionary of acronyms
    • src/chatbot.py: entry point for the chatbot, which answers messages
    • src/server.py: entry point for a server hosting the chatbot's API
  • chatbot-frontend: the frontend part of the application, implemented in TypeScript using Solid.JS and Vite
    • chatbot-frontend/src/App.tsx: the entry point of the application
    • chatbot-frontend/src/api.ts: communicates with the API and wraps the returned data in TypeScript types
  • dataset: the datasets built or used
    • dataset/acro.json: the acronym dictionary
    • dataset/dataset_5Q.json: the dataset with augmented questions and French translations (with the acronyms correctly translated)
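The acronym-based question augmentation described for src/acronym.py and src/similarity.py could look roughly like the sketch below. It is an illustration rather than the project's code: the flat JSON layout assumed for dataset/acro.json (each acronym mapped to its expansion) and all function names are hypothetical.

```python
# Illustrative sketch, not the project's implementation.
from __future__ import annotations

import json
import re
from pathlib import Path


def load_acronyms(path: Path) -> dict[str, str]:
    """Load an acronym dictionary, ASSUMING a flat JSON object like {"BP": "blood pressure"}."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def augment_question(question: str, acronyms: dict[str, str]) -> list[str]:
    """Return the question followed by variants with acronyms expanded or contracted."""
    variants = [question]
    for short, long in acronyms.items():
        # Expand whole-word acronyms ("BP" -> "blood pressure"), case-sensitively.
        expanded = re.sub(rf"\b{re.escape(short)}\b", lambda _: long, question)
        # Contract spelled-out forms ("Blood pressure" -> "BP"), ignoring case.
        contracted = re.sub(
            rf"\b{re.escape(long)}\b", lambda _: short, question, flags=re.IGNORECASE
        )
        for variant in (expanded, contracted):
            if variant != question and variant not in variants:
                variants.append(variant)
    return variants
```

For example, with {"BP": "blood pressure"}, the question "What is my BP?" yields both "What is my BP?" and "What is my blood pressure?", so either phrasing from a user can match the stored question. The replacement is passed as a function so that expansions containing backslashes are inserted literally.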
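The matching step attributed to src/similarity.py, comparing the user's question to the dataset questions with cosine similarity, is sketched below. To stay self-contained, it swaps the project's semantic vectors for plain bag-of-words counts, which only capture shared words rather than meaning; a real setup would replace embed() with a sentence-embedding model. All names here are hypothetical.

```python
# Illustrative sketch; bag-of-words counts stand in for real semantic embeddings.
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter[str]:
    """Stand-in embedding: lower-cased word counts (NOT a semantic model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter[str], b: Counter[str]) -> float:
    """cos(a, b) = (a . b) / (|a| |b|), defined as 0.0 when either vector is empty."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(questions: list[str]) -> list[tuple[str, Counter[str]]]:
    """Compute one vector per dataset question, once, ahead of time."""
    return [(question, embed(question)) for question in questions]


def best_match(user_question: str, index: list[tuple[str, Counter[str]]]) -> tuple[str, float]:
    """Return the closest dataset question and its cosine similarity score."""
    query = embed(user_question)
    return max(
        ((question, cosine_similarity(query, vector)) for question, vector in index),
        key=lambda pair: pair[1],
    )
```

Because each dataset question is embedded once in build_index, answering a message only costs one embedding plus one comparison per stored question. Note that best_match raises ValueError on an empty index.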