Connect to a Flask server and submit LLM requests
If you plan to run the model on an NVIDIA GPU, you may need to install the CUDA Toolkit from:
https://developer.nvidia.com/cuda-downloads?target_os=Windows
Create a virtual environment (optional but recommended):
This keeps the packages installed for this project separate from your main Python installation.
On a command line, go to the folder where you want to create the Python environment and run:
python -m venv myenv
From the same command line, activate the environment:
myenv\Scripts\activate.bat
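To confirm that the activation worked, a short Python check can help. This is a minimal sketch, not part of the original project; the function name in_virtualenv is made up for illustration. It relies on the fact that inside a venv sys.prefix differs from sys.base_prefix.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If the first line reports False, run the activate script again before installing any packages.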
With the environment active, install the required packages:
pip install transformers accelerate bitsandbytes
pip install Flask
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
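Once the packages are installed, it may be worth checking that PyTorch can actually see the GPU. The sketch below is an illustration only; the helper name cuda_status is hypothetical. It uses torch.cuda.is_available() and torch.cuda.get_device_name(), and it falls back to a plain message if torch is missing.

```python
import importlib.util


def cuda_status() -> str:
    """Describe whether PyTorch is installed and can see a CUDA device."""
    # Check for the package first so the script still runs without torch.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"

    import torch

    if torch.cuda.is_available():
        # Device 0 is the first GPU that PyTorch detected.
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "torch is installed, but no CUDA device is visible"


if __name__ == "__main__":
    print(cuda_status())
```

If no CUDA device is visible, the usual suspects are the NVIDIA driver or a CPU-only torch build.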
Navigate to the phi-py folder on the command line and start the server:
python serve.py
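With the server running, a request can be submitted from a second command line. The following is a minimal sketch under stated assumptions, not the project's actual client: it assumes serve.py listens on Flask's default address (127.0.0.1, port 5000), and the /generate route and the "prompt" JSON field are hypothetical placeholders. Check serve.py for the real route and payload format and adjust accordingly.

```python
import json
import urllib.request

# Assumption: Flask's default host and port, plus a hypothetical POST route
# named /generate. Replace this with the route actually defined in serve.py.
SERVER_URL = "http://127.0.0.1:5000/generate"


def build_request(prompt: str, url: str = SERVER_URL) -> urllib.request.Request:
    """Build a JSON POST request; the "prompt" field name is hypothetical."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(prompt: str, url: str = SERVER_URL, timeout: float = 120.0) -> str:
    """Send the prompt and return the raw response body as text."""
    with urllib.request.urlopen(build_request(prompt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    try:
        print(submit("Write one sentence about Flask."))
    except OSError as exc:
        # Covers connection refused, timeouts, and HTTP error responses.
        print("request failed:", exc)
```

The sketch uses only the standard library so that nothing beyond the packages above needs to be installed. The generous timeout is a guess, since the first response from a local model can be slow.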