Run inference on the replit code instruct model using your CPU. This inference code uses a ggml-quantized model. To run the model we use ctransformers, a Python library with bindings to ggml.
Demo:
2023-06-27.14-46-07.mp4
Using Docker should make all of this easier for you. Minimum specs: a system with 8GB of RAM. Python 3.10 is recommended.
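If the repo does not ship its own Dockerfile, a minimal sketch along these lines should work (image name and layout are assumptions, not the project's own file):

```dockerfile
# Hypothetical Dockerfile sketch; prefer the repo's own Dockerfile if one exists.
FROM python:3.10-slim

# git is needed to pull the ctransformers submodule
RUN apt-get update && apt-get install -y --no-install-recommends git \
 && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY . .

RUN git submodule update --init --recursive \
 && pip install --no-cache-dir -r requirements.txt \
 && python download_model.py

CMD ["python", "inference.py"]
```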
Benchmark numbers for these two CPUs will be posted later:
- AMD Epyc 7003 series CPU
- AMD Ryzen 5950x CPU
First, create a venv:

```shell
python -m venv env && source env/bin/activate
```

Next, install the submodule with the ctransformers patch:
```shell
git submodule update --init --recursive
```

Next, install dependencies:
```shell
pip install -r requirements.txt
```

Next, download the quantized model weights (about 1.5GB):
```shell
python download_model.py
```

Ready to rock, run inference:
```shell
python inference.py
```

Finally, modify the inference script's prompt and generation parameters to suit your needs.
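The inference step above can be sketched in a few lines with ctransformers. The prompt template, model path, model type, and generation parameters below are illustrative assumptions; check `inference.py` for the values the project actually uses:

```python
# Hypothetical instruct-style prompt template; inference.py may use a different one.
def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in an instruct-style template."""
    return (
        "Below is an instruction that describes a task.\n"
        "### Instruction:\n"
        f"{instruction}\n"
        "### Response:\n"
    )

def generate(instruction: str, model_path: str = "model.bin") -> str:
    """Load a ggml model with ctransformers and generate a completion."""
    # Imported lazily so build_prompt works even without ctransformers installed.
    from ctransformers import AutoModelForCausalLM

    llm = AutoModelForCausalLM.from_pretrained(
        model_path,            # path to the downloaded ggml weights (placeholder name)
        model_type="replit",   # tells ggml which architecture to use
    )
    return llm(
        build_prompt(instruction),
        max_new_tokens=256,    # generation parameters you can tune
        temperature=0.2,
        top_p=0.9,
    )
```

Tweaking `temperature` and `top_p` here is the quickest way to trade off determinism against variety in the generated code.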