A small repo that uses Hugging Face Transformers to spin up a local instance of Stanford's Alpaca.
The model works well, but inference is painfully slow (especially on my 2018 MacBook Pro :(). I'm currently exploring 4-bit and 8-bit quantization to speed this up.