I basically Dockerized this Hugging Face model into a serverless Runpod API. You can find more info on this project on Twitter.
Docker image: kopyl/bakllava
To build the image yourself, you need a model cache (/root/.cache/huggingface). You can get one by running this Docker image once, so that it downloads the model, and then copying the cache from the container to your host machine. After that, you can build the image with the cache.
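A rough sketch of those steps as a small Python helper that prints (or, with `--run`, executes) the Docker commands. It assumes things the notes above don't specify: the container name `bakllava-cache`, the host directory `./huggingface`, the build tag `bakllava-local`, and a Dockerfile in the current directory that copies `./huggingface` back into `/root/.cache/huggingface`. Your machine may also need extra `docker run` flags (for example `--gpus all`). If the container keeps running after the download finishes, that is fine, because `docker cp` also works on a running container. You can copy the cache from a second terminal, then stop the container.

```python
"""Hypothetical helper: populate the Hugging Face cache, copy it out, rebuild.

Assumed (not from the original notes): CONTAINER_NAME, HOST_CACHE_DIR,
BUILD_TAG, and a Dockerfile that copies HOST_CACHE_DIR into the image.
"""
import shlex
import subprocess
import sys

IMAGE = "kopyl/bakllava"
CACHE_IN_CONTAINER = "/root/.cache/huggingface"
CONTAINER_NAME = "bakllava-cache"  # hypothetical container name
HOST_CACHE_DIR = "./huggingface"  # hypothetical host path; should not exist yet
BUILD_TAG = "bakllava-local"  # hypothetical tag for the rebuilt image


def cache_commands(image=IMAGE, name=CONTAINER_NAME,
                   host_dir=HOST_CACHE_DIR, tag=BUILD_TAG):
    """Return the Docker commands, in order, as argument lists."""
    return [
        # 1. Run the published image once so it downloads the model weights.
        ["docker", "run", "--name", name, image],
        # 2. Copy the populated cache from the container to the host.
        ["docker", "cp", f"{name}:{CACHE_IN_CONTAINER}", host_dir],
        # 3. Remove the temporary container.
        ["docker", "rm", "-f", name],
        # 4. Build from the current directory, which now holds the cache.
        ["docker", "build", "-t", tag, "."],
    ]


def main(argv):
    execute = "--run" in argv  # dry run (print only) unless --run is given
    for cmd in cache_commands():
        print(shlex.join(cmd))
        if execute:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running it without arguments only prints the four commands, which makes it easy to check them or adapt the names before running anything.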