
Segmentation fault (core dumped)

The bug
Hi! I get the message "Segmentation fault (core dumped)" when running the following code.

To Reproduce

from guidance import models, gen

llama3 = models.Transformers("meta-llama/Meta-Llama-3-8B-Instruct", device_map="auto")
llama3 + "Do you want a joke or a poem? " + gen(stop=".")

System info (please complete the following information):
guidance version is 0.1.15

GPU info:

What version of LlamaCpp are you on? And does this happen if you run on the CPU rather than the GPU?

The version of llama_cpp_python is 0.2.77.
The same issue occurs when I use the CPU rather than the GPU.
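
For anyone filling in the system info on a similar report, here is a minimal sketch for collecting the installed versions in one go. The helper name installed_version is hypothetical, and the package list is an assumption based on what this thread mentions (transformers and torch are added because the repro uses models.Transformers).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str) -> str:
    """Return the installed version of a distribution, or a marker if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return "not installed"


# Distribution names are assumptions; adjust them to match your environment.
for name in ("guidance", "llama_cpp_python", "transformers", "torch"):
    print(f"{name}: {installed_version(name)}")
```

Pasting the printed lines into the report keeps the version details in one place.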

And is the prompt fine when run directly from LlamaCpp? (Sorry, we see a lot of segfaults from LlamaCpp, and a segfault can't come from our code, which is 100% Python.)
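
One way to narrow down where the crash happens is Python's standard faulthandler module, which prints the Python traceback of each thread when the process receives a fatal signal such as SIGSEGV. The sketch below wraps the repro from this report in a hypothetical run_repro function. It assumes the crash is reproducible and that stderr is a real file descriptor. The traceback shows which Python call was active at the time of the crash, not the faulting native frame.

```python
import faulthandler

# Install handlers that dump the Python traceback of all threads to stderr
# on fatal signals (SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL).
faulthandler.enable(all_threads=True)


def run_repro():
    """Repro from this report; needs guidance and the model weights."""
    from guidance import models, gen

    llama3 = models.Transformers(
        "meta-llama/Meta-Llama-3-8B-Instruct", device_map="auto"
    )
    return llama3 + "Do you want a joke or a poem? " + gen(stop=".")


# Call run_repro() to trigger the crash with the handler installed.
```

The same handler can be enabled without editing the script by running it with python -X faulthandler.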