Issue with Loading MPT-7B on Titan-X GPU - Potential Device Map Solution
danielvasic opened this issue · 7 comments
Hey @lhenault,
First off, kudos on the Python package - it's been a real game-changer for my projects.
I've hit a bit of a snag though. I'm trying to load the MPT-7B model onto my Titan-X GPU, but it seems to be loading into RAM instead. Weirdly enough, I managed to get the Alpaca Lora 7B model loaded up, but it only works on the edit route. Any other route and I get an 'index out of bounds' exception. I'm guessing it's because Alpaca only supports the Instruct API and not the Chat or Completion APIs. So, I tried switching to MPT-7B.Chat, but no dice - can't get the model into memory.
I came across a StackOverflow discussion that suggested using a device map when loading the model. Sounds like it might do the trick, but I wanted to run it by you first.
Do you think this could solve my problem? Any advice would be super appreciated!
Best,
Hey @danielvasic, thanks for the kind words, I'm glad this project is useful to you!
From what I've seen, the MPT-7B model uses about 16GB of VRAM on the GPU (plus a few extra GB for the inputs), so a Titan X wouldn't be enough (correct me if I'm wrong, but these "only" have 12GB).
In your original comment you mention having 2 Titan X cards available, so using them together through `device_map="auto"` should give you 24GB of VRAM, enough to load and use that model, if it works as suggested by your link. I'd give it a try if I were you, let me know how it goes.
BTW, the current `models.py` loads the model on the first available GPU at line 80:
).to(device)
I believe you should comment out and / or remove this part to use all of them.
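Something like this rough sketch is what I have in mind (the model id and dtype are just illustrative, not the actual code in `models.py`):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mosaicml/mpt-7b-chat"  # illustrative checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    trust_remote_code=True,  # MPT ships custom modelling code
    device_map="auto",       # let accelerate spread the layers over every visible GPU
)
# Note: no trailing .to(device) here, otherwise the whole model gets moved
# back onto a single GPU and the device map is defeated.
```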
Dear @lhenault,
Thanks for your reply and your valuable time.
Actually I have two GPUs: one is a Titan X (with 12GB VRAM, as you suggested) and the other is a Quadro K2200 (with 4GB VRAM), which together would be just enough to load the model. I have tried the `device_map="auto"` option, but it seems it is not supported for the MPT model yet :-(
File "/home/user/.local/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2685, in from_pretrained
raise ValueError(f"{model.__class__.__name__} does not support `device_map='{device_map}'` yet.")
ValueError: MPTForCausalLM does not support `device_map='auto'` yet.
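For reference, one possible manual alternative (untested, and both the max_memory split and the MPTBlock class name below are assumptions) would be to let accelerate build and dispatch the device map by hand instead of relying on from_pretrained:

```python
import torch
from accelerate import dispatch_model, infer_auto_device_map
from transformers import AutoModelForCausalLM

# Load on CPU first, then split the layers across the two GPUs by hand.
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-chat",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
device_map = infer_auto_device_map(
    model,
    max_memory={0: "11GiB", 1: "3GiB", "cpu": "30GiB"},  # assumed headroom per device
    no_split_module_classes=["MPTBlock"],  # assumption: MPT's transformer block class
)
model = dispatch_model(model, device_map=device_map)
```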
On another note, how can I use llama-7B-lora with the openai Python API? I have tried this example but I get a 500 server error:
import openai
# Put anything you want in `API key`
openai.api_key = 'Free the models'
# Point to your own url
openai.api_base = "http://127.0.0.1:8080"
# Do your usual things, for instance a completion query:
print(openai.Model.list())
completion = openai.Completion.create(model="llama-7B-lora", prompt="Hello everyone this is")
So for the Alpaca model you're mentioning: it is an instruct model, so you should rather use Edits, not Completions, with this one. Make sure that the model is correctly defined in your `models.toml` file, and that you are using the correct name.
Dear @lhenault,
Certainly not a question for here, but can I get some instructions on how to use Edits with the openai Python API? The model is loaded and working fine with a cURL request, and I get the response, but I cannot find `openai.Edit`, just `openai.Completion` and `openai.ChatCompletion`?
I have to admit I haven't tried this one through the official client, and examples are definitely lacking, but there is an `Edit` interface in it.
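Something along these lines should work from the client side (untested on my end, and the model name and texts below are just placeholders):

```python
import openai

# Same setup as for completions: any key, and your own server as the base url.
openai.api_key = "Free the models"
openai.api_base = "http://127.0.0.1:8080"

# Edits take an instruction plus an optional input to apply it to.
edit = openai.Edit.create(
    model="llama-7B-lora",  # use the name defined in your models.toml
    instruction="Translate the following sentence to French.",
    input="The weather is nice today.",
)
print(edit)
```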
Thanks @lhenault,
Thanks very much, I tried edits, not edit. Also, as a side note: if anyone wants to use the ChatCompletion API, the OpenAssistant example described in your blog post works fine. I only had to define the `offload_folder` parameter and create the directory for it.
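For reference, that boils down to something like this when loading the model with transformers; the model id and folder name below are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative OpenAssistant checkpoint; use the one from the blog post.
model_id = "OpenAssistant/oasst-sft-1-pythia-12b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    offload_folder="./offload",  # create this directory beforehand
)
```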