How to do inference with 7B
fahdmirza opened this issue · 10 comments
Hello,
I have downloaded the 7B model on my Ubuntu machine. Could someone advise how I can do simple text inference, like asking 'hello'? Which script or code should I run, and where? I have one GPU with 48GB VRAM. Thanks.
(chameleon) Ubuntu@0068-kci-prxmx10127:~/chameleon/chameleon$ python inference/chameleon.py
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
Traceback (most recent call last):
  File "/home/Ubuntu/chameleon/chameleon/inference/chameleon.py", line 18, in <module>
    import torch
  File "/home/Ubuntu/miniconda3/envs/chameleon/lib/python3.11/site-packages/torch/__init__.py", line 1382, in <module>
    from .functional import *  # noqa: F403
  File "/home/Ubuntu/miniconda3/envs/chameleon/lib/python3.11/site-packages/torch/functional.py", line 7, in <module>
    import torch.nn.functional as F
  File "/home/Ubuntu/miniconda3/envs/chameleon/lib/python3.11/site-packages/torch/nn/__init__.py", line 1, in <module>
    from .modules import *  # noqa: F403
  File "/home/Ubuntu/miniconda3/envs/chameleon/lib/python3.11/site-packages/torch/nn/modules/__init__.py", line 35, in <module>
    from .transformer import TransformerEncoder, TransformerDecoder, \
  File "/home/Ubuntu/miniconda3/envs/chameleon/lib/python3.11/site-packages/torch/nn/modules/transformer.py", line 20, in <module>
    device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
/home/Ubuntu/miniconda3/envs/chameleon/lib/python3.11/site-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
  device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
Traceback (most recent call last):
  File "/home/Ubuntu/chameleon/chameleon/inference/chameleon.py", line 31, in <module>
    from chameleon.inference import loader
ModuleNotFoundError: No module named 'chameleon.inference'; 'chameleon' is not a package
(chameleon) Ubuntu@0068-kci-prxmx10127:~/chameleon/chameleon$
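The ModuleNotFoundError at the end is the clue: running inference/chameleon.py as a script puts the inference/ directory itself on sys.path, so the file chameleon.py shadows the chameleon package, hence "'chameleon' is not a package". Installing the repo as a package and launching modules with python -m (as in the step-by-step guide below) avoids the shadowing:

python -m pip install git+https://github.com/facebookresearch/chameleon.git
python -m chameleon.miniviewer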
You can run it via the (mini)viewer UI, or you can take a look at the code examples:
https://github.com/facebookresearch/chameleon/blob/main/chameleon/inference/examples/
There is no CLI yet.
Tried the miniviewer and also tried running the example code, but it's not working. Can you give a step-by-step example, please?
Thanks for the quick response. I got a NumPy problem running the inference example and the miniviewer.
python chameleon/inference/examples/simple.py
shows the error message below; the text completion can still finish.
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
However, the same error shows for the miniviewer. Again, pure text works, but when you add an image to the conversation, the NumPy problem throws a runtime error:
File "/home/lukaemon/Dev/vlm-dojo/kami/chameleon/chameleon/inference/chameleon.py", line 133, in tokenize_image
self.image_tokenizer.img_tokens_from_pil(img)
File "/home/lukaemon/Dev/vlm-dojo/kami/chameleon/chameleon/inference/image_tokenizer.py", line 90, in img_tokens_from_pil
vqgan_input = self._vqgan_input_from(image).to(self._device).to(self._dtype)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lukaemon/Dev/vlm-dojo/kami/chameleon/chameleon/inference/image_tokenizer.py", line 82, in _vqgan_input_from
torch.from_numpy(np_img).permute(2, 0, 1).float()
^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Numpy is not available
These are the problems I'm facing when running the inference example code and the miniviewer.
The runtime environment is set up per the project README.
@fahdmirza
Here is a step-by-step guide (most is from the README):
conda create -n chameleon python=3.11
conda activate chameleon
python -m pip install git+https://github.com/facebookresearch/chameleon.git
python -m chameleon.download_data [pre-signed URL]
python -m chameleon.miniviewer
then open your browser to http://localhost:5000.
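If you'd rather do a plain text completion from Python than use the miniviewer, the repo ships examples under chameleon/inference/examples/. A minimal sketch along the lines of simple.py, assuming download_data put the weights and tokenizer under ./data (adjust the paths to your layout; the prompt_ui format follows the shipped examples, so double-check against your checkout):

from chameleon.inference.chameleon import ChameleonInferenceModel

model = ChameleonInferenceModel(
    "./data/models/7b/",                     # 7B checkpoint directory
    "./data/tokenizer/text_tokenizer.json",  # text tokenizer
    "./data/tokenizer/vqgan.yaml",           # image tokenizer config
    "./data/tokenizer/vqgan.ckpt",           # image tokenizer weights
)

# Plain text-in, text-out completion.
tokens = model.generate(prompt_ui=[{"type": "text", "value": "Hello"}])
print(model.decode_text(tokens)[0])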
@lukaemon Can you share details about your environment? Maybe the output of conda env export? I'm curious about the versions of numpy, torch, and xformers.
@lukaemon I've only tried an example so far, but it wasn't working with numpy 2.0.0, which is now the default stable numpy.
You need to downgrade to 1.26.4 (currently the latest in the 1.x branch) with pip install numpy==1.26.4.
@lshamis 1.26.4 works for both the example scripts and the miniviewer. Thanks @mimrock!
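If you want to confirm the pin actually landed in the active environment, a small sanity check (plain Python; only assumes numpy is importable):

import numpy as np

# torch wheels built against NumPy 1.x fail to initialize under NumPy 2,
# so make sure the downgrade took effect in this environment.
assert np.__version__.startswith("1."), f"expected numpy<2, got {np.__version__}"
print("numpy", np.__version__)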
Following the above steps, I got the following:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB. GPU 0 has a total capacty of 47.44 GiB of which 1.62 MiB is free. Process 905239 has 36.49 GiB memory in use. Process 1025525 has 564.00 MiB memory in use. Including non-PyTorch memory, this process has 10.35 GiB memory in use. Of the allocated memory 10.05 GiB is allocated by PyTorch, and 1.56 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
So does that mean my 48GB VRAM GPU is not enough for this model? If so, I'm gutted, as I was so looking forward to running this.
Thanks for the help.
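Note what the error message actually says: two other processes already hold roughly 37 GiB of the card, and this run had only allocated about 10 GiB before failing, so a 48GB card should be ample for the 7B model once the GPU is otherwise free. A quick hedged check before loading (torch.cuda.mem_get_info is standard PyTorch; nothing here is Chameleon-specific):

import torch

# Free vs. total memory on GPU 0. If "free" is far below "total",
# another process (check nvidia-smi) is holding the card.
free, total = torch.cuda.mem_get_info(0)
print(f"free: {free / 2**30:.2f} GiB / total: {total / 2**30:.2f} GiB")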
D:\GitHub\chameleon>python -m chameleon.miniviewer
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\GitHub\chameleon\chameleon\miniviewer\__main__.py", line 9, in <module>
    main()
  File "D:\GitHub\chameleon\cham\lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "D:\GitHub\chameleon\cham\lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "D:\GitHub\chameleon\cham\lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "D:\GitHub\chameleon\cham\lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "D:\GitHub\chameleon\chameleon\miniviewer\miniviewer.py", line 241, in main
    cm3v2_inference_model = ChameleonInferenceModel(
  File "D:\GitHub\chameleon\chameleon\inference\chameleon.py", line 544, in __init__
    self.token_manager = TokenManager(
  File "D:\GitHub\chameleon\chameleon\inference\chameleon.py", line 99, in __init__
    self.vocab = VocabInfo(json.load(open(tokenizer_path))["model"]["vocab"])
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3565933: character maps to <undefined>
Exception ignored in: <function ChameleonInferenceModel.__del__ at 0x00000163593575B0>
Traceback (most recent call last):
  File "D:\GitHub\chameleon\chameleon\inference\chameleon.py", line 572, in __del__
    with self.dctx.active_key_lock:
AttributeError: 'ChameleonInferenceModel' object has no attribute 'dctx'
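The UnicodeDecodeError comes from open() on Windows defaulting to the locale codec (cp1252 here) while the tokenizer JSON is UTF-8. A hedged workaround is to patch the line the traceback points at (chameleon/inference/chameleon.py, line 99 in this checkout) to pass the encoding explicitly:

# chameleon/inference/chameleon.py, line 99 (per the traceback above).
# Force UTF-8 instead of the Windows locale default (cp1252).
with open(tokenizer_path, encoding="utf-8") as f:
    self.vocab = VocabInfo(json.load(f)["model"]["vocab"])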