How to do inference with 7B
fahdmirza opened this issue · 10 comments
Hello,
I have downloaded the 7B model on my Ubuntu machine. Could someone advise how I can do simple text inference, like asking 'hello'? Which script or code should I run, and where? I have one GPU with 48GB VRAM. Thanks.
(chameleon) Ubuntu@0068-kci-prxmx10127:~/chameleon/chameleon$ python inference/chameleon.py
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
Traceback (most recent call last):
  File "/home/Ubuntu/chameleon/chameleon/inference/chameleon.py", line 18, in <module>
    import torch
  File "/home/Ubuntu/miniconda3/envs/chameleon/lib/python3.11/site-packages/torch/__init__.py", line 1382, in <module>
    from .functional import *  # noqa: F403
  File "/home/Ubuntu/miniconda3/envs/chameleon/lib/python3.11/site-packages/torch/functional.py", line 7, in <module>
    import torch.nn.functional as F
  File "/home/Ubuntu/miniconda3/envs/chameleon/lib/python3.11/site-packages/torch/nn/__init__.py", line 1, in <module>
    from .modules import *  # noqa: F403
  File "/home/Ubuntu/miniconda3/envs/chameleon/lib/python3.11/site-packages/torch/nn/modules/__init__.py", line 35, in <module>
    from .transformer import TransformerEncoder, TransformerDecoder, \
  File "/home/Ubuntu/miniconda3/envs/chameleon/lib/python3.11/site-packages/torch/nn/modules/transformer.py", line 20, in <module>
    device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
/home/Ubuntu/miniconda3/envs/chameleon/lib/python3.11/site-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
  device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
Traceback (most recent call last):
  File "/home/Ubuntu/chameleon/chameleon/inference/chameleon.py", line 31, in <module>
    from chameleon.inference import loader
ModuleNotFoundError: No module named 'chameleon.inference'; 'chameleon' is not a package
(chameleon) Ubuntu@0068-kci-prxmx10127:~/chameleon/chameleon$
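The ModuleNotFoundError at the end is the clue: running inference/chameleon.py as a script puts the inference/ directory itself on sys.path, so the file chameleon.py shadows the chameleon package, hence "'chameleon' is not a package". Installing the repo as a package and launching modules with python -m (as in the step-by-step guide below) avoids the shadowing:

python -m pip install git+https://github.com/facebookresearch/chameleon.git
python -m chameleon.miniviewer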
You can run it via the (mini)viewer UI, or you can take a look at the code examples:
https://github.com/facebookresearch/chameleon/blob/main/chameleon/inference/examples/
There is no CLI yet.
Tried the miniviewer and also tried running the example code, but it's not working. Can you give a step-by-step example, please?
Thanks for the quick response. I got a NumPy problem running the inference example and the miniviewer.
python chameleon/inference/examples/simple.py
shows the error message below; the text completion can still finish.
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
However, the same error shows for the miniviewer. Again, pure text works, but when you add an image to the conversation, the NumPy problem throws a runtime error:
File "/home/lukaemon/Dev/vlm-dojo/kami/chameleon/chameleon/inference/chameleon.py", line 133, in tokenize_image
self.image_tokenizer.img_tokens_from_pil(img)
File "/home/lukaemon/Dev/vlm-dojo/kami/chameleon/chameleon/inference/image_tokenizer.py", line 90, in img_tokens_from_pil
vqgan_input = self._vqgan_input_from(image).to(self._device).to(self._dtype)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lukaemon/Dev/vlm-dojo/kami/chameleon/chameleon/inference/image_tokenizer.py", line 82, in _vqgan_input_from
torch.from_numpy(np_img).permute(2, 0, 1).float()
^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Numpy is not available
These are the problems I'm facing when running the inference example code and the miniviewer.
The runtime environment is set up per the project README.
@fahdmirza
Here is a step-by-step guide (most is from the README):
conda create -n chameleon python=3.11
conda activate chameleon
python -m pip install git+https://github.com/facebookresearch/chameleon.git
python -m chameleon.download_data [pre-signed URL]
python -m chameleon.miniviewer
then open your browser to http://localhost:5000.
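If you'd rather do a plain text completion from Python than use the miniviewer, the repo ships examples under chameleon/inference/examples/. A minimal sketch along the lines of simple.py, assuming download_data put the weights and tokenizer under ./data (adjust the paths to your layout; the prompt_ui format follows the shipped examples, so double-check against your checkout):

from chameleon.inference.chameleon import ChameleonInferenceModel

model = ChameleonInferenceModel(
    "./data/models/7b/",                     # 7B checkpoint directory
    "./data/tokenizer/text_tokenizer.json",  # text tokenizer
    "./data/tokenizer/vqgan.yaml",           # image tokenizer config
    "./data/tokenizer/vqgan.ckpt",           # image tokenizer weights
)

# Plain text-in, text-out completion.
tokens = model.generate(prompt_ui=[{"type": "text", "value": "Hello"}])
print(model.decode_text(tokens)[0])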
@lukaemon Can you share details about your environment? Maybe the output of conda env export? I'm curious about the versions of numpy, torch, and xformers.
@lukaemon I've only tried an example so far, but it wasn't working with numpy 2.0.0, which is now the default stable numpy.
You need to downgrade to 1.26.4 (currently the latest in the 1.x branch) with pip install numpy==1.26.4.
@lshamis 1.26.4 works for both the example scripts and the miniviewer. Thanks @mimrock!
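If you want to confirm the pin actually landed in the active environment, a small sanity check (plain Python; only assumes numpy is importable):

import numpy as np

# torch wheels built against NumPy 1.x fail to initialize under NumPy 2,
# so make sure the downgrade took effect in this environment.
assert np.__version__.startswith("1."), f"expected numpy<2, got {np.__version__}"
print("numpy", np.__version__)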
Following the above steps, I got the following:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB. GPU 0 has a total capacty of 47.44 GiB of which 1.62 MiB is free. Process 905239 has 36.49 GiB memory in use. Process 1025525 has 564.00 MiB memory in use. Including non-PyTorch memory, this process has 10.35 GiB memory in use. Of the allocated memory 10.05 GiB is allocated by PyTorch, and 1.56 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
So does that mean my 48GB VRAM GPU is not enough for this model? If so, I'm gutted, as I was so looking forward to running this.
Thanks for the help.
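Note what the error message actually says: two other processes already hold roughly 37 GiB of the card, and this run had only allocated about 10 GiB before failing, so a 48GB card should be ample for the 7B model once the GPU is otherwise free. A quick hedged check before loading (torch.cuda.mem_get_info is standard PyTorch; nothing here is Chameleon-specific):

import torch

# Free vs. total memory on GPU 0. If "free" is far below "total",
# another process (check nvidia-smi) is holding the card.
free, total = torch.cuda.mem_get_info(0)
print(f"free: {free / 2**30:.2f} GiB / total: {total / 2**30:.2f} GiB")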
D:\GitHub\chameleon>python -m chameleon.miniviewer
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\GitHub\chameleon\chameleon\miniviewer\__main__.py", line 9, in <module>
    main()
  File "D:\GitHub\chameleon\cham\lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "D:\GitHub\chameleon\cham\lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "D:\GitHub\chameleon\cham\lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "D:\GitHub\chameleon\cham\lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "D:\GitHub\chameleon\chameleon\miniviewer\miniviewer.py", line 241, in main
    cm3v2_inference_model = ChameleonInferenceModel(
  File "D:\GitHub\chameleon\chameleon\inference\chameleon.py", line 544, in __init__
    self.token_manager = TokenManager(
  File "D:\GitHub\chameleon\chameleon\inference\chameleon.py", line 99, in __init__
    self.vocab = VocabInfo(json.load(open(tokenizer_path))["model"]["vocab"])
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3565933: character maps to <undefined>
Exception ignored in: <function ChameleonInferenceModel.__del__ at 0x00000163593575B0>
Traceback (most recent call last):
  File "D:\GitHub\chameleon\chameleon\inference\chameleon.py", line 572, in __del__
    with self.dctx.active_key_lock:
AttributeError: 'ChameleonInferenceModel' object has no attribute 'dctx'
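The UnicodeDecodeError comes from open() on Windows defaulting to the locale codec (cp1252 here) while the tokenizer JSON is UTF-8. A hedged workaround is to patch the line the traceback points at (chameleon/inference/chameleon.py, line 99 in this checkout) to pass the encoding explicitly:

# chameleon/inference/chameleon.py, line 99 (per the traceback above).
# Force UTF-8 instead of the Windows locale default (cp1252).
with open(tokenizer_path, encoding="utf-8") as f:
    self.vocab = VocabInfo(json.load(f)["model"]["vocab"])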