Update the base image from 1.14 to 1.15
yafshar opened this issue · 4 comments
yafshar commented
Currently the base image is
vault.habana.ai/gaudi-docker/1.14.0/ubuntu22.04/habanalabs/pytorch-installer-2.1.1:latest
To support the new Mixtral-8x7B model and other variants, there is a need to upgrade to 1.15.
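A minimal sketch of the corresponding Dockerfile change, assuming the 1.15.0 image follows the same naming scheme (the PyTorch version in the tag, 2.2.0 here, is an assumption and should be verified against the Habana vault listing):

```dockerfile
# Current base image (SynapseAI 1.14.0, PyTorch 2.1.1):
# FROM vault.habana.ai/gaudi-docker/1.14.0/ubuntu22.04/habanalabs/pytorch-installer-2.1.1:latest

# Assumed 1.15.0 equivalent; the bundled PyTorch version in the tag is a guess
FROM vault.habana.ai/gaudi-docker/1.15.0/ubuntu22.04/habanalabs/pytorch-installer-2.2.0:latest
```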
Only upgrading the Python components (optimum-habana -> 1.11.0 and transformers -> 4.38.2, roughly as sketched below) causes other issues: warmup then fails with the traceback that follows.
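A rough sketch of that partial upgrade, assuming it is applied on top of the existing 1.14.0 base image:

```bash
# Partial upgrade only -- base image still 1.14.0; this is the combination that fails below
pip install optimum-habana==1.11.0 transformers==4.38.2
```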
2024-04-10T21:55:13.283860Z DEBUG text_generation_launcher: File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 68, in Warmup
2024-04-10T21:55:13.283861Z DEBUG text_generation_launcher: self.model.warmup(batches)
2024-04-10T21:55:13.283862Z DEBUG text_generation_launcher: File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 1080, in warmup
2024-04-10T21:55:13.283863Z DEBUG text_generation_launcher: _, prefill_batch = self.generate_token([batches.pop(0)])
2024-04-10T21:55:13.283864Z DEBUG text_generation_launcher: File "/usr/lib/python3.10/contextlib.py", line 79, in inner
2024-04-10T21:55:13.283865Z DEBUG text_generation_launcher: return func(*args, **kwds)
2024-04-10T21:55:13.283866Z DEBUG text_generation_launcher: File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 918, in generate_token
2024-04-10T21:55:13.283868Z DEBUG text_generation_launcher: batch.logits, batch.past = self.forward(
2024-04-10T21:55:13.283869Z DEBUG text_generation_launcher: File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 818, in forward
2024-04-10T21:55:13.283870Z DEBUG text_generation_launcher: outputs = self.model.forward(**kwargs)
2024-04-10T21:55:13.283872Z DEBUG text_generation_launcher: File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/graphs.py", line 661, in forward
2024-04-10T21:55:13.283873Z DEBUG text_generation_launcher: return wrapped_hpugraph_forward(cache, stream, orig_fwd, args, kwargs, disable_tensor_cache, asynchronous, dry_run, max_graphs)
2024-04-10T21:55:13.283874Z DEBUG text_generation_launcher: File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/graphs.py", line 585, in wrapped_hpugraph_forward
2024-04-10T21:55:13.283875Z DEBUG text_generation_launcher: cached.graph.replayV3(input_tensor_list, cached.asynchronous)
2024-04-10T21:55:13.283876Z DEBUG text_generation_launcher: File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/graphs.py", line 71, in replayV3
2024-04-10T21:55:13.283877Z DEBUG text_generation_launcher: _hpu_C.replayV3(self.hpu_graph, tlistI, asynchronous)
2024-04-10T21:55:13.283878Z DEBUG text_generation_launcher: RuntimeError: [Rank:0] FATAL ERROR :: MODULE:PT_BRIDGE Exception in Launch thread...
2024-04-10T21:55:13.283880Z DEBUG text_generation_launcher: Check $HABANA_LOGS/ for details[Rank:0] FATAL ERROR :: MODULE:PT_LAZY Error, ValidateSyncInputTensors tensor_data is empty. Tensorid:41707 QueueStatus:ThreadPool m_tasks size: 1 irValue:id_110063_hpu__input
2024-04-10T21:55:13.283883Z DEBUG text_generation_launcher: [Rank:0] Habana exception raised from ValidateSyncInputTensors at hpu_lazy_tensors.cpp:875
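For context, the traceback above is hit during server warmup right after launch. A rough launch command looks like the sketch below (the image name, model id, and flags are illustrative assumptions based on the tgi-gaudi README, not the exact command that produced this log):

```bash
# Hypothetical launch; the local image name "tgi_gaudi" and all flags are assumptions
docker run -p 8080:80 \
  --runtime=habana \
  -e HABANA_VISIBLE_DEVICES=all \
  -e HF_TOKEN=<your_hf_token> \
  --cap-add=sys_nice \
  --ipc=host \
  tgi_gaudi \
  --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 \
  --max-input-length 1024 \
  --max-total-tokens 2048
```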
- @regisss or others, any help or hints here to enable support for Mixtral-8x7B would be appreciated!
- Is there any plan or another PR for upgrading to 1.15.0?
- I am working on an upgrade, but I might need some help.
yafshar commented
@kdamaszk, can you please share the work? I can also help with extra testing or with adding any missing features. I would rather not open a PR myself and then have to discard it later.
yafshar commented