Cannot start Docker: neither 1 Gaudi card nor 8 Gaudi cards work
jingkang99 opened this issue · 9 comments
System Info
Checked out the latest code and built the Docker image.
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
I am jing.kang. I checked out the latest code.
model=berkeley-nest/Starling-LM-7B-alpha doesn't work either
First, the README should include instructions for passing the HF token to Docker as an environment variable; otherwise you get a download error:
-e HF_TOKEN=hf_hyScBFJNVtSbUaJAJFIUaYSlHuXXXXXXXX
Cannot access gated repo for url https://huggingface.co/api/models/meta-llama/Llama-2-7b-hf.
Repo model meta-llama/Llama-2-7b-hf is gated. You must be authenticated to access it.
model=meta-llama/Llama-2-7b-hf
docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model
log
2024-03-01T23:42:28.174133Z INFO text_generation_launcher: Args { model_id: "meta-llama/Llama-2-7b-hf", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "20c35f6cf73d", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2024-03-01T23:42:28.174233Z INFO download: text_generation_launcher: Starting download process.
2024-03-01T23:42:30.776578Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-03-01T23:42:31.076992Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-03-01T23:42:31.077385Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-03-01T23:42:34.631525Z INFO text_generation_launcher: CLI SHARDED = False DTYPE = bfloat16
2024-03-01T23:42:41.085750Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-03-01T23:42:42.086845Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.81it/s]
Traceback (most recent call last):
File "/usr/local/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/cli.py", line 120, in serve
server.serve(model_id, revision, dtype, uds_path, sharded)
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 191, in serve
asyncio.run(serve_inner(model_id, revision, dtype, sharded))
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 152, in serve_inner
model = get_model(model_id, revision=revision, dtype=data_type)
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/__init__.py", line 33, in get_model
return CausalLM(model_id, revision, dtype)
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 625, in __init__
model = model.eval().to(device)
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2595, in to
return super().to(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 173, in wrapped_to
result = self.original_to(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1163, in to
return self._apply(convert)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 833, in _apply
param_applied = fn(param)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1161, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 53, in __torch_function__
return super().__torch_function__(func, types, new_args, kwargs)
RuntimeError: synStatus=20 [Device already acquired] Device acquire failed.
rank=0
Error: ShardCannotStart
2024-03-01T23:42:42.184720Z ERROR text_generation_launcher: Shard 0 failed to start
2024-03-01T23:42:42.184734Z INFO text_generation_launcher: Shutting down shards
Launch a local server instance on 8 Gaudi cards:
Expected behavior
The Docker container starts without errors.
Hi @jingkang99, the error you observed usually means there is something wrong with your setup or a different process is using HPU resources. Are you able to run anything else on Gaudi in your env?
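For example, before launching the container you could check whether any other process is already holding the devices. This is only a rough sketch; the exact layout of the output can differ between releases:
hl-smi    # the compute-process table at the bottom should be empty before you start the container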
I see you are running two different configs: non-sharded and sharded. Could you provide entire output from both?
BTW, information about the HF token is already in the README:
For gated models such as [StarCoder](https://huggingface.co/bigcode/starcoder), you will have to pass -e HUGGING_FACE_HUB_TOKEN=<token> to the docker run command above with a valid Hugging Face Hub read token.
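So with the tgi_gaudi image name you are already using, the run command would look roughly like this (a sketch; substitute your own read token):
docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HUGGING_FACE_HUB_TOKEN=<your read token> --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model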
@kdamaszk Thanks for your help.
export volume=/opt/llm-models
export HUGGING_FACE_HUB_TOKEN=hf_hyScBFJNVtSbUaJAJFIUaYSlHuXXXXXXXX
export model=meta-llama/Llama-2-7b-hf
docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model
No special settings. I still get the same error when trying on 1 Gaudi card.
root@node1:~/dev# docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model
2024-03-04T19:40:03.777037Z INFO text_generation_launcher: Args { model_id: "meta-llama/Llama-2-7b-hf", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "f6de7e0fac8b", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2024-03-04T19:40:03.777192Z INFO download: text_generation_launcher: Starting download process.
2024-03-04T19:40:07.515481Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-03-04T19:40:07.985336Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-03-04T19:40:07.985672Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-03-04T19:40:12.058650Z INFO text_generation_launcher: CLI SHARDED = False DTYPE = bfloat16
2024-03-04T19:40:17.995720Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-03-04T19:40:21.499561Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00, 1.75s/it]
Traceback (most recent call last):
File "/usr/local/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/cli.py", line 120, in serve
server.serve(model_id, revision, dtype, uds_path, sharded)
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 191, in serve
asyncio.run(serve_inner(model_id, revision, dtype, sharded))
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 152, in serve_inner
model = get_model(model_id, revision=revision, dtype=data_type)
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/__init__.py", line 33, in get_model
return CausalLM(model_id, revision, dtype)
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 625, in __init__
model = model.eval().to(device)
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2595, in to
return super().to(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 173, in wrapped_to
result = self.original_to(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1163, in to
return self._apply(convert)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 833, in _apply
param_applied = fn(param)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1161, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 53, in __torch_function__
return super().__torch_function__(func, types, new_args, kwargs)
RuntimeError: synStatus=20 [Device already acquired] Device acquire failed.
rank=0
2024-03-04T19:40:21.596390Z ERROR text_generation_launcher: Shard 0 failed to start
2024-03-04T19:40:21.596421Z INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
I see this: RuntimeError: synStatus=20 [Device already acquired] Device acquire failed.
Can you try running a simple sample network like this:
https://github.com/HabanaAI/Model-References/tree/master/PyTorch/examples/computer_vision/hello_world
and check whether you are able to acquire the HPUs and run on them.
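Even a single tensor operation is enough to verify that a device can be acquired. A minimal sketch, assuming the habana_frameworks PyTorch bridge from the same release is installed:
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device

x = torch.randn(2, 2).to("hpu")  # raises a synStatus error if the device cannot be acquired
y = x @ x
htcore.mark_step()               # lazy mode: flush the accumulated graph
print(y.cpu())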
The hello_world example gets a different error, so it should be a different root cause. Docker encapsulates everything, including the libraries, right?
./habanalabs-installer.sh install --type pytorch --venv --verbose
ok
Validate Habanalabs PyTorch installation
================================================================================
============================= HABANA PT BRIDGE CONFIGURATION ===========================
PT_HPU_LAZY_MODE = 1
PT_RECIPE_CACHE_PATH =
PT_CACHE_FOLDER_DELETE = 0
PT_HPU_RECIPE_CACHE_CONFIG =
PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
PT_HPU_LAZY_ACC_PAR_MODE = 1
PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 128
CPU RAM : 1056454200 KB
------------------------------------------------------------------------------
Loading Habana modules from /opt/habana-pyt/lib/python3.10/site-packages/habana_frameworks/torch/lib
Habanalabs PyTorch test was completed successfully
(habana-pyt) root@node1:~/dev/Model-References/PyTorch/examples/computer_vision/hello_world# PT_HPU_LAZY_MODE=1 python mnist.py --batch-size=64 --epochs=1 --lr=1.0 --gamma=0.7 --hpu
Not using distributed mode
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../data/MNIST/raw/train-images-idx3-ubyte.gz
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9912422/9912422 [00:00<00:00, 18100989.93it/s]
Extracting ../data/MNIST/raw/train-images-idx3-ubyte.gz to ../data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ../data/MNIST/raw/train-labels-idx1-ubyte.gz
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28881/28881 [00:00<00:00, 130674966.37it/s]
Extracting ../data/MNIST/raw/train-labels-idx1-ubyte.gz to ../data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw/t10k-images-idx3-ubyte.gz
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1648877/1648877 [00:00<00:00, 5279117.01it/s]
Extracting ../data/MNIST/raw/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4542/4542 [00:00<00:00, 22412386.79it/s]
Extracting ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw
============================= HABANA PT BRIDGE CONFIGURATION ===========================
PT_HPU_LAZY_MODE = 1
PT_RECIPE_CACHE_PATH =
PT_CACHE_FOLDER_DELETE = 0
PT_HPU_RECIPE_CACHE_CONFIG =
PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
PT_HPU_LAZY_ACC_PAR_MODE = 1
PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 128
CPU RAM : 1056454200 KB
------------------------------------------------------------------------------
Traceback (most recent call last):
File "/root/dev/Model-References/PyTorch/examples/computer_vision/hello_world/mnist.py", line 264, in <module>
main()
File "/root/dev/Model-References/PyTorch/examples/computer_vision/hello_world/mnist.py", line 253, in main
train(args, model, device, train_loader,
File "/root/dev/Model-References/PyTorch/examples/computer_vision/hello_world/mnist.py", line 76, in train
100. * batch_idx / len(train_loader), loss.item()))
RuntimeError: synNodeCreateWithId failed for node: spatial_convolution with synStatus 26 [Generice failure]. .
mpirun -n 8 --bind-to core --map-by socket:PE=6 --rank-by core --report-bindings --allow-run-as-root -x PT_HPU_LAZY_MODE=1 python mnist.py --batch-size=64 --epochs=1 --lr=1.0 --gamma=0.7 --hpu
failed
PT_HPU_LAZY_MODE=1 python mnist.py --batch-size=64 --epochs=1 --lr=1.0 --gamma=0.7 --hpu --autocast
was running ok
8 HPU, 1 server in BF16 lazy mode:
mpirun -n 8 --bind-to core --map-by socket:PE=6 --rank-by core --report-bindings --allow-run-as-root -x PT_HPU_LAZY_MODE=1 python mnist.py --batch-size=64 --epochs=1 --lr=1.0 --gamma=0.7 --hpu --autocast
| distributed init (rank 0): env://
| distributed init (rank 4): env://
| distributed init (rank 6): env://
| distributed init (rank 7): env://
| distributed init (rank 5): env://
| distributed init (rank 2): env://
| distributed init (rank 3): env://
| distributed init (rank 1): env://
============================= HABANA PT BRIDGE CONFIGURATION ===========================
PT_HPU_LAZY_MODE = 1
PT_RECIPE_CACHE_PATH =
PT_CACHE_FOLDER_DELETE = 0
PT_HPU_RECIPE_CACHE_CONFIG =
PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
PT_HPU_LAZY_ACC_PAR_MODE = 1
PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 128
CPU RAM : 1056454200 KB
------------------------------------------------------------------------------
Train Epoch: 1 [0/7500.0 (0%)] Loss: 2.328125
Train Epoch: 1 [640/7500.0 (9%)] Loss: 1.132812
Train Epoch: 1 [1280/7500.0 (17%)] Loss: 0.367188
Train Epoch: 1 [1920/7500.0 (26%)] Loss: 0.376953
Train Epoch: 1 [2560/7500.0 (34%)] Loss: 0.261719
Train Epoch: 1 [3200/7500.0 (43%)] Loss: 0.154297
Train Epoch: 1 [3840/7500.0 (51%)] Loss: 0.167969
Train Epoch: 1 [4480/7500.0 (60%)] Loss: 0.088867
Train Epoch: 1 [5120/7500.0 (68%)] Loss: 0.244141
Train Epoch: 1 [5760/7500.0 (77%)] Loss: 0.101074
Train Epoch: 1 [6400/7500.0 (85%)] Loss: 0.044922
Train Epoch: 1 [7040/7500.0 (94%)] Loss: 0.062012
Total test set: 10000, number of workers: 8
* Average Acc 97.564 Average loss 0.073
was running ok
Is it tested with a specific tag or release? Maybe the latest source I used to build the Docker image has some issues, since you are still working on it.
If you try right now on the moving branch v1.2-release, it might not be compatible with docker 1.14, but you can try.
We will have a tag on the v1.2-release branch for docker 1.14 soon, so you can try with that.
@jingkang99, I also just want to make sure you have matching driver (check hl-smi) and docker releases. The latest is 1.14.
@jingkang99 please use tag v1.2.1 from now: https://github.com/huggingface/tgi-gaudi/releases/tag/v1.2.1
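Roughly (a sketch, assuming you rebuild the image from the repo root under the same tgi_gaudi name you used before):
git clone https://github.com/huggingface/tgi-gaudi.git
cd tgi-gaudi
git checkout v1.2.1
docker build -t tgi_gaudi .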
@jingkang99 does this issue still exist? Can we close it?