huggingface/tgi-gaudi

Cannot start docker! Neither 1 Gaudi card nor 8 Gaudi cards work

jingkang99 opened this issue · 9 comments

System Info

Checked out the latest code and built the docker image.

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

I am jing.kang. I checked out the latest code.

model=berkeley-nest/Starling-LM-7B-alpha doesn't work either

env (screenshot)

Firstly, the README should include an instruction to pass the HF token to docker as an environment variable, otherwise you get a download error:
-e HF_TOKEN=hf_hyScBFJNVtSbUaJAJFIUaYSlHuXXXXXXXX

Cannot access gated repo for url https://huggingface.co/api/models/meta-llama/Llama-2-7b-hf.
Repo model meta-llama/Llama-2-7b-hf is gated. You must be authenticated to access it.
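
For reference, a full run that passes the token might look like this (a sketch; HUGGING_FACE_HUB_TOKEN is the variable name the README documents, and the tgi_gaudi tag and /opt/llm-models volume are the ones used elsewhere in this issue):

export HUGGING_FACE_HUB_TOKEN=hf_...   # a valid Hugging Face read token
export volume=/opt/llm-models
export model=meta-llama/Llama-2-7b-hf
docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model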

The docker image built without issue.

model=meta-llama/Llama-2-7b-hf
docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model

log

2024-03-01T23:42:28.174133Z  INFO text_generation_launcher: Args { model_id: "meta-llama/Llama-2-7b-hf", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "20c35f6cf73d", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2024-03-01T23:42:28.174233Z  INFO download: text_generation_launcher: Starting download process.
2024-03-01T23:42:30.776578Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.

2024-03-01T23:42:31.076992Z  INFO download: text_generation_launcher: Successfully downloaded weights.
2024-03-01T23:42:31.077385Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-03-01T23:42:34.631525Z  INFO text_generation_launcher: CLI SHARDED = False DTYPE = bfloat16

2024-03-01T23:42:41.085750Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-03-01T23:42:42.086845Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.81it/s]
Traceback (most recent call last):

  File "/usr/local/bin/text-generation-server", line 8, in <module>
    sys.exit(app())

  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/cli.py", line 120, in serve
    server.serve(model_id, revision, dtype, uds_path, sharded)

  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 191, in serve
    asyncio.run(serve_inner(model_id, revision, dtype, sharded))

  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)

  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()

  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 152, in serve_inner
    model = get_model(model_id, revision=revision, dtype=data_type)

  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/__init__.py", line 33, in get_model
    return CausalLM(model_id, revision, dtype)

  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 625, in __init__
    model = model.eval().to(device)

  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2595, in to
    return super().to(*args, **kwargs)

  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 173, in wrapped_to
    result = self.original_to(*args, **kwargs)

  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1163, in to
    return self._apply(convert)

  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)

  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)

  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)

  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1161, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)

  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 53, in __torch_function__
    return super().__torch_function__(func, types, new_args, kwargs)

RuntimeError: synStatus=20 [Device already acquired] Device acquire failed.
 rank=0
Error: ShardCannotStart
2024-03-01T23:42:42.184720Z ERROR text_generation_launcher: Shard 0 failed to start
2024-03-01T23:42:42.184734Z  INFO text_generation_launcher: Shutting down shards

Launch a local server instance on 8 Gaudi cards:

(screenshot)

Expected behavior

Start docker without errors.

Hi @jingkang99, the error you observed usually means there is something wrong with your setup or a different process is using HPU resources. Are you able to run anything else on Gaudi in your env?
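
One quick way to check whether another process is holding the HPUs (a sketch; device node names vary by driver release):

hl-smi                                  # lists each Gaudi module plus any processes using it
lsof /dev/hl* /dev/accel/* 2>/dev/null  # processes that currently have the accelerator device nodes open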

I see you are running two different configs: non-sharded and sharded. Could you provide the entire output from both?

BTW, information about the HF token is already in the README:
For gated models such as [StarCoder](https://huggingface.co/bigcode/starcoder), you will have to pass -e HUGGING_FACE_HUB_TOKEN=<token> to the docker run command above with a valid Hugging Face Hub read token.

@kdamaszk Thanks for your help.

export volume=/opt/llm-models
export HUGGING_FACE_HUB_TOKEN=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
export model=meta-llama/Llama-2-7b-hf

docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model

cards are up
(screenshot)

No special settings. I still get the same error when trying on 1 Gaudi card.
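
Note: the command below still passes HABANA_VISIBLE_DEVICES=all; to pin the run to a single card one could expose only one module instead (a sketch, not verified on this setup):

docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=0 -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model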

root@node1:~/dev# docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model
2024-03-04T19:40:03.777037Z  INFO text_generation_launcher: Args { model_id: "meta-llama/Llama-2-7b-hf", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "f6de7e0fac8b", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2024-03-04T19:40:03.777192Z  INFO download: text_generation_launcher: Starting download process.
2024-03-04T19:40:07.515481Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.

2024-03-04T19:40:07.985336Z  INFO download: text_generation_launcher: Successfully downloaded weights.
2024-03-04T19:40:07.985672Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-03-04T19:40:12.058650Z  INFO text_generation_launcher: CLI SHARDED = False DTYPE = bfloat16

2024-03-04T19:40:17.995720Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-03-04T19:40:21.499561Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00,  1.75s/it]
Traceback (most recent call last):

  File "/usr/local/bin/text-generation-server", line 8, in <module>
    sys.exit(app())

  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/cli.py", line 120, in serve
    server.serve(model_id, revision, dtype, uds_path, sharded)

  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 191, in serve
    asyncio.run(serve_inner(model_id, revision, dtype, sharded))

  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)

  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()

  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 152, in serve_inner
    model = get_model(model_id, revision=revision, dtype=data_type)

  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/__init__.py", line 33, in get_model
    return CausalLM(model_id, revision, dtype)

  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 625, in __init__
    model = model.eval().to(device)

  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2595, in to
    return super().to(*args, **kwargs)

  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 173, in wrapped_to
    result = self.original_to(*args, **kwargs)

  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1163, in to
    return self._apply(convert)

  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)

  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)

  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)

  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1161, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)

  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 53, in __torch_function__
    return super().__torch_function__(func, types, new_args, kwargs)

RuntimeError: synStatus=20 [Device already acquired] Device acquire failed.
 rank=0
2024-03-04T19:40:21.596390Z ERROR text_generation_launcher: Shard 0 failed to start
2024-03-04T19:40:21.596421Z  INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart

I see this: RuntimeError: synStatus=20 [Device already acquired] Device acquire failed.

Can you try running a simple sample network like this:
https://github.com/HabanaAI/Model-References/tree/master/PyTorch/examples/computer_vision/hello_world
and check if you are able to get HPUs and run on them?
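
For example (the exact flags are the ones used later in this thread; the path follows the Model-References layout):

git clone https://github.com/HabanaAI/Model-References
cd Model-References/PyTorch/examples/computer_vision/hello_world
PT_HPU_LAZY_MODE=1 python mnist.py --batch-size=64 --epochs=1 --lr=1.0 --gamma=0.7 --hpu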

hello_world gets a different error, so it should be a different root cause. Docker encapsulates everything, including libraries, right?

./habanalabs-installer.sh install --type pytorch --venv --verbose
ok

Validate Habanalabs PyTorch installation
================================================================================
============================= HABANA PT BRIDGE CONFIGURATION ===========================
 PT_HPU_LAZY_MODE = 1
 PT_RECIPE_CACHE_PATH =
 PT_CACHE_FOLDER_DELETE = 0
 PT_HPU_RECIPE_CACHE_CONFIG =
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_LAZY_ACC_PAR_MODE = 1
 PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 128
CPU RAM       : 1056454200 KB
------------------------------------------------------------------------------
Loading Habana modules from /opt/habana-pyt/lib/python3.10/site-packages/habana_frameworks/torch/lib
Habanalabs PyTorch test was completed successfully

(habana-pyt) root@node1:~/dev/Model-References/PyTorch/examples/computer_vision/hello_world# PT_HPU_LAZY_MODE=1 python mnist.py --batch-size=64 --epochs=1 --lr=1.0 --gamma=0.7 --hpu
Not using distributed mode
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../data/MNIST/raw/train-images-idx3-ubyte.gz
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9912422/9912422 [00:00<00:00, 18100989.93it/s]
Extracting ../data/MNIST/raw/train-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ../data/MNIST/raw/train-labels-idx1-ubyte.gz
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28881/28881 [00:00<00:00, 130674966.37it/s]
Extracting ../data/MNIST/raw/train-labels-idx1-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw/t10k-images-idx3-ubyte.gz
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1648877/1648877 [00:00<00:00, 5279117.01it/s]
Extracting ../data/MNIST/raw/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4542/4542 [00:00<00:00, 22412386.79it/s]
Extracting ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw

============================= HABANA PT BRIDGE CONFIGURATION ===========================
 PT_HPU_LAZY_MODE = 1
 PT_RECIPE_CACHE_PATH =
 PT_CACHE_FOLDER_DELETE = 0
 PT_HPU_RECIPE_CACHE_CONFIG =
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_LAZY_ACC_PAR_MODE = 1
 PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 128
CPU RAM       : 1056454200 KB
------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/dev/Model-References/PyTorch/examples/computer_vision/hello_world/mnist.py", line 264, in <module>
    main()
  File "/root/dev/Model-References/PyTorch/examples/computer_vision/hello_world/mnist.py", line 253, in main
    train(args, model, device, train_loader,
  File "/root/dev/Model-References/PyTorch/examples/computer_vision/hello_world/mnist.py", line 76, in train
    100. * batch_idx / len(train_loader), loss.item()))
RuntimeError: synNodeCreateWithId failed for node: spatial_convolution with synStatus 26 [Generice failure]. .

mpirun -n 8 --bind-to core --map-by socket:PE=6 --rank-by core --report-bindings --allow-run-as-root -x PT_HPU_LAZY_MODE=1 python mnist.py --batch-size=64 --epochs=1 --lr=1.0 --gamma=0.7 --hpu

failed

PT_HPU_LAZY_MODE=1 python mnist.py --batch-size=64 --epochs=1 --lr=1.0 --gamma=0.7 --hpu --autocast
ran OK

8 HPU, 1 server in BF16 lazy mode:
mpirun -n 8 --bind-to core --map-by socket:PE=6 --rank-by core --report-bindings --allow-run-as-root -x PT_HPU_LAZY_MODE=1 python mnist.py --batch-size=64 --epochs=1 --lr=1.0 --gamma=0.7 --hpu --autocast

| distributed init (rank 0): env://
| distributed init (rank 4): env://
| distributed init (rank 6): env://
| distributed init (rank 7): env://
| distributed init (rank 5): env://
| distributed init (rank 2): env://
| distributed init (rank 3): env://
| distributed init (rank 1): env://
============================= HABANA PT BRIDGE CONFIGURATION ===========================
 PT_HPU_LAZY_MODE = 1
 PT_RECIPE_CACHE_PATH =
 PT_CACHE_FOLDER_DELETE = 0
 PT_HPU_RECIPE_CACHE_CONFIG =
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_LAZY_ACC_PAR_MODE = 1
 PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 128
CPU RAM       : 1056454200 KB
------------------------------------------------------------------------------
Train Epoch: 1 [0/7500.0 (0%)]  Loss: 2.328125
Train Epoch: 1 [640/7500.0 (9%)]        Loss: 1.132812
Train Epoch: 1 [1280/7500.0 (17%)]      Loss: 0.367188
Train Epoch: 1 [1920/7500.0 (26%)]      Loss: 0.376953
Train Epoch: 1 [2560/7500.0 (34%)]      Loss: 0.261719
Train Epoch: 1 [3200/7500.0 (43%)]      Loss: 0.154297
Train Epoch: 1 [3840/7500.0 (51%)]      Loss: 0.167969
Train Epoch: 1 [4480/7500.0 (60%)]      Loss: 0.088867
Train Epoch: 1 [5120/7500.0 (68%)]      Loss: 0.244141
Train Epoch: 1 [5760/7500.0 (77%)]      Loss: 0.101074
Train Epoch: 1 [6400/7500.0 (85%)]      Loss: 0.044922
Train Epoch: 1 [7040/7500.0 (94%)]      Loss: 0.062012

Total test set: 10000, number of workers: 8
* Average Acc 97.564 Average loss 0.073

This ran OK.

Is it tested with a specific tag or release? Maybe the latest source I used to build the docker image has some issues, since you are still working on it.

https://github.com/huggingface/tgi-gaudi/tags

If you try right now on the moving branch v1.2-release, it might not be compatible with docker 1.14. But you can try.

We will have a tag on the branch v1.2-release for docker 1.14 soon, so you can try with that.

@jingkang99, also I just want to make sure you have the same driver (check hl-smi) and docker release. The latest is 1.14.
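
A quick way to cross-check versions and build from a fixed point (a sketch; pick an actual tag from the tags page above):

hl-smi | head                           # shows the driver/firmware release on the host (should match the image, e.g. 1.14)
cd tgi-gaudi
git fetch --tags && git checkout <tag>  # build from a tag instead of the moving v1.2-release branch
docker build -t tgi_gaudi .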

@jingkang99 does this issue still exist? Can we close it?