huggingface/tgi-gaudi

Cannot start docker! Neither 1 Gaudi card nor 8 Gaudi cards work

jingkang99 opened this issue · 9 comments

System Info

Checked out the latest code and built the docker image.

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

I am jing.kang. I checked out the latest code.

model=berkeley-nest/Starling-LM-7B-alpha doesn't work either

env (screenshot)

Firstly, the README should include an instruction to pass the HF token to docker as an environment variable, otherwise you get a download error:
-e HF_TOKEN=hf_hyScBFJNVtSbUaJAJFIUaYSlHuXXXXXXXX

Cannot access gated repo for url https://huggingface.co/api/models/meta-llama/Llama-2-7b-hf.
Repo model meta-llama/Llama-2-7b-hf is gated. You must be authenticated to access it.
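
For reference, a full run that passes the token might look like this (a sketch; HUGGING_FACE_HUB_TOKEN is the variable name the README documents, and the tgi_gaudi tag and /opt/llm-models volume are the ones used elsewhere in this issue):

export HUGGING_FACE_HUB_TOKEN=hf_...   # a valid Hugging Face read token
export volume=/opt/llm-models
export model=meta-llama/Llama-2-7b-hf
docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model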

The docker image built without issue.

model=meta-llama/Llama-2-7b-hf
docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model

log

2024-03-01T23:42:28.174133Z  INFO text_generation_launcher: Args { model_id: "meta-llama/Llama-2-7b-hf", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "20c35f6cf73d", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2024-03-01T23:42:28.174233Z  INFO download: text_generation_launcher: Starting download process.
2024-03-01T23:42:30.776578Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.

2024-03-01T23:42:31.076992Z  INFO download: text_generation_launcher: Successfully downloaded weights.
2024-03-01T23:42:31.077385Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-03-01T23:42:34.631525Z  INFO text_generation_launcher: CLI SHARDED = False DTYPE = bfloat16

2024-03-01T23:42:41.085750Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-03-01T23:42:42.086845Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.81it/s]
Traceback (most recent call last):

  File "/usr/local/bin/text-generation-server", line 8, in <module>
    sys.exit(app())

  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/cli.py", line 120, in serve
    server.serve(model_id, revision, dtype, uds_path, sharded)

  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 191, in serve
    asyncio.run(serve_inner(model_id, revision, dtype, sharded))

  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)

  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()

  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 152, in serve_inner
    model = get_model(model_id, revision=revision, dtype=data_type)

  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/__init__.py", line 33, in get_model
    return CausalLM(model_id, revision, dtype)

  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 625, in __init__
    model = model.eval().to(device)

  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2595, in to
    return super().to(*args, **kwargs)

  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 173, in wrapped_to
    result = self.original_to(*args, **kwargs)

  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1163, in to
    return self._apply(convert)

  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)

  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)

  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)

  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1161, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)

  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 53, in __torch_function__
    return super().__torch_function__(func, types, new_args, kwargs)

RuntimeError: synStatus=20 [Device already acquired] Device acquire failed.
 rank=0
Error: ShardCannotStart
2024-03-01T23:42:42.184720Z ERROR text_generation_launcher: Shard 0 failed to start
2024-03-01T23:42:42.184734Z  INFO text_generation_launcher: Shutting down shards

Launch a local server instance on 8 Gaudi cards:

(screenshot)

Expected behavior

Start docker without errors.

Hi @jingkang99, the error you observed usually means there is something wrong with your setup or a different process is using HPU resources. Are you able to run anything else on Gaudi in your env?
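
One quick way to check whether another process is holding the HPUs (a sketch; device node names vary by driver release):

hl-smi                                  # lists each Gaudi module plus any processes using it
lsof /dev/hl* /dev/accel/* 2>/dev/null  # processes that currently have the accelerator device nodes open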

I see you are running two different configs: non-sharded and sharded. Could you provide the entire output from both?

BTW, information about the HF token is already in the README:
For gated models such as [StarCoder](https://huggingface.co/bigcode/starcoder), you will have to pass -e HUGGING_FACE_HUB_TOKEN=<token> to the docker run command above with a valid Hugging Face Hub read token.

@kdamaszk Thanks for your help.

export volume=/opt/llm-models
export HUGGING_FACE_HUB_TOKEN=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
export model=meta-llama/Llama-2-7b-hf

docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model

cards are up
(screenshot)

No special settings. I still get the same error when trying on 1 Gaudi card.
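
Note: the command below still passes HABANA_VISIBLE_DEVICES=all; to pin the run to a single card one could expose only one module instead (a sketch, not verified on this setup):

docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=0 -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model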

root@node1:~/dev# docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model
2024-03-04T19:40:03.777037Z  INFO text_generation_launcher: Args { model_id: "meta-llama/Llama-2-7b-hf", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "f6de7e0fac8b", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2024-03-04T19:40:03.777192Z  INFO download: text_generation_launcher: Starting download process.
2024-03-04T19:40:07.515481Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.

2024-03-04T19:40:07.985336Z  INFO download: text_generation_launcher: Successfully downloaded weights.
2024-03-04T19:40:07.985672Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-03-04T19:40:12.058650Z  INFO text_generation_launcher: CLI SHARDED = False DTYPE = bfloat16

2024-03-04T19:40:17.995720Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-03-04T19:40:21.499561Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00,  1.75s/it]
Traceback (most recent call last):

  File "/usr/local/bin/text-generation-server", line 8, in <module>
    sys.exit(app())

  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/cli.py", line 120, in serve
    server.serve(model_id, revision, dtype, uds_path, sharded)

  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 191, in serve
    asyncio.run(serve_inner(model_id, revision, dtype, sharded))

  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)

  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()

  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 152, in serve_inner
    model = get_model(model_id, revision=revision, dtype=data_type)

  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/__init__.py", line 33, in get_model
    return CausalLM(model_id, revision, dtype)

  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 625, in __init__
    model = model.eval().to(device)

  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2595, in to
    return super().to(*args, **kwargs)

  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 173, in wrapped_to
    result = self.original_to(*args, **kwargs)

  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1163, in to
    return self._apply(convert)

  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)

  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)

  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)

  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1161, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)

  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 53, in __torch_function__
    return super().__torch_function__(func, types, new_args, kwargs)

RuntimeError: synStatus=20 [Device already acquired] Device acquire failed.
 rank=0
2024-03-04T19:40:21.596390Z ERROR text_generation_launcher: Shard 0 failed to start
2024-03-04T19:40:21.596421Z  INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart

I see this: RuntimeError: synStatus=20 [Device already acquired] Device acquire failed.

Can you try running a simple sample network like this:
https://github.com/HabanaAI/Model-References/tree/master/PyTorch/examples/computer_vision/hello_world
and check if you are able to get HPUs and run on them?
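
For example (the exact flags are the ones used later in this thread; the path follows the Model-References layout):

git clone https://github.com/HabanaAI/Model-References
cd Model-References/PyTorch/examples/computer_vision/hello_world
PT_HPU_LAZY_MODE=1 python mnist.py --batch-size=64 --epochs=1 --lr=1.0 --gamma=0.7 --hpu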

hello_world gets a different error, so it should be a different root cause. Docker encapsulates everything, including libraries, right?

./habanalabs-installer.sh install --type pytorch --venv --verbose
ok

Validate Habanalabs PyTorch installation
================================================================================
============================= HABANA PT BRIDGE CONFIGURATION ===========================
 PT_HPU_LAZY_MODE = 1
 PT_RECIPE_CACHE_PATH =
 PT_CACHE_FOLDER_DELETE = 0
 PT_HPU_RECIPE_CACHE_CONFIG =
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_LAZY_ACC_PAR_MODE = 1
 PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 128
CPU RAM       : 1056454200 KB
------------------------------------------------------------------------------
Loading Habana modules from /opt/habana-pyt/lib/python3.10/site-packages/habana_frameworks/torch/lib
Habanalabs PyTorch test was completed successfully

(habana-pyt) root@node1:~/dev/Model-References/PyTorch/examples/computer_vision/hello_world# PT_HPU_LAZY_MODE=1 python mnist.py --batch-size=64 --epochs=1 --lr=1.0 --gamma=0.7 --hpu
Not using distributed mode
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../data/MNIST/raw/train-images-idx3-ubyte.gz
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9912422/9912422 [00:00<00:00, 18100989.93it/s]
Extracting ../data/MNIST/raw/train-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ../data/MNIST/raw/train-labels-idx1-ubyte.gz
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28881/28881 [00:00<00:00, 130674966.37it/s]
Extracting ../data/MNIST/raw/train-labels-idx1-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw/t10k-images-idx3-ubyte.gz
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1648877/1648877 [00:00<00:00, 5279117.01it/s]
Extracting ../data/MNIST/raw/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4542/4542 [00:00<00:00, 22412386.79it/s]
Extracting ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw

============================= HABANA PT BRIDGE CONFIGURATION ===========================
 PT_HPU_LAZY_MODE = 1
 PT_RECIPE_CACHE_PATH =
 PT_CACHE_FOLDER_DELETE = 0
 PT_HPU_RECIPE_CACHE_CONFIG =
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_LAZY_ACC_PAR_MODE = 1
 PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 128
CPU RAM       : 1056454200 KB
------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/dev/Model-References/PyTorch/examples/computer_vision/hello_world/mnist.py", line 264, in <module>
    main()
  File "/root/dev/Model-References/PyTorch/examples/computer_vision/hello_world/mnist.py", line 253, in main
    train(args, model, device, train_loader,
  File "/root/dev/Model-References/PyTorch/examples/computer_vision/hello_world/mnist.py", line 76, in train
    100. * batch_idx / len(train_loader), loss.item()))
RuntimeError: synNodeCreateWithId failed for node: spatial_convolution with synStatus 26 [Generice failure]. .

mpirun -n 8 --bind-to core --map-by socket:PE=6 --rank-by core --report-bindings --allow-run-as-root -x PT_HPU_LAZY_MODE=1 python mnist.py --batch-size=64 --epochs=1 --lr=1.0 --gamma=0.7 --hpu

failed

PT_HPU_LAZY_MODE=1 python mnist.py --batch-size=64 --epochs=1 --lr=1.0 --gamma=0.7 --hpu --autocast
ran OK

8 HPU, 1 server in BF16 lazy mode:
mpirun -n 8 --bind-to core --map-by socket:PE=6 --rank-by core --report-bindings --allow-run-as-root -x PT_HPU_LAZY_MODE=1 python mnist.py --batch-size=64 --epochs=1 --lr=1.0 --gamma=0.7 --hpu --autocast

| distributed init (rank 0): env://
| distributed init (rank 4): env://
| distributed init (rank 6): env://
| distributed init (rank 7): env://
| distributed init (rank 5): env://
| distributed init (rank 2): env://
| distributed init (rank 3): env://
| distributed init (rank 1): env://
============================= HABANA PT BRIDGE CONFIGURATION ===========================
 PT_HPU_LAZY_MODE = 1
 PT_RECIPE_CACHE_PATH =
 PT_CACHE_FOLDER_DELETE = 0
 PT_HPU_RECIPE_CACHE_CONFIG =
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_LAZY_ACC_PAR_MODE = 1
 PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 128
CPU RAM       : 1056454200 KB
------------------------------------------------------------------------------
Train Epoch: 1 [0/7500.0 (0%)]  Loss: 2.328125
Train Epoch: 1 [640/7500.0 (9%)]        Loss: 1.132812
Train Epoch: 1 [1280/7500.0 (17%)]      Loss: 0.367188
Train Epoch: 1 [1920/7500.0 (26%)]      Loss: 0.376953
Train Epoch: 1 [2560/7500.0 (34%)]      Loss: 0.261719
Train Epoch: 1 [3200/7500.0 (43%)]      Loss: 0.154297
Train Epoch: 1 [3840/7500.0 (51%)]      Loss: 0.167969
Train Epoch: 1 [4480/7500.0 (60%)]      Loss: 0.088867
Train Epoch: 1 [5120/7500.0 (68%)]      Loss: 0.244141
Train Epoch: 1 [5760/7500.0 (77%)]      Loss: 0.101074
Train Epoch: 1 [6400/7500.0 (85%)]      Loss: 0.044922
Train Epoch: 1 [7040/7500.0 (94%)]      Loss: 0.062012

Total test set: 10000, number of workers: 8
* Average Acc 97.564 Average loss 0.073

This ran OK.

Is it tested with a specific tag or release? Maybe the latest source I used to build the docker image has some issues, since you are still working on it.

https://github.com/huggingface/tgi-gaudi/tags

If you try right now on the moving branch v1.2-release, it might not be compatible with docker 1.14. But you can try.

We will have a tag on the branch v1.2-release for docker 1.14 soon, so you can try with that.

@jingkang99, also I just want to make sure you have the same driver (check hl-smi) and docker release. The latest is 1.14.
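
A quick way to cross-check versions and build from a fixed point (a sketch; pick an actual tag from the tags page above):

hl-smi | head                           # shows the driver/firmware release on the host (should match the image, e.g. 1.14)
cd tgi-gaudi
git fetch --tags && git checkout <tag>  # build from a tag instead of the moving v1.2-release branch
docker build -t tgi_gaudi .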

@jingkang99 does this issue still exist? Can we close it?