NVIDIA/FasterTransformer

Using FasterTransformer to run inference on the BLOOM model, the accuracy is 0

hurun opened this issue · 2 comments

hurun commented

Branch/Tag/Commit

main

Docker Image Version

nvcr.io/nvidia/pytorch:22.09-py3

GPU name

V100-32G

CUDA Driver

11.0

Reproduced Steps

Step 1: pull the image with Docker and start the container

docker pull nvcr.io/nvidia/pytorch:22.09-py3

# start container
sudo docker run -dti --name faster_transformer \
--restart=always --gpus all --network=host \
--shm-size 5g \
-v /workspace/code:/workspace/code \
-v /workspace/data:/workspace/data \
-v /workspace/model:/workspace/model \
-v /workspace/output:/workspace/output \
-w /workspace \
nvcr.io/nvidia/pytorch:22.09-py3 bash

# exec into the container (name must match the one given to `docker run`)
docker exec -it faster_transformer bash
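As a quick sanity check (my addition, not part of the original steps), confirm the container can see the GPU before building; the V100 here is compute capability 7.0, which matches the -DSM=70 build flag in step 2:

```bash
# The V100 should appear in the device list inside the container.
docker exec faster_transformer nvidia-smi
```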

Step 2: clone the FasterTransformer project from GitHub and build it

cd /workspace/code
git clone https://github.com/NVIDIA/FasterTransformer.git
cd FasterTransformer
git submodule init && git submodule update
mkdir -p build && cd build
cmake -DSM=70 -DCMAKE_BUILD_TYPE=Release -DBUILD_PYT=ON -DBUILD_MULTI_GPU=ON ..
make -j12
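A minimal check (my addition) that the build produced the PyTorch op library; both benchmark commands in step 6 load it via --lib-path:

```bash
# This file must exist for bloom_lambada.py to load the FT ops.
ls -lh /workspace/code/FasterTransformer/build/lib/libth_transformer.so
```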

Step 3: install the Python dependencies

cd /workspace/code/FasterTransformer
pip install -r examples/pytorch/gpt/requirement.txt
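A small sanity check (my addition) that the pinned dependencies installed cleanly:

```bash
# Both packages are imported by the conversion and benchmark scripts.
python -c "import torch, transformers; print(torch.__version__, transformers.__version__)"
```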

Step 4: download the model and data (bloomz-560m and the LAMBADA dataset)

cd /workspace/model
git lfs clone https://huggingface.co/bigscience/bloomz-560m

cd /workspace/data
wget -c https://github.com/cybertronai/bflm/raw/master/lambada_test.jsonl
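To rule out a data problem (my own check, assuming the usual LAMBADA JSONL layout of one JSON object per line), peek at the file; the benchmark output below reports 5153 examples, so the line count should match:

```bash
# First record and total number of examples (expected: 5153).
head -n 1 /workspace/data/lambada_test.jsonl
wc -l /workspace/data/lambada_test.jsonl
```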

Step 5: convert the model with huggingface_bloom_convert.py

cd /workspace/code/FasterTransformer

python examples/pytorch/gpt/utils/huggingface_bloom_convert.py \
    --input-dir /workspace/model/bloomz-560m \
    --output-dir /workspace/model/bloomz-560m-convert \
    --data-type fp16 \
    -tp 1 -v
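A quick look (my addition) at the conversion output; the FT benchmark in step 6 points --checkpoint-path at the 1-gpu subdirectory, which should now hold the converted weight files for tensor parallelism 1 (-tp 1):

```bash
# List a few of the converted per-tensor binaries.
ls /workspace/model/bloomz-560m-convert/1-gpu | head
```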

Step 6: benchmark the PyTorch (HF) model and the FasterTransformer model converted in step 5

```bash
# Run HF benchmark
CUDA_VISIBLE_DEVICES=3 python examples/pytorch/gpt/bloom_lambada.py \
    --tokenizer-path /workspace/model/bloomz-560m \
    --dataset-path /workspace/data/lambada_test.jsonl \
    --lib-path build/lib/libth_transformer.so \
    --test-hf \
    --show-progress

# Run FT benchmark
CUDA_VISIBLE_DEVICES=3 python examples/pytorch/gpt/bloom_lambada.py \
    --checkpoint-path /workspace/model/bloomz-560m-convert/1-gpu \
    --tokenizer-path /workspace/model/bloomz-560m \
    --dataset-path /workspace/data/lambada_test.jsonl \
    --lib-path build/lib/libth_transformer.so \
    --show-progress
```

Step 7: results

HF benchmark result: Accuracy 39.6274%

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 645/645 [12:57<00:00, 1.21s/it]
Accuracy: 39.6274% (2042/5153) (elapsed time: 771.5046 sec)

FT benchmark result: Accuracy 0.0000%

[FT][INFO] Device Tesla V100-SXM2-32GB, Accuracy: 0.0000%
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5153/5153 [01:57<00:00, 43.92it/s]
Accuracy: 0.0000% (0/5153) (elapsed time: 109.4225 sec)

hurun commented

I tried to find out why the accuracy of the FT model is 0. Comparing the outputs of the HF and FT models, I found that the FT model always predicts token id 90610 as the target, while all other inputs, such as input_ids and the inference parameters, are identical.

I also tried some other experiments, such as running the same steps on another NVIDIA GPU (an RTX 2060), and there the results are correct.

{
  "model_answer": "-\u00e0-vis",
  "output_ids": [
    90610
  ],
  "metrics": {
    "acc": 0.0
  }
}
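For reference, a small snippet (my own, using the tokenizer shipped with the model) confirming that id 90610 decodes to the string above, i.e. the FT model emits the same token regardless of the input:

```bash
# Decode the constantly-predicted token id with the model's own tokenizer.
python -c "
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained('/workspace/model/bloomz-560m')
print(repr(tok.decode([90610])))  # prints '-à-vis' per the output above
"
```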

FasterTransformer development has transitioned to TensorRT-LLM; please try that instead. TensorRT-LLM officially supports BLOOM.