sophgo/LLM-TPU

Support for Llama 3.1 model

Opened this issue · 3 comments

Are there instructions specific to creating a bmodel from ONNX for Llama 3.1 (not Llama 3)?

Running the following errors out:
python export_onnx.py --model_path ../../../../Meta-Llama-3.1-8B-Instruct/ --seq_length 1024

Convert block & block_cache
0%| | 0/32 [00:00<?, ?it/s]The attention layers in this model are transitioning from computing the RoPE embeddings internally through position_ids (2D tensor with the indexes of the tokens), to using externally computed position_embeddings (Tuple of tensors, containing cos and sin). In v4.45 position_ids will be removed and position_embeddings will be mandatory.
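The deprecation warning above refers to transformers moving RoPE out of the attention layers: instead of passing 2D `position_ids`, callers will pass a precomputed `(cos, sin)` tuple. As a minimal illustration of what those externally computed position embeddings are (not the repo's export code; the base of 10000.0 is the usual Llama default and an assumption here):

```python
import numpy as np

def rope_cos_sin(seq_len, head_dim, base=10000.0):
    # Inverse frequency for each pair of dimensions in a head.
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    # One rotation angle per (position, frequency) pair.
    freqs = np.outer(np.arange(seq_len), inv_freq)   # (seq_len, head_dim / 2)
    emb = np.concatenate([freqs, freqs], axis=-1)    # (seq_len, head_dim)
    return np.cos(emb), np.sin(emb)

# The (cos, sin) tuple the warning calls "position_embeddings".
cos, sin = rope_cos_sin(seq_len=1024, head_dim=128)
print(cos.shape, sin.shape)  # (1024, 128) (1024, 128)
```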

We are working on Llama 3.1 support, please be patient, thanks~

LLM-TPU/models/Llama3_1/compile/export_onnx.py does not exist (though according to the documentation it should).

Upgraded transformers to version 4.44.0 via pip install --upgrade transformers.

Copying the script from Llama3 and running it produces this error:
The attention layers in this model are transitioning from computing the RoPE embeddings internally through position_ids (2D tensor with the indexes of the tokens), to using externally computed position_embeddings (Tuple of tensors, containing cos and sin). In v4.45 position_ids will be removed and position_embeddings will be mandatory.
AttributeError: 'tuple' object has no attribute 'update'
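The `'tuple' object has no attribute 'update'` error is consistent with the KV-cache API change in recent transformers releases, where `past_key_values` moved from a plain tuple to a `Cache` object. One pragmatic workaround is to gate the export script on the installed transformers version. A hedged sketch (the 4.43.0 cutoff is an assumption, not confirmed by the repo):

```python
def parse_version(v):
    # "4.44.0" -> (4, 44, 0); ignores any suffix beyond three components.
    return tuple(int(p) for p in v.split(".")[:3])

def needs_cache_object(transformers_version):
    # Assumption: the tuple-vs-Cache past_key_values change landed around v4.43.
    return parse_version(transformers_version) >= (4, 43, 0)

print(needs_cache_object("4.44.0"))  # True: expect Cache objects, not tuples
```

In practice the simpler fix is often to pin transformers to the exact version the export script was written against rather than upgrading past it.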

In the interim, can you make the bmodel available?
python3 -m dfss --url=open@sophgo.com:/ext_model_information/LLM/LLM-TPU/llama3.1-8b_int4_1dev_seq512.bmodel

Currently that command reports file not found.

python3 -m dfss --url=open@sophgo.com:/ext_model_information/LLM/LLM-TPU/llama3.1-8b_int8_1dev_seq512.bmodel

python3 -m dfss --url=open@sophgo.com:/ext_model_information/LLM/LLM-TPU/llama3.1-8b_int8_1dev_seq1024.bmodel

python3 -m dfss --url=open@sophgo.com:/ext_model_information/LLM/LLM-TPU/llama3.1-8b_int8_1dev_seq2048.bmodel

python3 -m dfss --url=open@sophgo.com:/ext_model_information/LLM/LLM-TPU/llama3.1-8b_int8_1dev_seq4096.bmodel

Are any of the above available?
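For reference, the candidate URLs above can be enumerated in one place so each can be tried with `python3 -m dfss --url=...`. This just reconstructs the names requested in this thread; whether the files exist on the server is exactly what is being asked:

```python
# Base path and variant list taken verbatim from the commands in this thread.
BASE = "open@sophgo.com:/ext_model_information/LLM/LLM-TPU"
variants = [("int4", 512), ("int8", 512), ("int8", 1024), ("int8", 2048), ("int8", 4096)]
urls = [f"{BASE}/llama3.1-8b_{quant}_1dev_seq{seq}.bmodel" for quant, seq in variants]
for u in urls:
    print(u)
```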