mistralai/mistral-inference

[BUG] Could not find consolidated.00.pth or consolidated.safetensors in Mistral model path, but mistralai/Mistral-Large-Instruct-2407 surely does not contain them

ShadowTeamCN opened this issue · 9 comments

Python -VV

Python 3.10.12 (main, Mar 22 2024, 16:50:05) [GCC 11.4.0]

Pip Freeze

accelerate==0.32.1
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==2.4.1
async-lru==2.0.4
async-timeout==4.0.3
attrs==23.2.0
Babel==2.15.0
beautifulsoup4==4.12.3
bitsandbytes==0.43.1
bleach==6.1.0
blinker==1.8.2
certifi==2024.7.4
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==3.0.0
cmake==3.30.0
comm==0.2.2
cos-python-sdk-v5==1.9.30
coscmd==1.8.6.31
crcmod==1.7
datasets==2.20.0
DateTime==5.5
dbus-python==1.2.18
debugpy==1.8.2
decorator==5.1.1
deepspeed==0.14.4
defusedxml==0.7.1
dill==0.3.8
diskcache==5.6.3
distro==1.7.0
dnspython==2.6.1
docstring_parser==0.16
einops==0.8.0
email_validator==2.2.0
et-xmlfile==1.1.0
exceptiongroup==1.2.2
executing==2.0.1
fastapi==0.111.1
fastapi-cli==0.0.4
fastjsonschema==2.20.0
filelock==3.15.4
fire==0.6.0
flash-attn==2.6.1
Flask==3.0.3
fqdn==1.5.1
frozenlist==1.4.1
fsspec==2024.5.0
h11==0.14.0
hjson==3.1.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.24.0
idna==3.7
interegular==0.3.3
ipykernel==6.29.5
ipython==8.26.0
ipywidgets==8.1.3
isoduration==20.11.0
itsdangerous==2.2.0
jedi==0.19.1
jieba==0.42.1
Jinja2==3.1.4
json5==0.9.25
jsonlines==4.0.0
jsonpointer==3.0.0
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter==1.0.0
jupyter-console==6.6.3
jupyter-events==0.10.0
jupyter-lsp==2.2.5
jupyter_client==8.6.2
jupyter_core==5.7.2
jupyter_server==2.14.2
jupyter_server_terminals==0.5.3
jupyterlab==4.2.3
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.3
jupyterlab_widgets==3.0.11
jupytext==1.16.3
lark==1.1.9
llvmlite==0.43.0
lm-format-enforcer==0.10.3
loguru==0.7.2
lxml==5.2.2
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib-inline==0.1.7
mdit-py-plugins==0.4.1
mdurl==0.1.2
mistral_common==1.3.3
mistral_inference==1.3.1
mistune==3.0.2
mpmath==1.3.0
msgpack==1.0.8
multidict==6.0.5
multiprocess==0.70.16
nbclient==0.10.0
nbconvert==7.16.4
nbformat==5.10.4
nest-asyncio==1.6.0
networkx==3.3
ninja==1.11.1.1
notebook==7.2.1
notebook_shim==0.2.4
numba==0.60.0
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==12.555.43
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.5.82
nvidia-nvtx-cu12==12.1.105
openai==1.35.15
opencc-python-reimplemented==0.1.7
openpyxl==3.1.5
outlines==0.0.46
overrides==7.7.0
packaging==24.1
pandarallel==1.6.5
pandas==2.2.2
pandocfilters==1.5.1
parso==0.8.4
peft==0.11.1
pexpect==4.9.0
pillow==10.4.0
platformdirs==4.2.2
prettytable==3.10.2
prometheus-fastapi-instrumentator==7.0.0
prometheus_client==0.20.0
prompt_toolkit==3.0.47
protobuf==5.27.2
psutil==6.0.0
ptyprocess==0.7.0
pure-eval==0.2.2
py-cpuinfo==9.0.0
pyairports==2.1.1
pyarrow==17.0.0
pyarrow-hotfix==0.6
pycountry==24.6.1
pycparser==2.22
pycryptodome==3.20.0
pydantic==2.6.1
pydantic_core==2.16.2
Pygments==2.18.0
PyGObject==3.42.1
pypinyin==0.51.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-json-logger==2.0.7
python-multipart==0.0.9
pytz==2024.1
PyYAML==6.0.1
pyzmq==26.0.3
qtconsole==5.5.2
QtPy==2.4.1
ray==2.32.0
referencing==0.35.1
regex==2024.5.15
requests==2.32.3
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.7.1
rpds-py==0.19.0
safetensors==0.4.3
Send2Trash==1.8.3
sentencepiece==0.2.0
shellingham==1.5.4
simple_parsing==0.1.5
six==1.16.0
sniffio==1.3.1
soupsieve==2.5
ssh-import-id==5.11
stack-data==0.6.3
starlette==0.37.2
sympy==1.13.0
tencentcloud-sdk-python==3.0.955
termcolor==2.4.0
terminado==0.18.1
tikit==1.7.9.240628
tiktoken==0.7.0
tinycss2==1.3.0
tokenizers==0.19.1
tomli==2.0.1
torch==2.3.1
torchvision==0.18.1
tornado==6.4.1
tqdm==4.66.4
traitlets==5.14.3
transformers==4.42.4
triton==2.3.1
typer==0.12.3
types-python-dateutil==2.9.0.20240316
typing_extensions==4.12.2
tzdata==2024.1
uri-template==1.3.0
urllib3==2.2.2
uvicorn==0.30.1
uvloop==0.19.0
vllm==0.5.2
vllm-flash-attn==2.5.9.post1
watchfiles==0.22.0
wcwidth==0.2.13
webcolors==24.6.0
webencodings==0.5.1
websocket-client==1.8.0
websockets==12.0
Werkzeug==3.0.3
widgetsnbextension==4.0.11
xformers==0.0.27
XlsxWriter==3.2.0
xmltodict==0.13.0
xxhash==3.4.1
yarl==1.9.4
zope.interface==6.4.post2

Reproduction Steps

from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

mistral_models_path='/path/to/Mistral-Large-Instruct-2407/'
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
model = Transformer.from_folder(mistral_models_path)

Expected Behavior

The model loads successfully.

Additional Context


AssertionError Traceback (most recent call last)
Cell In[3], line 10
8 mistral_models_path='/home/tione/notebook/PretrainModelStore/Mistral-Large-Instruct-2407/'
9 tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
---> 10 model = Transformer.from_folder(mistral_models_path)

File /usr/local/lib/python3.10/dist-packages/mistral_inference/transformer.py:353, in Transformer.from_folder(folder, max_batch_size, num_pipeline_ranks, device, dtype)
350 pt_model_file = Path(folder) / "consolidated.00.pth"
351 safetensors_model_file = Path(folder) / "consolidated.safetensors"
--> 353 assert (
354 pt_model_file.exists() or safetensors_model_file.exists()
355 ), f"Make sure either {pt_model_file} or {safetensors_model_file} exists"
356 assert not (
357 pt_model_file.exists() and safetensors_model_file.exists()
358 ), f"Both {pt_model_file} and {safetensors_model_file} cannot exist"
360 if pt_model_file.exists():

AssertionError: Make sure either /home/tione/notebook/PretrainModelStore/Mistral-Large-Instruct-2407/consolidated.00.pth or /home/tione/notebook/PretrainModelStore/Mistral-Large-Instruct-2407/consolidated.safetensors exists

Suggested Solutions

I think the model file check looks for the wrong file names:

        pt_model_file = Path(folder) / "consolidated.00.pth"
        safetensors_model_file = Path(folder) / "consolidated.safetensors"

        assert (
            pt_model_file.exists() or safetensors_model_file.exists()
        ), f"Make sure either {pt_model_file} or {safetensors_model_file} exists"
        assert not (
            pt_model_file.exists() and safetensors_model_file.exists()
        ), f"Both {pt_model_file} and {safetensors_model_file} cannot exist"

I have encountered the same problem.

You can use vllm directly for inference; I find it compatible with Mistral-Large-2.

Can you tell me which version you have installed?

I have encountered the same problem.

Thanks. Can you give detailed instructions on how you use vllm?

use api?

@liuanping @shangh1 @endNone
All my package versions are listed above; as for vllm, it is vllm==0.5.2. The inference code is quite simple. I'm using 4*H100 for Mistral-Large-2:

from vllm import LLM, SamplingParams

path = "/path/to/Mistral-Large-Instruct-2407"  # local model directory
# Shard the model across 4 GPUs (4*H100)
llm = LLM(path, tensor_parallel_size=4, max_seq_len_to_capture=8192 * 2, gpu_memory_utilization=0.95)
tokenizer = llm.get_tokenizer()

prompt = "Your prompt here"
messages = [{"role": "user", "content": prompt}]
# Render the chat template to a plain prompt string (tokenize=False)
prompt_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
sampling_params = SamplingParams(temperature=0.4, max_tokens=8192, stop=[tokenizer.eos_token])
output = llm.generate(prompt_text, sampling_params=sampling_params, use_tqdm=False)
print(output[0].outputs[0].text)
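
If you would rather call it over an API (re the question above), vLLM also ships an OpenAI-compatible server; a rough sketch with placeholder paths, assuming vllm==0.5.2 and the openai package already in the environment:

# Launch the server first (run in a shell; the model path is a placeholder):
#   python -m vllm.entrypoints.openai.api_server \
#       --model /path/to/Mistral-Large-Instruct-2407 --tensor-parallel-size 4
from openai import OpenAI

# The local vLLM server does not validate the key, so any string works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="/path/to/Mistral-Large-Instruct-2407",  # must match --model above
    messages=[{"role": "user", "content": "Your prompt here"}],
    temperature=0.4,
    max_tokens=8192,
)
print(response.choices[0].message.content)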


Thank you! Nice work!

Actually, I'm keen on trying out the official mistral_inference for testing purposes. Could you please tell me when the official team plans to fix this bug?