bug: TensorRT - Switching between models causes error "satisfyProfile Runtime dimension does not satisfy any optimization profile"
Van-QA opened this issue · 2 comments
Van-QA commented
Describe the bug
Switching between TensorRT models causes the error "satisfyProfile Runtime dimension does not satisfy any optimization profile".
Steps to reproduce
- Attempt to switch back and forth between multiple models.
- Observe that the transition is not smooth and the application becomes unresponsive.
- Repeat the process.
- Something is amiss in the UI, and app.log records an error.
- Users are forced to close and reopen the application to resolve the issue.
Expected behavior
Switching between TensorRT models in Jan should be a seamless transition, without the application freezing or becoming unresponsive. Users should be able to switch between models smoothly, without disruptions or glitches.
Environment details
- Operating System: Windows 11
- Jan Version: Jan v0.4.8-325
[1710424192] [D:\a\nitro\nitro\controllers\llamaCPP.h: 1585][llama_server_context::update_slots] slot 0 released (1638 tokens in cache)
2024-03-14T13:49:52.993Z [NITRO]::Debug: 20240314 13:49:52.991000 UTC 26240 INFO Wait for task to be released:6 - llamaCPP.cc:405
20240314 13:49:52.991000 UTC 43012 DEBUG [makeHeaderString] send stream with transfer-encoding chunked - HttpResponseImpl.cc:535
[1710424192] [D:\a\nitro\nitro\controllers\llamaCPP.h: 882][llama_server_context::launch_slot_with_data] slot 0 is processing [task id: 6]
2024-03-14T13:49:53.000Z [NITRO]::Debug: [1710424192] [D:\a\nitro\nitro\controllers\llamaCPP.h: 1722][llama_server_context::update_slots] slot 0 : kv cache rm - [0, end)
2024-03-14T13:50:10.996Z [NITRO]::Debug: [1710424210] [D:\a\nitro\nitro\controllers\llamaCPP.h: 475][llama_client_slot::print_timings]
[1710424210] [D:\a\nitro\nitro\controllers\llamaCPP.h: 480][llama_client_slot::print_timings] print_timings: prompt eval time = 14552.11 ms / 1682 tokens ( 8.65 ms per token, 115.58 tokens per second)
[1710424210] [D:\a\nitro\nitro\controllers\llamaCPP.h: 485][llama_client_slot::print_timings] print_timings: eval time = 3452.00 ms / 94 runs ( 36.72 ms per token, 27.23 tokens per second)
[1710424210] [D:\a\nitro\nitro\controllers\llamaCPP.h: 487][llama_client_slot::print_timings] print_timings: total time = 18004.10 ms
[1710424210] [D:\a\nitro\nitro\controllers\llamaCPP.h: 1585][llama_server_context::update_slots] slot 0 released (1777 tokens in cache)
2024-03-14T13:50:23.932Z [NITRO]::Debug: Request to kill Nitro
2024-03-14T13:50:23.935Z [NITRO]::Debug: 20240314 13:50:10.993000 UTC 43012 INFO reached result stop - llamaCPP.cc:365
20240314 13:50:10.993000 UTC 43012 INFO End of result - llamaCPP.cc:338
20240314 13:50:11.068000 UTC 26240 INFO Task completed, release it - llamaCPP.cc:408
20240314 13:50:23.934000 UTC 2088 INFO Program is exitting, goodbye! - processManager.cc:8
20240314 13:50:23.934000 UTC 2088 INFO changed to false - llamaCPP.cc:680
[1710424223] [D:\a\nitro\nitro\controllers\llamaCPP.h: 1585][llama_server_context::update_slots] slot 0 released (1777 tokens in cache)
2024-03-14T13:50:24.953Z [TENSORRT_LLM_NITRO]::Debug:Request to kill engine
2024-03-14T13:50:27.489Z [NITRO]::Debug: Nitro process is terminated
2024-03-14T13:50:27.490Z [TENSORRT_LLM_NITRO]::Debug:Engine process is terminated
2024-03-14T13:50:27.490Z [TENSORRT_LLM_NITRO]::Debug:Spawning engine subprocess...
2024-03-14T13:50:27.490Z [TENSORRT_LLM_NITRO]::Debug:Spawn nitro at path: C:\Users\dan\jan\extensions\@janhq\tensorrt-llm-extension\dist\bin\nitro.exe, and args: 1,127.0.0.1,3928
2024-03-14T13:50:27.495Z [NITRO]::Debug: Nitro exited with code: 3221226505
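For triage, the Nitro exit code above can be decoded: 3221226505 is 0xC0000409, the Windows NTSTATUS value STATUS_STACK_BUFFER_OVERRUN (also raised by `__fastfail`), which suggests the llama.cpp Nitro process was terminated abnormally rather than shutting down cleanly. A quick check:

```python
# Decode the Windows exit code reported by Nitro in the log above.
code = 3221226505
print(hex(code))  # 0xc0000409 -> NTSTATUS STATUS_STACK_BUFFER_OVERRUN / __fastfail
```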
2024-03-14T13:50:27.682Z [TENSORRT_LLM_NITRO]::Debug: [ANSI-colored ASCII art startup banner omitted]
2024-03-14T13:50:27.808Z [TENSORRT_LLM_NITRO]::Debug:Engine is ready
2024-03-14T13:50:27.809Z [TENSORRT_LLM_NITRO]::Debug:Loading model with params {"engine_path":"C:\\Users\\dan\\jan\\models\\llamacorn-1.1b-chat-fp16","ctx_len":2048}
2024-03-14T13:50:28.731Z [TENSORRT_LLM_NITRO]::Debug: [ANSI-colored ASCII art startup banner omitted]
20240314 13:50:27.681000 UTC 10304 INFO Nitro version: undefined - main.cc:57
20240314 13:50:27.681000 UTC 10304 INFO Server started, listening at: 127.0.0.1:3928 - main.cc:59
20240314 13:50:27.681000 UTC 10304 INFO Please load your model - main.cc:60
20240314 13:50:27.681000 UTC 10304 INFO Number of thread is:1 - main.cc:68
[TensorRT-LLM][INFO] Set logger level by INFO
2024-03-14T13:50:28.746Z [TENSORRT_LLM_NITRO]::Debug:20240314 13:50:28.743000 UTC 12044 INFO Successully loaded the tokenizer - tensorrtllm.h:53
20240314 13:50:28.743000 UTC 12044 INFO Loaded tokenizer - tensorrtllm.cc:354
[TensorRT-LLM][INFO] Engine version 0.8.0 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be array, but is null
[TensorRT-LLM][WARNING] Optional value for parameter lora_target_modules will not be set.
[TensorRT-LLM][WARNING] Parameter max_draft_len cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set.
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
2024-03-14T13:50:28.753Z [TENSORRT_LLM_NITRO]::Debug:[TensorRT-LLM][INFO] MPI size: 1, rank: 0
2024-03-14T13:50:31.055Z [TENSORRT_LLM_NITRO]::Debug:20240314 13:50:28.753000 UTC 12044 INFO Engine Path : C:\Users\dan\jan\models\llamacorn-1.1b-chat-fp16\rank0.engine - tensorrtllm.cc:361
[TensorRT-LLM][INFO] Loaded engine size: 2100 MiB
2024-03-14T13:50:31.063Z [TENSORRT_LLM_NITRO]::Debug:[TensorRT-LLM][WARNING] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
2024-03-14T13:50:31.537Z [TENSORRT_LLM_NITRO]::Debug:[TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 20760, GPU 3251 (MiB)
2024-03-14T13:50:31.550Z [TENSORRT_LLM_NITRO]::Debug:[TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +6, GPU +8, now: CPU 20766, GPU 3259 (MiB)
2024-03-14T13:50:31.567Z [TENSORRT_LLM_NITRO]::Debug:[TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +2098, now: CPU 0, GPU 2098 (MiB)
2024-03-14T13:50:31.574Z [TENSORRT_LLM_NITRO]::Debug:[TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 20813, GPU 3353 (MiB)
2024-03-14T13:50:31.576Z [TENSORRT_LLM_NITRO]::Debug:[TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +6, GPU +8, now: CPU 20819, GPU 3361 (MiB)
2024-03-14T13:50:31.688Z [TENSORRT_LLM_NITRO]::Debug:[TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 2098 (MiB)
2024-03-14T13:50:31.694Z [TENSORRT_LLM_NITRO]::Debug:[TensorRT-LLM][INFO] Allocate 255328256 bytes for k/v cache.
[TensorRT-LLM][INFO] Using 201984 tokens in paged KV cache.
2024-03-14T13:50:31.822Z [TENSORRT_LLM_NITRO]::Debug:Load model success with response {}
2024-03-14T13:50:31.916Z [TENSORRT_LLM_NITRO]::Debug:20240314 13:50:31.856000 UTC 12044 DEBUG [makeHeaderString] send stream with transfer-encoding chunked - HttpResponseImpl.cc:535
[TensorRT-LLM][ERROR] 3: [executionContext.cpp::nvinfer1::rt::ExecutionContext::setInputShape::2309] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::nvinfer1::rt::ExecutionContext::setInputShape::2309, condition: satisfyProfile Runtime dimension does not satisfy any optimization profile.)
job aborted:
[ranks] message
[0] application aborted
aborting MPI_COMM_WORLD (comm=0x44000000), error 1, comm rank 0
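For anyone triaging: the `setInputShape` failure means the runtime input shape falls outside every optimization profile the engine was built with. A TensorRT engine is compiled with fixed [min, opt, max] bounds per dynamic dimension, so if a request after the model switch has dimensions (e.g. prompt length) beyond the new engine's max, this exact error is raised. The sketch below illustrates the check with made-up bounds; the profile values and the 1682-token prompt are illustrative (the token count comes from the timing log above, the bounds are hypothetical, not read from the real engine config):

```python
# Hypothetical sketch of TensorRT's profile check: a runtime shape must fit
# the [min, max] bounds of at least one optimization profile, per dimension.

def satisfies_profile(shape, profile):
    """Check each runtime dimension against the profile's (min, opt, max) bounds."""
    return all(lo <= dim <= hi for dim, (lo, _opt, hi) in zip(shape, profile))

# Suppose the engine was built for batch 1 and at most 1024 input tokens:
profiles = [
    [(1, 1, 1), (1, 512, 1024)],  # (min, opt, max) per dimension
]

runtime_shape = (1, 1682)  # e.g. the 1682-token prompt from the log above
ok = any(satisfies_profile(runtime_shape, p) for p in profiles)
print(ok)  # False: 1682 > 1024, so no profile is satisfied -> satisfyProfile error
```

If this matches what is happening, the fix would be either rebuilding the engine with wider profile bounds or clamping/validating request dimensions before dispatching to the newly loaded engine.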
Originally posted by @dan-jan in janhq/jan#2358 (comment)
github-actions commented
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.