microsoft/Olive

Whisper with DirectML: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running WhisperBeamSearch node

WA225 opened this issue

Describe the bug
Execution fails when I try to run Whisper on an AMD Radeon 780M Graphics iGPU using the DirectML EP. The run aborts with the following error:
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running WhisperBeamSearch node. Name:'BeamSearch_node' Status Message: Non-zero status code returned while running Conv node. Name:'/whisper_encoder/encoder/conv1/Conv' Status Message: C:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2557)\onnxruntime_pybind11_state.pyd!00007FFC4E4A2689: (caller: 00007FFC4EBF5261) Exception(3) tid(1305c) 80070057 The parameter is incorrect.

To Reproduce
I am running the following commands in this order:
olive run --config whisper_dml_fp32.json --setup
python -m pip install "onnxruntime-extensions>=0.9.0"
olive run --config whisper_dml_fp32.json --tempdir . 2> log.txt
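For reference, the failure also reproduces outside Olive with a minimal standalone session over the final model. This is a sketch under assumptions: the model path and audio file are placeholders, and the input names follow the Olive whisper example (audio_stream plus the beam-search parameters), so adjust them to the actual workflow output.

```python
# Minimal repro sketch outside Olive. Paths and input names are
# assumptions based on the Olive whisper example, not verified here.
import numpy as np
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

so = ort.SessionOptions()
so.register_custom_ops_library(get_library_path())  # audio decoder custom ops

sess = ort.InferenceSession(
    "models/whisper_dml_fp32.onnx",  # hypothetical output path
    so,
    providers=["DmlExecutionProvider"],
)

with open("data/audio.mp3", "rb") as f:  # any short test clip
    audio = np.frombuffer(f.read(), dtype=np.uint8)

inputs = {
    "audio_stream": np.expand_dims(audio, axis=0),
    "max_length": np.array([200], dtype=np.int32),
    "min_length": np.array([0], dtype=np.int32),
    "num_beams": np.array([2], dtype=np.int32),
    "num_return_sequences": np.array([1], dtype=np.int32),
    "length_penalty": np.array([1.0], dtype=np.float32),
    "repetition_penalty": np.array([1.0], dtype=np.float32),
}
# On the DML EP this raises the RUNTIME_EXCEPTION shown above.
print(sess.run(None, inputs)[0])
```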

Olive config
{
    "input_model": {
        "type": "PyTorchModel",
        "config": {
            "model_script": "code/user_script.py",
            "script_dir": "code",
            "hf_config": {
                "model_class": "WhisperForConditionalGeneration",
                "model_name": "openai/whisper-tiny.en",
                "components": [
                    {
                        "name": "encoder_decoder_init",
                        "io_config": "get_encdec_io_config",
                        "component_func": "get_encoder_decoder_init",
                        "dummy_inputs_func": "encoder_decoder_init_dummy_inputs"
                    },
                    {
                        "name": "decoder",
                        "io_config": "get_dec_io_config",
                        "component_func": "get_decoder",
                        "dummy_inputs_func": "decoder_dummy_inputs"
                    }
                ],
                "from_pretrained_args": {
                    "attn_implementation": "eager"
                }
            }
        }
    },
    "systems": {
        "local_system": {
            "type": "LocalSystem",
            "config": {
                "accelerators": [
                    {
                        "device": "gpu",
                        "execution_providers": [
                            "DmlExecutionProvider"
                        ]
                    }
                ]
            }
        }
    },
    "evaluators": {
        "common_evaluator": {
            "metrics": [
                {
                    "name": "latency",
                    "type": "latency",
                    "sub_types": [
                        {
                            "name": "avg"
                        }
                    ],
                    "user_config": {
                        "user_script": "code/user_script.py",
                        "script_dir": "code",
                        "data_dir": "data",
                        "dataloader_func": "whisper_dataloader",
                        "func_kwargs": {
                            "dataloader_func": {
                                "model_name": "openai/whisper-tiny.en",
                                "use_audio_decoder": true
                            }
                        },
                        "batch_size": 1
                    }
                }
            ]
        }
    },
    "passes": {
        "conversion": {
            "type": "OnnxConversion",
            "config": {
                "target_opset": 17,
                "save_as_external_data": true,
                "all_tensors_to_one_file": true
            }
        },
        "transformers_optimization": {
            "type": "OrtTransformersOptimization",
            "config": {
                "save_as_external_data": true,
                "all_tensors_to_one_file": true,
                "opt_level": 0,
                "optimization_options": {
                    "enable_gelu": true,
                    "enable_layer_norm": true,
                    "enable_attention": true,
                    "use_multi_head_attention": true,
                    "enable_skip_layer_norm": false,
                    "enable_embed_layer_norm": false,
                    "enable_bias_skip_layer_norm": false,
                    "enable_bias_gelu": false,
                    "enable_gelu_approximation": false,
                    "enable_qordered_matmul": false,
                    "enable_shape_inference": true,
                    "enable_gemm_fast_gelu": false,
                    "enable_nhwc_conv": false,
                    "enable_group_norm": false,
                    "enable_bias_splitgelu": false,
                    "enable_packed_qkv": true,
                    "enable_packed_kv": true,
                    "enable_bias_add": false,
                    "enable_rotary_embeddings": true
                },
                "use_gpu": true
            }
        },
        "insert_beam_search": {
            "type": "InsertBeamSearch",
            "config": {
                "use_forced_decoder_ids": false,
                "use_logits_processor": false,
                "use_gpu": true
            }
        },
        "prepost": {
            "type": "AppendPrePostProcessingOps",
            "config": {
                "tool_command": "whisper",
                "tool_command_args": {
                    "model_name": "openai/whisper-tiny.en",
                    "use_audio_decoder": true
                },
                "target_opset": 17
            }
        }
    },
    "engine": {
        "search_strategy": {
            "execution_order": "joint",
            "search_algorithm": "exhaustive"
        },
        "ort_log_severity_level": 0,
        "log_severity_level": 0,
        "host": "local_system",
        "target": "local_system",
        "evaluator": "common_evaluator",
        "evaluate_input_model": false,
        "clean_cache": false,
        "cache_dir": "cache",
        "output_dir": "models",
        "output_name": "whisper_dml_fp32"
    }
}

Olive logs
[2024-07-02 10:46:11,124] [INFO] [run.py:138:run_engine] Running workflow default_workflow
[2024-07-02 10:46:11,132] [INFO] [engine.py:986:save_olive_config] Saved Olive config to cache\default_workflow\olive_config.json
[2024-07-02 10:46:11,132] [DEBUG] [run.py:179:run_engine] Registering pass OnnxConversion
[2024-07-02 10:46:11,136] [DEBUG] [run.py:179:run_engine] Registering pass OrtTransformersOptimization
[2024-07-02 10:46:11,137] [DEBUG] [run.py:179:run_engine] Registering pass InsertBeamSearch
[2024-07-02 10:46:11,138] [DEBUG] [run.py:179:run_engine] Registering pass AppendPrePostProcessingOps
[2024-07-02 10:46:11,146] [DEBUG] [accelerator_creator.py:130:_fill_accelerators] The accelerator device and execution providers are specified, skipping deduce.
[2024-07-02 10:46:11,146] [DEBUG] [accelerator_creator.py:169:_check_execution_providers] Supported execution providers for device gpu: ['DmlExecutionProvider', 'CPUExecutionProvider']
[2024-07-02 10:46:11,147] [DEBUG] [accelerator_creator.py:199:create_accelerators] Initial accelerators and execution providers: {'gpu': ['DmlExecutionProvider']}
[2024-07-02 10:46:11,147] [INFO] [accelerator_creator.py:224:create_accelerators] Running workflow on accelerator specs: gpu-dml
[2024-07-02 10:46:11,147] [DEBUG] [run.py:235:run_engine] Pass OnnxConversion already registered
[2024-07-02 10:46:11,147] [DEBUG] [run.py:235:run_engine] Pass OrtTransformersOptimization already registered
[2024-07-02 10:46:11,147] [DEBUG] [run.py:235:run_engine] Pass InsertBeamSearch already registered
[2024-07-02 10:46:11,148] [DEBUG] [run.py:235:run_engine] Pass AppendPrePostProcessingOps already registered
[2024-07-02 10:46:11,148] [INFO] [engine.py:109:initialize] Using cache directory: cache\default_workflow
[2024-07-02 10:46:11,161] [INFO] [engine.py:265:run] Running Olive on accelerator: gpu-dml
[2024-07-02 10:46:11,161] [INFO] [engine.py:1085:_create_system] Creating target system ...
[2024-07-02 10:46:11,161] [DEBUG] [engine.py:1081:create_system] create native OliveSystem SystemType.Local
[2024-07-02 10:46:11,162] [INFO] [engine.py:1088:_create_system] Target system created in 0.001005 seconds
[2024-07-02 10:46:11,162] [INFO] [engine.py:1097:_create_system] Creating host system ...
[2024-07-02 10:46:11,163] [DEBUG] [engine.py:1081:create_system] create native OliveSystem SystemType.Local
[2024-07-02 10:46:11,163] [INFO] [engine.py:1100:_create_system] Host system created in 0.000999 seconds
[2024-07-02 10:46:11,202] [DEBUG] [engine.py:711:_cache_model] Cached model df880b77 to cache\default_workflow\models\df880b77.json
[2024-07-02 10:46:11,203] [DEBUG] [engine.py:348:run_accelerator] Running Olive in search mode ...
[2024-07-02 10:46:11,203] [DEBUG] [engine.py:623:resolve_goals] Resolving goals: {'latency': {'avg': None}}
[2024-07-02 10:46:11,203] [DEBUG] [engine.py:642:resolve_goals] No baseline got as no goal is provided the the goal is threshold
[2024-07-02 10:46:11,204] [DEBUG] [engine.py:531:run_search] Step 1 with search point {'conversion': {}, 'transformers_optimization': {'only_onnxruntime': True}, 'insert_beam_search': {}, 'prepost': {}} ...
[2024-07-02 10:46:11,204] [INFO] [engine.py:867:_run_pass] Running pass conversion:OnnxConversion
[2024-07-02 10:46:11,207] [DEBUG] [resource_path.py:156:create_resource_path] Resource path code/user_script.py is inferred to be of type file.
[2024-07-02 10:46:11,209] [DEBUG] [resource_path.py:156:create_resource_path] Resource path code is inferred to be of type folder.
[2024-07-02 10:46:11,211] [DEBUG] [resource_path.py:156:create_resource_path] Resource path code is inferred to be of type folder.
[2024-07-02 10:46:11,212] [DEBUG] [resource_path.py:156:create_resource_path] Resource path code/user_script.py is inferred to be of type file.
[2024-07-02 10:46:11,449] [DEBUG] [resource_path.py:156:create_resource_path] Resource path C:\Olive-main\examples\whisper\code is inferred to be of type folder.
[2024-07-02 10:46:11,451] [DEBUG] [resource_path.py:156:create_resource_path] Resource path C:\Olive-main\examples\whisper\code\user_script.py is inferred to be of type file.
[2024-07-02 10:46:11,470] [INFO] [hf_config.py:112:load_hf_model] Loading Huggingface model from openai/whisper-tiny.en
[2024-07-02 10:46:12,330] [DEBUG] [resource_path.py:156:create_resource_path] Resource path C:\Olive-main\examples\whisper\code is inferred to be of type folder.
[2024-07-02 10:46:12,332] [DEBUG] [resource_path.py:156:create_resource_path] Resource path C:\Olive-main\examples\whisper\code\user_script.py is inferred to be of type file.
[2024-07-02 10:46:12,460] [DEBUG] [resource_path.py:156:create_resource_path] Resource path C:\Olive-main\examples\whisper\code is inferred to be of type folder.
[2024-07-02 10:46:12,462] [DEBUG] [resource_path.py:156:create_resource_path] Resource path C:\Olive-main\examples\whisper\code\user_script.py is inferred to be of type file.
[2024-07-02 10:46:12,466] [DEBUG] [dummy_inputs.py:45:get_dummy_inputs] Using dummy_inputs_func to get dummy inputs
[2024-07-02 10:46:12,583] [DEBUG] [pytorch.py:277:get_user_io_config] Calling get_encdec_io_config to get io_config
[2024-07-02 10:46:13,161] [DEBUG] [conversion.py:234:_export_pytorch_model] Converting model on device cpu with dtype None.
[2024-07-02 10:46:16,354] [DEBUG] [resource_path.py:156:create_resource_path] Resource path C:\Olive-main\examples\whisper\code is inferred to be of type folder.
[2024-07-02 10:46:16,355] [DEBUG] [resource_path.py:156:create_resource_path] Resource path C:\Olive-main\examples\whisper\code\user_script.py is inferred to be of type file.
[2024-07-02 10:46:16,471] [DEBUG] [resource_path.py:156:create_resource_path] Resource path C:\Olive-main\examples\whisper\code is inferred to be of type folder.
[2024-07-02 10:46:16,473] [DEBUG] [resource_path.py:156:create_resource_path] Resource path C:\Olive-main\examples\whisper\code\user_script.py is inferred to be of type file.
[2024-07-02 10:46:16,476] [DEBUG] [dummy_inputs.py:45:get_dummy_inputs] Using dummy_inputs_func to get dummy inputs
[2024-07-02 10:46:16,680] [DEBUG] [pytorch.py:277:get_user_io_config] Calling get_dec_io_config to get io_config
[2024-07-02 10:46:16,803] [DEBUG] [conversion.py:234:_export_pytorch_model] Converting model on device cpu with dtype None.
[2024-07-02 10:46:18,467] [INFO] [engine.py:954:_run_pass] Pass conversion:OnnxConversion finished in 7.256834 seconds
[2024-07-02 10:46:18,485] [DEBUG] [engine.py:711:_cache_model] Cached model 0_OnnxConversion-df880b77-673bf9e9 to cache\default_workflow\models\0_OnnxConversion-df880b77-673bf9e9.json
[2024-07-02 10:46:18,485] [DEBUG] [engine.py:794:_cache_run] Cached run for df880b77->0_OnnxConversion-df880b77-673bf9e9 into cache\default_workflow\runs\OnnxConversion-df880b77-673bf9e9.json
[2024-07-02 10:46:18,485] [INFO] [engine.py:867:_run_pass] Running pass transformers_optimization:OrtTransformersOptimization
[2024-07-02 10:46:18,493] [INFO] [transformer_optimization.py:178:validate_search_point] Please specify a positive value for opt_level when only_onnxruntime is True
[2024-07-02 10:46:18,493] [WARNING] [engine.py:873:_run_pass] Invalid search point, prune
[2024-07-02 10:46:18,493] [DEBUG] [engine.py:834:_run_passes] Pruned for pass transformers_optimization
[2024-07-02 10:46:18,493] [WARNING] [engine.py:850:_run_passes] Skipping evaluation as model was pruned
[2024-07-02 10:46:18,494] [DEBUG] [engine.py:531:run_search] Step 2 with search point {'conversion': {}, 'transformers_optimization': {'only_onnxruntime': False}, 'insert_beam_search': {}, 'prepost': {}} ...
[2024-07-02 10:46:18,494] [INFO] [engine.py:867:_run_pass] Running pass conversion:OnnxConversion
[2024-07-02 10:46:18,495] [DEBUG] [engine.py:886:_run_pass] Loading model from cache ...
[2024-07-02 10:46:18,497] [INFO] [engine.py:901:_run_pass] Loaded model from cache: 0_OnnxConversion-df880b77-673bf9e9 from cache\default_workflow\runs
[2024-07-02 10:46:18,499] [INFO] [engine.py:867:_run_pass] Running pass transformers_optimization:OrtTransformersOptimization
[2024-07-02 10:46:18,501] [DEBUG] [resource_path.py:156:create_resource_path] Resource path C:\Olive-main\examples\whisper\cache\default_workflow\models\0_OnnxConversion-df880b77-673bf9e9\output_model\encoder_decoder_init is inferred to be of type folder.
[2024-07-02 10:46:18,504] [DEBUG] [resource_path.py:156:create_resource_path] Resource path C:\Olive-main\examples\whisper\cache\default_workflow\models\0_OnnxConversion-df880b77-673bf9e9\output_model\decoder is inferred to be of type folder.
[2024-07-02 10:46:18,561] [DEBUG] [transformer_optimization.py:253:_run_for_config] model_type is set to bart from model attributes
[2024-07-02 10:46:18,561] [DEBUG] [transformer_optimization.py:259:_run_for_config] num_heads is set to 6 from model attributes
[2024-07-02 10:46:18,561] [DEBUG] [transformer_optimization.py:265:_run_for_config] hidden_size is set to 384 from model attributes
[2024-07-02 10:46:20,740] [DEBUG] [transformer_optimization.py:253:_run_for_config] model_type is set to bart from model attributes
[2024-07-02 10:46:20,740] [DEBUG] [transformer_optimization.py:259:_run_for_config] num_heads is set to 6 from model attributes
[2024-07-02 10:46:20,740] [DEBUG] [transformer_optimization.py:265:_run_for_config] hidden_size is set to 384 from model attributes
[2024-07-02 10:46:22,043] [INFO] [engine.py:954:_run_pass] Pass transformers_optimization:OrtTransformersOptimization finished in 3.542157 seconds
[2024-07-02 10:46:22,059] [DEBUG] [engine.py:711:_cache_model] Cached model 1_OrtTransformersOptimization-0-223aa855-gpu-dml to cache\default_workflow\models\1_OrtTransformersOptimization-0-223aa855-gpu-dml.json
[2024-07-02 10:46:22,065] [DEBUG] [engine.py:794:_cache_run] Cached run for 0_OnnxConversion-df880b77-673bf9e9->1_OrtTransformersOptimization-0-223aa855-gpu-dml into cache\default_workflow\runs\OrtTransformersOptimization-0-223aa855-gpu-dml.json
[2024-07-02 10:46:22,067] [INFO] [engine.py:867:_run_pass] Running pass insert_beam_search:InsertBeamSearch
[2024-07-02 10:46:22,069] [DEBUG] [resource_path.py:156:create_resource_path] Resource path C:\Olive-main\examples\whisper\cache\default_workflow\models\1_OrtTransformersOptimization-0-223aa855-gpu-dml\output_model\encoder_decoder_init is inferred to be of type folder.
[2024-07-02 10:46:22,073] [DEBUG] [resource_path.py:156:create_resource_path] Resource path C:\Olive-main\examples\whisper\cache\default_workflow\models\1_OrtTransformersOptimization-0-223aa855-gpu-dml\output_model\decoder is inferred to be of type folder.
[2024-07-02 10:46:22,566] [WARNING] [insert_beam_search.py:280:chain_model] DecoderMaskedMultiHeadAttention could not be applied to whisper decoder subgraph
[2024-07-02 10:46:23,084] [DEBUG] [insert_beam_search.py:302:chain_model] Using IR version 8 for chained model
[2024-07-02 10:46:24,877] [INFO] [engine.py:954:_run_pass] Pass insert_beam_search:InsertBeamSearch finished in 2.810152 seconds
[2024-07-02 10:46:24,880] [DEBUG] [engine.py:711:_cache_model] Cached model 2_InsertBeamSearch-1-e941a2d8 to cache\default_workflow\models\2_InsertBeamSearch-1-e941a2d8.json
[2024-07-02 10:46:24,881] [DEBUG] [engine.py:794:_cache_run] Cached run for 1_OrtTransformersOptimization-0-223aa855-gpu-dml->2_InsertBeamSearch-1-e941a2d8 into cache\default_workflow\runs\InsertBeamSearch-1-e941a2d8.json
[2024-07-02 10:46:24,883] [INFO] [engine.py:867:_run_pass] Running pass prepost:AppendPrePostProcessingOps
[2024-07-02 10:46:24,885] [DEBUG] [resource_path.py:156:create_resource_path] Resource path C:\Olive-main\examples\whisper\cache\default_workflow\models\2_InsertBeamSearch-1-e941a2d8\output_model\model_with_beam_search.onnx is inferred to be of type file.
[2024-07-02 10:46:24,886] [DEBUG] [resource_path.py:156:create_resource_path] Resource path C:\Olive-main\examples\whisper\cache\default_workflow\models\2_InsertBeamSearch-1-e941a2d8\output_model\model_with_beam_search.onnx is inferred to be of type file.
[2024-07-02 10:46:26,135] [INFO] [engine.py:954:_run_pass] Pass prepost:AppendPrePostProcessingOps finished in 1.248883 seconds
[2024-07-02 10:46:26,138] [DEBUG] [engine.py:711:_cache_model] Cached model 3_AppendPrePostProcessingOps-2-9e247843 to cache\default_workflow\models\3_AppendPrePostProcessingOps-2-9e247843.json
[2024-07-02 10:46:26,140] [DEBUG] [engine.py:794:_cache_run] Cached run for 2_InsertBeamSearch-1-e941a2d8->3_AppendPrePostProcessingOps-2-9e247843 into cache\default_workflow\runs\AppendPrePostProcessingOps-2-9e247843.json
[2024-07-02 10:46:26,142] [INFO] [engine.py:845:_run_passes] Run model evaluation for the final model...
[2024-07-02 10:46:26,142] [DEBUG] [engine.py:1026:_evaluate_model] Evaluating model ...
[2024-07-02 10:46:26,142] [DEBUG] [resource_path.py:156:create_resource_path] Resource path C:\Olive-main\examples\whisper\cache\default_workflow\models\3_AppendPrePostProcessingOps-2-9e247843\output_model\model_with_beam_search.onnx is inferred to be of type file.
[2024-07-02 10:46:26,144] [DEBUG] [resource_path.py:156:create_resource_path] Resource path C:\Olive-main\examples\whisper\cache\default_workflow\models\3_AppendPrePostProcessingOps-2-9e247843\output_model\model_with_beam_search.onnx is inferred to be of type file.
[2024-07-02 10:46:26,267] [DEBUG] [resource_path.py:156:create_resource_path] Resource path C:\Olive-main\examples\whisper\data is inferred to be of type folder.
[2024-07-02 10:46:27,265] [DEBUG] [ort_inference.py:72:get_ort_inference_session] inference_settings: {'execution_provider': ['DmlExecutionProvider'], 'provider_options': None}
[2024-07-02 10:46:27,265] [DEBUG] [ort_inference.py:111:get_ort_inference_session] Normalized providers: ['DmlExecutionProvider'], provider_options: [{}]
[2024-07-02 10:46:29,195] [WARNING] [engine.py:360:run_accelerator] Failed to run Olive on gpu-dml.
Traceback (most recent call last):
  File "C:\Olive-main\olive\engine\engine.py", line 349, in run_accelerator
    output_footprint = self.run_search(
  File "C:\Olive-main\olive\engine\engine.py", line 534, in run_search
    should_prune, signal, model_ids = self._run_passes(
  File "C:\Olive-main\olive\engine\engine.py", line 846, in _run_passes
    signal = self._evaluate_model(model_config, model_id, data_root, evaluator_config, accelerator_spec)
  File "C:\Olive-main\olive\engine\engine.py", line 1052, in _evaluate_model
    signal = self.target.evaluate_model(model_config, data_root, metrics, accelerator_spec)
  File "C:\Olive-main\olive\systems\local.py", line 47, in evaluate_model
    return evaluator.evaluate(model, data_root, metrics, device=device, execution_providers=execution_providers)
  File "C:\Olive-main\olive\evaluator\olive_evaluator.py", line 205, in evaluate
    metrics_res[metric.name] = self._evaluate_latency(
  File "C:\Olive-main\olive\evaluator\olive_evaluator.py", line 123, in _evaluate_latency
    latencies = self._evaluate_raw_latency(
  File "C:\Olive-main\olive\evaluator\olive_evaluator.py", line 763, in _evaluate_raw_latency
    return self._evaluate_onnx_latency(model, metric, dataloader, post_func, device, execution_providers)
  File "C:\Olive-main\olive\evaluator\olive_evaluator.py", line 544, in _evaluate_onnx_latency
    latencies = session.time_run(
  File "C:\Olive-main\olive\common\ort_inference.py", line 334, in time_run
    self.session.run(None, input_feed)
  File "C:\anaconda3\envs\whisper-test\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running WhisperBeamSearch node. Name:'BeamSearch_node' Status Message: Non-zero status code returned while running Conv node. Name:'/whisper_encoder/encoder/conv1/Conv' Status Message: C:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2557)\onnxruntime_pybind11_state.pyd!00007FFC4E4A2689: (caller: 00007FFC4EBF5261) Exception(3) tid(1305c) 80070057 The parameter is incorrect.

Other information

  • OS: Windows 11
  • Olive version: 0.7.0
  • ONNXRuntime package and version: onnxruntime-directml==1.18.0 (see the check below)
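
A quick way to confirm the environment from the same venv:

```python
# Sanity check: confirm the ORT build and that the DML EP is registered.
import onnxruntime as ort

print(ort.__version__)                # expect 1.18.0
print(ort.get_available_providers())  # should include DmlExecutionProvider
```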

Additional context
The end of the ort log file:
2024-07-02 10:46:29.0995099 [V:onnxruntime:, session_state.cc:126 onnxruntime::SessionState::CreateGraphInfo] SaveMLValueNameIndexMapping
2024-07-02 10:46:29.1003795 [V:onnxruntime:, session_state.cc:172 onnxruntime::SessionState::CreateGraphInfo] Done saving OrtValue mappings.
2024-07-02 10:46:29.1007495 [I:onnxruntime:, allocation_planner.cc:2442 onnxruntime::IGraphPartitioner::CreateGraphPartitioner] Use DeviceBasedPartition as default
2024-07-02 10:46:29.1062344 [I:onnxruntime:, session_state_utils.cc:201 onnxruntime::session_state_utils::SaveInitializedTensors] Saving initialized tensors.
2024-07-02 10:46:29.1529584 [I:onnxruntime:, session_state_utils.cc:345 onnxruntime::session_state_utils::SaveInitializedTensors] Done saving initialized tensors
2024-07-02 10:46:29.1543346 [I:onnxruntime:, inference_session.cc:2033 onnxruntime::InferenceSession::Initialize] Session successfully initialized.
2024-07-02 10:46:29.2071764 [E:onnxruntime:, sequential_executor.cc:516 onnxruntime::ExecuteKernel] Non-zero status code returned while running Conv node. Name:'/whisper_encoder/encoder/conv1/Conv' Status Message: C:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2557)\onnxruntime_pybind11_state.pyd!00007FFC4E4A2689: (caller: 00007FFC4EBF5261) Exception(3) tid(1305c) 80070057 The parameter is incorrect.
2024-07-02 10:46:29.2083647 [E:onnxruntime:, sequential_executor.cc:516 onnxruntime::ExecuteKernel] Non-zero status code returned while running WhisperBeamSearch node. Name:'BeamSearch_node' Status Message: Non-zero status code returned while running Conv node. Name:'/whisper_encoder/encoder/conv1/Conv' Status Message: C:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2557)\onnxruntime_pybind11_state.pyd!00007FFC4E4A2689: (caller: 00007FFC4EBF5261) Exception(3) tid(1305c) 80070057 The parameter is incorrect.

Checked the line reporting the error:
https://github.com/microsoft/onnxruntime/blob/7be1d4aad3f984ebe2c4fb0f7db0b9ca67cc8964/onnxruntime/core/providers/dml/DmlExecutionProvider/src/MLOperatorAuthorImpl.cpp#L2557
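
To narrow this down, the failing Conv can be located inside the WhisperBeamSearch body graph and its declared inputs printed. A sketch, assuming the onnx package and the cached model path from the logs above:

```python
# Walk the chained model, descending into BeamSearch subgraphs, and print
# the Conv node that DML rejects together with its input names.
import onnx

MODEL = r"cache\default_workflow\models\2_InsertBeamSearch-1-e941a2d8\output_model\model_with_beam_search.onnx"

def find_conv(graph, depth=0):
    for node in graph.node:
        if node.op_type == "Conv" and "conv1" in node.name:
            print("  " * depth, node.name, list(node.input))
        for attr in node.attribute:
            if attr.type == onnx.AttributeProto.GRAPH:
                find_conv(attr.g, depth + 1)  # e.g. the encoder subgraph

find_conv(onnx.load(MODEL).graph)
```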

If I remove the "enable_skip_layer_norm": false option or set "use_multi_head_attention" to false, I get the error previously reported in #1213 (comment). If I instead set "use_gpu": false for InsertBeamSearch, the run fails silently: it aborts after printing "[I:onnxruntime:, session_state_utils.cc:345 onnxruntime::session_state_utils::SaveInitializedTensors] Done saving initialized tensors" and never reaches "[I:onnxruntime:, inference_session.cc:2033 onnxruntime::InferenceSession::Initialize] Session successfully initialized." in the ort output log.
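
As a further sanity check, the same chained model could be run on the CPU EP with an otherwise identical session; if that succeeds, the failure is specific to the DML path. A self-contained sketch (same placeholder model path as the repro above):

```python
# CPU-EP counterpart of the repro above: if this runs, the Conv failure
# is DML-specific. The model path is a placeholder.
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

so = ort.SessionOptions()
so.register_custom_ops_library(get_library_path())
sess = ort.InferenceSession(
    "models/whisper_dml_fp32.onnx",
    so,
    providers=["CPUExecutionProvider"],
)
print([(i.name, i.shape) for i in sess.get_inputs()])  # verify expected inputs
```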