microsoft/Olive

whisper transcription is empty

thewh1teagle opened this issue · 2 comments

Describe the bug
I followed the whisper example in this repository for optimizing Whisper into a single ONNX model, but when I run the test transcription script it prints an empty result.

To Reproduce
Commit 7b4cefe

git clone https://github.com/microsoft/Olive
cd Olive
pip install .
cd examples/whisper
pip install -r requirements.txt
python prepare_whisper_configs.py --model_name openai/whisper-tiny.en
python -m olive run --config whisper_cpu_int8.json --setup
python -m olive run --config whisper_cpu_int8.json
python test_transcription.py --config whisper_cpu_int8.json

Expected behavior
The script should print the transcription.

Olive config
Commit 7b4cefe

Olive logs

logs
python3 prepare_whisper_configs.py --model_name openai/whisper-tiny.en
/Users/user/Library/Python/3.9/lib/python/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
whisper git:(main) python3 -m olive run --config whisper_cpu_int8.json --setup
/Users/user/Library/Python/3.9/lib/python/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
[2024-08-09 20:33:57,113] [INFO] [run.py:90:get_required_packages] The following packages are required in the local environment: ['onnxruntime']
[2024-08-09 20:33:57,114] [INFO] [run.py:101:install_packages] installing packages: ['onnxruntime']
[2024-08-09 20:33:57,185] [INFO] [run.py:359:check_local_ort_installation] onnxruntime is already installed.
whisper git:(main) python3 -m olive run --config whisper_cpu_int8.json
/Users/user/Library/Python/3.9/lib/python/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
[2024-08-09 20:34:02,335] [INFO] [run.py:138:run_engine] Running workflow default_workflow
[2024-08-09 20:34:02,353] [INFO] [cache.py:51:__init__] Using cache directory: /Volumes/Internal/Olive/examples/whisper/cache/default_workflow
[2024-08-09 20:34:02,355] [INFO] [engine.py:1020:save_olive_config] Saved Olive config to /Volumes/Internal/Olive/examples/whisper/cache/default_workflow/olive_config.json
[2024-08-09 20:34:02,355] [DEBUG] [run.py:182:run_engine] Registering pass onnxconversion
[2024-08-09 20:34:02,357] [DEBUG] [run.py:182:run_engine] Registering pass orttransformersoptimization
[2024-08-09 20:34:02,357] [DEBUG] [run.py:182:run_engine] Registering pass onnxdynamicquantization
[2024-08-09 20:34:02,358] [DEBUG] [run.py:182:run_engine] Registering pass insertbeamsearch
[2024-08-09 20:34:02,359] [DEBUG] [run.py:182:run_engine] Registering pass appendprepostprocessingops
[2024-08-09 20:34:02,360] [DEBUG] [accelerator_creator.py:130:_fill_accelerators] The accelerator device and execution providers are specified, skipping deduce.
[2024-08-09 20:34:02,360] [DEBUG] [accelerator_creator.py:169:_check_execution_providers] Supported execution providers for device cpu: ['CPUExecutionProvider']
[2024-08-09 20:34:02,360] [DEBUG] [accelerator_creator.py:199:create_accelerators] Initial accelerators and execution providers: {'cpu': ['CPUExecutionProvider']}
[2024-08-09 20:34:02,360] [INFO] [accelerator_creator.py:224:create_accelerators] Running workflow on accelerator specs: cpu-cpu
[2024-08-09 20:34:02,360] [DEBUG] [run.py:238:run_engine] Pass onnxconversion already registered
[2024-08-09 20:34:02,360] [DEBUG] [run.py:238:run_engine] Pass orttransformersoptimization already registered
[2024-08-09 20:34:02,360] [DEBUG] [run.py:238:run_engine] Pass onnxdynamicquantization already registered
[2024-08-09 20:34:02,360] [DEBUG] [run.py:238:run_engine] Pass insertbeamsearch already registered
[2024-08-09 20:34:02,360] [DEBUG] [run.py:238:run_engine] Pass appendprepostprocessingops already registered
[2024-08-09 20:34:02,360] [DEBUG] [cache.py:304:set_cache_env] Set OLIVE_CACHE_DIR: /Volumes/Internal/Olive/examples/whisper/cache/default_workflow
[2024-08-09 20:34:02,370] [INFO] [engine.py:277:run] Running Olive on accelerator: cpu-cpu
[2024-08-09 20:34:02,370] [INFO] [engine.py:1117:_create_system] Creating target system ...
[2024-08-09 20:34:02,370] [DEBUG] [engine.py:1113:create_system] create native OliveSystem SystemType.Local
[2024-08-09 20:34:02,370] [INFO] [engine.py:1120:_create_system] Target system created in 0.000116 seconds
[2024-08-09 20:34:02,370] [INFO] [engine.py:1129:_create_system] Creating host system ...
[2024-08-09 20:34:02,370] [DEBUG] [engine.py:1113:create_system] create native OliveSystem SystemType.Local
[2024-08-09 20:34:02,370] [INFO] [engine.py:1132:_create_system] Host system created in 0.000052 seconds
[2024-08-09 20:34:02,400] [DEBUG] [engine.py:717:_cache_model] Cached model c07e4eb4 to /Volumes/Internal/Olive/examples/whisper/cache/default_workflow/models/c07e4eb4.json
[2024-08-09 20:34:02,400] [DEBUG] [engine.py:352:run_accelerator] Running Olive in no-search mode ...
[2024-08-09 20:34:02,400] [DEBUG] [engine.py:444:run_no_search] Running ['conversion', 'transformers_optimization', 'onnx_dynamic_quantization', 'insert_beam_search', 'prepost'] with no search ...
[2024-08-09 20:34:02,400] [INFO] [engine.py:886:_run_pass] Running pass conversion:OnnxConversion
[2024-08-09 20:34:03,235] [DEBUG] [pytorch.py:194:get_dummy_inputs] Using dummy_inputs_func to get dummy inputs
Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.43.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
[2024-08-09 20:34:03,943] [DEBUG] [conversion.py:198:_export_pytorch_model] Converting model on device cpu with dtype None.
/Users/user/Library/Python/3.9/lib/python/site-packages/transformers/models/whisper/modeling_whisper.py:1070: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_features.shape[-1] != expected_seq_length:
/Users/user/Library/Python/3.9/lib/python/site-packages/transformers/models/whisper/modeling_whisper.py:387: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz, self.num_heads, tgt_len, self.head_dim):
/Users/user/Library/Python/3.9/lib/python/site-packages/transformers/models/whisper/modeling_whisper.py:100: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if sequence_length != 1:
[2024-08-09 20:34:06,237] [DEBUG] [pytorch.py:194:get_dummy_inputs] Using dummy_inputs_func to get dummy inputs
[2024-08-09 20:34:06,692] [DEBUG] [conversion.py:198:_export_pytorch_model] Converting model on device cpu with dtype None.
[2024-08-09 20:34:07,706] [INFO] [engine.py:988:_run_pass] Pass conversion:OnnxConversion finished in 5.305490 seconds
[2024-08-09 20:34:07,706] [DEBUG] [engine.py:717:_cache_model] Cached model 5_OnnxConversion-c07e4eb4-5fa0d4af to /Volumes/Internal/Olive/examples/whisper/cache/default_workflow/models/5_OnnxConversion-c07e4eb4-5fa0d4af.json
[2024-08-09 20:34:07,707] [DEBUG] [engine.py:769:_cache_run] Cached run for c07e4eb4->5_OnnxConversion-c07e4eb4-5fa0d4af into /Volumes/Internal/Olive/examples/whisper/cache/default_workflow/runs/OnnxConversion-c07e4eb4-5fa0d4af.json
[2024-08-09 20:34:07,707] [INFO] [engine.py:886:_run_pass] Running pass transformers_optimization:OrtTransformersOptimization
[2024-08-09 20:34:07,740] [DEBUG] [transformer_optimization.py:248:_run_for_config] model_type is set to bart from model attributes
[2024-08-09 20:34:07,740] [DEBUG] [transformer_optimization.py:254:_run_for_config] num_heads is set to 6 from model attributes
[2024-08-09 20:34:07,740] [DEBUG] [transformer_optimization.py:260:_run_for_config] hidden_size is set to 384 from model attributes
[2024-08-09 20:34:11,279] [DEBUG] [transformer_optimization.py:248:_run_for_config] model_type is set to bart from model attributes
[2024-08-09 20:34:11,279] [DEBUG] [transformer_optimization.py:254:_run_for_config] num_heads is set to 6 from model attributes
[2024-08-09 20:34:11,279] [DEBUG] [transformer_optimization.py:260:_run_for_config] hidden_size is set to 384 from model attributes
Unable to determine if Range_10_o0__d0 + past_decode_sequence_length <= past_decode_sequence_length + 2, treat as equal
Unable to determine if Range_10_o0__d0 + past_decode_sequence_length <= past_decode_sequence_length + 2, treat as equal
Unable to determine if Range_10_o0__d0 + past_decode_sequence_length <= past_decode_sequence_length + 2, treat as equal
Unable to determine if Range_10_o0__d0 + past_decode_sequence_length <= past_decode_sequence_length + 2, treat as equal
Unable to determine if Range_10_o0__d0 + past_decode_sequence_length <= past_decode_sequence_length + 2, treat as equal
[2024-08-09 20:34:13,583] [INFO] [engine.py:988:_run_pass] Pass transformers_optimization:OrtTransformersOptimization finished in 5.874750 seconds
[2024-08-09 20:34:13,583] [DEBUG] [engine.py:717:_cache_model] Cached model 6_OrtTransformersOptimization-5-5c93fa9e-cpu-cpu to /Volumes/Internal/Olive/examples/whisper/cache/default_workflow/models/6_OrtTransformersOptimization-5-5c93fa9e-cpu-cpu.json
[2024-08-09 20:34:13,583] [DEBUG] [engine.py:769:_cache_run] Cached run for 5_OnnxConversion-c07e4eb4-5fa0d4af->6_OrtTransformersOptimization-5-5c93fa9e-cpu-cpu into /Volumes/Internal/Olive/examples/whisper/cache/default_workflow/runs/OrtTransformersOptimization-5-5c93fa9e-cpu-cpu.json
[2024-08-09 20:34:13,584] [INFO] [engine.py:886:_run_pass] Running pass onnx_dynamic_quantization:OnnxDynamicQuantization
[2024-08-09 20:34:13,613] [INFO] [quantization.py:391:_run_for_config] Preprocessing model for quantization
[2024-08-09 20:34:15,076] [INFO] [quantization.py:391:_run_for_config] Preprocessing model for quantization
[2024-08-09 20:34:16,182] [INFO] [engine.py:988:_run_pass] Pass onnx_dynamic_quantization:OnnxDynamicQuantization finished in 2.597045 seconds
[2024-08-09 20:34:16,182] [DEBUG] [engine.py:717:_cache_model] Cached model 7_OnnxDynamicQuantization-6-a1261e22 to /Volumes/Internal/Olive/examples/whisper/cache/default_workflow/models/7_OnnxDynamicQuantization-6-a1261e22.json
[2024-08-09 20:34:16,182] [DEBUG] [engine.py:769:_cache_run] Cached run for 6_OrtTransformersOptimization-5-5c93fa9e-cpu-cpu->7_OnnxDynamicQuantization-6-a1261e22 into /Volumes/Internal/Olive/examples/whisper/cache/default_workflow/runs/OnnxDynamicQuantization-6-a1261e22.json
[2024-08-09 20:34:16,183] [INFO] [engine.py:886:_run_pass] Running pass insert_beam_search:InsertBeamSearch
Removed 67 initializers with duplicated value
Removed 33 initializers with duplicated value
[2024-08-09 20:34:17,543] [DEBUG] [insert_beam_search.py:302:chain_model] Using IR version 8 for chained model
[2024-08-09 20:34:17,887] [INFO] [engine.py:988:_run_pass] Pass insert_beam_search:InsertBeamSearch finished in 1.703919 seconds
[2024-08-09 20:34:17,888] [DEBUG] [engine.py:717:_cache_model] Cached model 8_InsertBeamSearch-7-82bf64f8 to /Volumes/Internal/Olive/examples/whisper/cache/default_workflow/models/8_InsertBeamSearch-7-82bf64f8.json
[2024-08-09 20:34:17,888] [DEBUG] [engine.py:769:_cache_run] Cached run for 7_OnnxDynamicQuantization-6-a1261e22->8_InsertBeamSearch-7-82bf64f8 into /Volumes/Internal/Olive/examples/whisper/cache/default_workflow/runs/InsertBeamSearch-7-82bf64f8.json
[2024-08-09 20:34:17,889] [INFO] [engine.py:886:_run_pass] Running pass prepost:AppendPrePostProcessingOps
[W809 20:34:18.283838000 shape_type_inference.cpp:2002] Warning: The shape inference of ai.onnx.contrib::StftNorm type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[2024-08-09 20:34:19,297] [INFO] [engine.py:988:_run_pass] Pass prepost:AppendPrePostProcessingOps finished in 1.408222 seconds
[2024-08-09 20:34:19,298] [DEBUG] [engine.py:717:_cache_model] Cached model 9_AppendPrePostProcessingOps-8-9e247843 to /Volumes/Internal/Olive/examples/whisper/cache/default_workflow/models/9_AppendPrePostProcessingOps-8-9e247843.json
[2024-08-09 20:34:19,298] [DEBUG] [engine.py:769:_cache_run] Cached run for 8_InsertBeamSearch-7-82bf64f8->9_AppendPrePostProcessingOps-8-9e247843 into /Volumes/Internal/Olive/examples/whisper/cache/default_workflow/runs/AppendPrePostProcessingOps-8-9e247843.json
[2024-08-09 20:34:19,298] [INFO] [engine.py:862:_run_passes] Run model evaluation for the final model...
[2024-08-09 20:34:19,298] [DEBUG] [engine.py:1059:_evaluate_model] Evaluating model ...
[2024-08-09 20:34:20,978] [DEBUG] [ort_inference.py:72:get_ort_inference_session] inference_settings: {'execution_provider': ['CPUExecutionProvider'], 'provider_options': None}
[2024-08-09 20:34:20,978] [DEBUG] [ort_inference.py:111:get_ort_inference_session] Normalized providers: ['CPUExecutionProvider'], provider_options: [{}]
[2024-08-09 20:34:53,971] [DEBUG] [footprint.py:234:_resolve_metrics] There is no goal set for metric: latency-avg.
[2024-08-09 20:34:53,971] [DEBUG] [engine.py:864:_run_passes] Signal: {
  "latency-avg": 1086.67319
}
[2024-08-09 20:34:53,992] [INFO] [engine.py:378:run_accelerator] Save footprint to models/whisper_cpu_int8_cpu-cpu_footprints.json.
[2024-08-09 20:34:53,996] [DEBUG] [engine.py:380:run_accelerator] run_accelerator done
[2024-08-09 20:34:53,996] [INFO] [engine.py:294:run] Run history for cpu-cpu:
[2024-08-09 20:34:54,004] [INFO] [engine.py:591:dump_run_history] run history:
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+-----------------------------+
| model_id                                         | parent_model_id                                  | from_pass                   |   duration_sec | metrics                     |
+==================================================+==================================================+=============================+================+=============================+
| c07e4eb4                                         |                                                  |                             |                |                             |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+-----------------------------+
| 5_OnnxConversion-c07e4eb4-5fa0d4af               | c07e4eb4                                         | OnnxConversion              |        5.30549 |                             |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+-----------------------------+
| 6_OrtTransformersOptimization-5-5c93fa9e-cpu-cpu | 5_OnnxConversion-c07e4eb4-5fa0d4af               | OrtTransformersOptimization |        5.87475 |                             |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+-----------------------------+
| 7_OnnxDynamicQuantization-6-a1261e22             | 6_OrtTransformersOptimization-5-5c93fa9e-cpu-cpu | OnnxDynamicQuantization     |        2.59704 |                             |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+-----------------------------+
| 8_InsertBeamSearch-7-82bf64f8                    | 7_OnnxDynamicQuantization-6-a1261e22             | InsertBeamSearch            |        1.70392 |                             |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+-----------------------------+
| 9_AppendPrePostProcessingOps-8-9e247843          | 8_InsertBeamSearch-7-82bf64f8                    | AppendPrePostProcessingOps  |        1.40822 | {                           |
|                                                  |                                                  |                             |                |   "latency-avg": 1086.67319 |
|                                                  |                                                  |                             |                | }                           |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+-----------------------------+
[2024-08-09 20:34:54,005] [INFO] [engine.py:309:run] No packaging config provided, skip packaging artifacts
whisper git:(main) python3 test_transcription.py --config whisper_cpu_int8.json
/Users/user/Library/Python/3.9/lib/python/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
['']

Other information

  • OS: macOS
  • Olive version: 0.7.0 (commit 7b4cefe)
  • ONNXRuntime package and version: onnx==1.16.2

Additional context

By the way, the tests don't check for this:

transcription = test_transcription(["--config", config_file])
assert len(transcription) > 0

It will pass even when no tokens are produced:

>>> len([''])
1
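
A stricter assertion would catch this regression, e.g. (a sketch, reusing the same test_transcription entry point):

transcription = test_transcription(["--config", config_file])
# Require at least one non-empty, non-whitespace transcription string.
assert any(t.strip() for t in transcription), f"empty transcription: {transcription!r}"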

Hi, thanks for catching and reporting this issue. Models exported with transformers>=4.33.0 are no longer compatible with this workflow. Can you clean the cache directory, install transformers<4.33.0, and rerun the workflow?
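
For example, a guard like this at the top of the example scripts would fail fast on an incompatible install (just a sketch, not something the example currently does):

# Sketch: fail fast if the installed transformers is too new for this workflow.
import transformers
from packaging.version import Version

assert Version(transformers.__version__) < Version("4.33.0"), (
    f"transformers {transformers.__version__} is installed; "
    "this whisper example needs transformers<4.33.0"
)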

I can confirm that this fixed the issue.
How could I debug it better next time?
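
For reference, one check that would have helped me is loading the final model directly and inspecting its I/O signature instead of trusting the decoded string (a sketch; the model filename is a guess based on the models/ output directory above):

# Sketch: inspect the exported model directly with onnxruntime.
# The model path is an assumption; adjust to the actual file under models/.
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

opts = ort.SessionOptions()
opts.register_custom_ops_library(get_library_path())  # pre/post-processing custom ops
sess = ort.InferenceSession(
    "models/whisper_cpu_int8_model.onnx", opts, providers=["CPUExecutionProvider"]
)
for inp in sess.get_inputs():
    print("input:", inp.name, inp.shape, inp.type)
for out in sess.get_outputs():
    print("output:", out.name, out.shape, out.type)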