microsoft/Olive

status.IsOK() was false. Tensor shape cannot contain any negative value

MrRace opened this issue · 1 comment

Describe the bug
Following the instructions in https://github.com/microsoft/Olive/tree/main/examples/whisper, I carried out the following steps for model optimization:

python3 prepare_whisper_configs.py --model_name /share_model_zoo/LLM/openai/whisper-base/
python3 -m olive.workflows.run --config whisper_cpu_fp32.json --setup
python3 -m olive.workflows.run --config whisper_cpu_fp32.json
Then I tested the optimized ONNX model with test_transcription.py:
python3 test_transcription.py --config whisper_cpu_fp32.json --audio_path ./eng_2min.wav

Running the above code results in the following error:

onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running ConstantOfShape node. Name:'/ConstantOfShape' Status Message: /onnxruntime_src/onnxruntime/core/framework/op_kernel.cc:83 virtual OrtValue* onnxruntime::OpKernelContext::OutputMLValue(int, const onnxruntime::TensorShape&) status.IsOK() was false. Tensor shape cannot contain any negative value

I have uploaded the eng_2min.wav file as an attachment.

Olive config
whisper_cpu_fp32.json

{
    "input_model": {
        "type": "PyTorchModel",
        "config": {
            "model_script": "code/user_script.py",
            "script_dir": "code",
            "hf_config": {
                "model_class": "WhisperForConditionalGeneration",
                "model_name": "/share_model_zoo/LLM/openai/whisper-base/",
                "components": [
                    {
                        "name": "encoder_decoder_init",
                        "io_config": "get_encdec_io_config",
                        "component_func": "get_encoder_decoder_init",
                        "dummy_inputs_func": "encoder_decoder_init_dummy_inputs"
                    },
                    {
                        "name": "decoder",
                        "io_config": "get_dec_io_config",
                        "component_func": "get_decoder",
                        "dummy_inputs_func": "decoder_dummy_inputs"
                    }
                ],
                "from_pretrained_args": {
                    "attn_implementation": "eager"
                }
            }
        }
    },
    "systems": {
        "local_system": {
            "type": "LocalSystem",
            "config": {
                "accelerators": [
                    {
                        "device": "cpu",
                        "execution_providers": [
                            "CPUExecutionProvider"
                        ]
                    }
                ]
            }
        }
    },
    "evaluators": {
        "common_evaluator": {
            "metrics": [
                {
                    "name": "latency",
                    "type": "latency",
                    "sub_types": [
                        {
                            "name": "avg",
                            "priority": 1
                        }
                    ],
                    "user_config": {
                        "user_script": "code/user_script.py",
                        "script_dir": "code",
                        "data_dir": "data",
                        "dataloader_func": "whisper_dataloader",
                        "func_kwargs": {
                            "dataloader_func": {
                                "model_name": "/share_model_zoo/LLM/openai/whisper-base/",
                                "use_audio_decoder": true
                            }
                        }
                    }
                }
            ]
        }
    },
    "passes": {
        "conversion": {
            "type": "OnnxConversion",
            "config": {
                "target_opset": 17
            }
        },
        "transformers_optimization": {
            "type": "OrtTransformersOptimization",
            "config": {
                "optimization_options": {
                    "use_multi_head_attention": true
                },
                "use_gpu": false
            }
        },
        "insert_beam_search": {
            "type": "InsertBeamSearch",
            "config": {
                "use_forced_decoder_ids": false,
                "use_logits_processor": false,
                "fp16": false
            }
        },
        "prepost": {
            "type": "AppendPrePostProcessingOps",
            "config": {
                "tool_command": "whisper",
                "tool_command_args": {
                    "model_name": "/share_model_zoo/LLM/openai/whisper-base/",
                    "use_audio_decoder": true
                },
                "target_opset": 17
            }
        }
    },
    "engine": {
        "log_severity_level": 0,
        "host": "local_system",
        "target": "local_system",
        "evaluator": "common_evaluator",
        "evaluate_input_model": false,
        "clean_cache": false,
        "cache_dir": "cache",
        "output_dir": "models",
        "output_name": "whisper_cpu_fp32"
    }
}

Other information

  • OS: Ubuntu 22.04.3 LTS
  • Olive version: olive-ai 0.6.0
  • ONNXRuntime package and version: onnxruntime-gpu 1.17.1

Additional context
eng_2min.zip

Two minutes of audio is too long for Whisper: the model accepts at most 30 seconds of input at a time. I clipped your example audio to 25 seconds and it worked fine. You will have to split the audio into segments of acceptable length and pass each segment to the model separately. Please refer to https://github.com/openai/whisper/blob/main/whisper/transcribe.py for how to process long files.
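
As a rough illustration of the splitting step, here is a minimal sketch that cuts a WAV file into consecutive pieces of at most 30 seconds using only the Python standard library. It is not the sliding-window approach used in transcribe.py: it cuts at fixed boundaries, so a word may be split between two chunks. The names chunk_bounds, split_wav, and MAX_CHUNK_SECONDS are hypothetical helpers invented for this example, and the code assumes the input is an uncompressed PCM WAV file that the wave module can read.

```python
import wave
from pathlib import Path

# Assumption: 30 seconds is the longest input the model accepts, per the reply above.
MAX_CHUNK_SECONDS = 30.0


def chunk_bounds(num_frames, frame_rate, max_seconds=MAX_CHUNK_SECONDS):
    """Return (start, end) frame indices that cover the audio in
    consecutive chunks of at most max_seconds each."""
    step = int(frame_rate * max_seconds)
    if step <= 0:
        raise ValueError("frame_rate * max_seconds must be at least one frame")
    return [
        (start, min(start + step, num_frames))
        for start in range(0, num_frames, step)
    ]


def split_wav(src_path, out_dir, max_seconds=MAX_CHUNK_SECONDS):
    """Write each chunk of src_path to its own WAV file in out_dir
    and return the list of chunk paths in playback order."""
    src_path = Path(src_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with wave.open(str(src_path), "rb") as src:
        params = src.getparams()
        bounds = chunk_bounds(src.getnframes(), src.getframerate(), max_seconds)
        for index, (start, end) in enumerate(bounds):
            src.setpos(start)
            frames = src.readframes(end - start)
            chunk_path = out_dir / f"{src_path.stem}_{index:03d}.wav"
            with wave.open(str(chunk_path), "wb") as dst:
                # Copy channel count, sample width, and rate from the source;
                # the frame count in the header is corrected when the file closes.
                dst.setparams(params)
                dst.writeframes(frames)
            chunk_paths.append(chunk_path)
    return chunk_paths
```

Each resulting file could then be passed to the test script one at a time, for example python3 test_transcription.py --config whisper_cpu_fp32.json --audio_path on each chunk path, and the transcripts concatenated in order. Lowering max_seconds (for instance to the 25 seconds that worked above) leaves some margin below the limit.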