swoook/KoBART

Request a feature to export KoBART for sequence classification to ONNX Runtime (ORT)

Closed this issue · 8 comments

🚀 Feature request

  • I'd like to implement a feature to export KoBART to ONNX Runtime
  • Of course, transformers officially supports exporting to ONNX Runtime [here]
  • However, I found the dependency conflict below:
  1. SKT-AI/KoBART requires transformers==4.3.3
  2. transformers>=4.9.0 supports exporting BART to ONNX
  • I'd like to address this conflict to export KoBART to ONNX Runtime

Motivation

  • I'd like to optimize the computational resources of KoBART
  • The toolkits or frameworks I can choose from:
  1. TensorRT (TRT)
  2. ONNX Runtime (ORT)
  3. OpenVINO
  • A quick comparison (which one is better or worse?):
  1. performance: TRT ≥ ORT

    • ORT is sometimes on par with TRT
  2. hardware: TRT ≤ ORT

    • TRT only supports NVIDIA GPUs
    • ORT supports NVIDIA GPUs and Intel CPUs
    • I cannot find any document about AMD support for ORT 🙈
  3. compatibility: TRT << ORT

    • TRT performs device-specific optimizations [1, 2]
    • For example, an execution engine built for an NVIDIA A100 GPU will not work on an NVIDIA T4 GPU 🙃
  4. difficulty: TRT ≥ ORT
  • Michaël Benesty compares the two approaches, ORT and TRT
  • And transformers officially supports exporting to ONNX Runtime [here]

Your contribution

  • SKT-AI/KoBART requires transformers==4.3.3
  • In transformers==4.3.3, transformers.convert_graph_to_onnx supports exporting to ONNX
  • --help for transformers.convert_graph_to_onnx:
$ python -m transformers.convert_graph_to_onnx --help
usage: ONNX Converter [-h]
                      [--pipeline {feature-extraction,ner,sentiment-analysis,fill-mask,question-answering,text-generation,translation_en_to_fr,translation_en_to_de,translation_en_to_ro}]
                      --model MODEL [--tokenizer TOKENIZER] [--framework {pt,tf}] [--opset OPSET] [--check-loading] [--use-external-format]
                      [--quantize]
                      output

positional arguments:
  output

optional arguments:
  -h, --help            show this help message and exit
  --pipeline {feature-extraction,ner,sentiment-analysis,fill-mask,question-answering,text-generation,translation_en_to_fr,translation_en_to_de,translation_en_to_ro}
  --model MODEL         Model's id or path (ex: bert-base-cased)
  --tokenizer TOKENIZER
                        Tokenizer's id or path (ex: bert-base-cased)
  --framework {pt,tf}   Framework for loading the model
  --opset OPSET         ONNX opset to use
  --check-loading       Check ONNX is able to load the model
  --use-external-format
                        Allow exporting model >= than 2Gb
  --quantize            Quantize the neural network to be run with int8
  1. --model:

    • Hugging Face saves a model into two files:
    1. config.json, which stores the configuration of the model
    2. pytorch_model.bin, which is the PyTorch checkpoint
    • We can pass the directory in which they exist
    • Or it also accepts a model's id
    • For example, the model's id of this model is skt/kobert-base-v1
  2. --framework

    • --framework pt for PyTorch
    • --framework tf for TensorFlow
  • Recall the examples in $REPO_ROOT/examples of SKT-AI/KoBART
  • They use pytorch-lightning
  • And pytorch-lightning saves the model into .ckpt and .yaml files
  • However, transformers.convert_graph_to_onnx does NOT support such a format
  • We can convert them back to the Hugging Face format:
from dmp_kobart import KoBARTClassification

# Fill in the placeholders below with the actual paths
paths = dict()
paths['ckpt'] = '$CKPT_PATH'          # .ckpt file saved by pytorch-lightning
paths['yaml'] = '$YAML_PATH'          # .yaml hparams file saved by pytorch-lightning
paths['huggingface'] = '$OUTPUT_DIR'  # output directory for the Hugging Face checkpoint

# Restore the pytorch-lightning wrapper from the checkpoint
pytorch_lightning_model_wrapper = KoBARTClassification.load_from_checkpoint(
    checkpoint_path=paths['ckpt'],
    hparams_file=paths['yaml'],
    map_location=None,
)

# Save the wrapped Hugging Face model in the format transformers expects
pytorch_lightning_model_wrapper.model.save_pretrained(paths['huggingface'])
$ python -m transformers.convert_graph_to_onnx --framework pt \
    --model $MODEL_DIR \
    $ONNX_PATH
  • Got the error message Error while converting the model: Can't load tokenizer for $TOKENIZER_PATH_OR_DIR 😱
====== Converting model to ONNX ======
ONNX opset version set to: 11
Loading pipeline (model: $MODEL_DIR, tokenizer: $MODEL_DIR)
Error while converting the model: Can't load tokenizer for '$MODEL_DIR'. Make sure that:

- '$MODEL_DIR' is a correct model identifier listed on 'https://huggingface.co/models'

- or '$MODEL_DIR' is the correct path to a directory containing relevant tokenizer files
  • SKT-AI/KoBART defines the download info for its tokenizer in its source code:
...
tokenizer = {
    'url':
    'https://kobert.blob.core.windows.net/models/kobart/kobart_base_tokenizer_cased_cf74400bce.zip',
    'fname': 'kobart_base_tokenizer_cased_cf74400bce.zip',
    'chksum': 'cf74400bce'
}
...
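  • A rough sketch of downloading and unpacking that tokenizer archive so it can be passed via --tokenizer (the URL and file name come from the snippet above; $TOKENIZER_DIR is a placeholder):
import zipfile
from pathlib import Path

import requests

# URL and file name taken from the tokenizer dict above; the target directory is a placeholder
url = 'https://kobert.blob.core.windows.net/models/kobart/kobart_base_tokenizer_cased_cf74400bce.zip'
tokenizer_dir = Path('$TOKENIZER_DIR')
tokenizer_dir.mkdir(parents=True, exist_ok=True)

# Download the archive and unpack it (model.json, emji_tokenizer-vocab.json, ...)
archive_path = tokenizer_dir / 'kobart_base_tokenizer_cased_cf74400bce.zip'
archive_path.write_bytes(requests.get(url).content)
with zipfile.ZipFile(archive_path) as archive:
    archive.extractall(tokenizer_dir)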
  • However, it still shows the same error message:
$ python -m transformers.convert_graph_to_onnx --framework pt \
    --model $MODEL_DIR \
    --tokenizer $TOKENIZER_DIR \
    $ONNX_PATH

====== Converting model to ONNX ======
ONNX opset version set to: 11
Loading pipeline (model: $MODEL_DIR, tokenizer: $TOKENIZER_DIR)
Error while converting the model: Can't load tokenizer for '$TOKENIZER_DIR'. Make sure that:

- '$TOKENIZER_DIR' is a correct model identifier listed on 'https://huggingface.co/models'

- or '$TOKENIZER_DIR' is the correct path to a directory containing relevant tokenizer files
  • The tokenizer loader also looks for these files:
  1. added_tokens.json [example]
  2. special_tokens_map.json [example]
  3. tokenizer_config.json [example]
  4. tokenizer.json [example]
...
additional_files_names = {
                    "added_tokens_file": ADDED_TOKENS_FILE,
                    "special_tokens_map_file": SPECIAL_TOKENS_MAP_FILE,
                    "tokenizer_config_file": TOKENIZER_CONFIG_FILE,
                    "tokenizer_file": FULL_TOKENIZER_FILE,
                }
...
  • kobart_base_tokenizer_cased_cf74400bce.zip includes:
  1. model.json

    • It has the keys which also exist in the example tokenizer.json
    • I.e. it seems model.json is tokenizer.json
  2. emji_tokenizer-vocab.json

    • It seems to be vocab.json
    • However, model.json also includes the vocab
  • So, it doesn't have:
  1. added_tokens.json [example]
  2. tokenizer_config.json [example]
  3. special_tokens_map.json [example]
  • 1 is optional, but 2 and 3 are required (see the sketch below)
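  • One way to generate the missing files could be to wrap model.json with PreTrainedTokenizerFast and call save_pretrained, which writes tokenizer_config.json and special_tokens_map.json (a minimal sketch, not necessarily the exact steps taken here; the special-token values are assumptions and should be checked against KoBART's actual vocab):
from transformers import PreTrainedTokenizerFast

# model.json from the archive looks like a tokenizers-style tokenizer.json
tokenizer = PreTrainedTokenizerFast(
    tokenizer_file='$TOKENIZER_DIR/model.json',  # placeholder path
    bos_token='<s>',      # assumed special tokens; verify against KoBART's vocab
    eos_token='</s>',
    unk_token='<unk>',
    pad_token='<pad>',
    mask_token='<mask>',
)

# Writes tokenizer.json, tokenizer_config.json and special_tokens_map.json into the directory
tokenizer.save_pretrained('$TOKENIZER_DIR')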
  • Got the error message Error while converting the model: The type of axis index is expected to be an integer 😱
====== Converting model to ONNX ======
ONNX opset version set to: 11
Loading pipeline (model: $MODEL_DIR, tokenizer: $TOKENIZER_DIR)
Using framework PyTorch: 1.7.1
Found input input_ids with shape: {0: 'batch', 1: 'sequence'}
Found input attention_mask with shape: {0: 'batch', 1: 'sequence'}
Found output output_0 with shape: {0: 'batch', 1: 'sequence'}
Found output output_1 with shape: {0: 'batch', 2: 'sequence'}
Found output output_1 with shape: {0: 'batch', 2: 'sequence'}
Found output output_1 with shape: {0: 'batch', 2: 'sequence'}
Found output output_1 with shape: {0: 'batch', 2: 'sequence'}
Found output output_2 with shape: {0: 'batch', 2: 'sequence'}
Found output output_2 with shape: {0: 'batch', 2: 'sequence'}
Found output output_2 with shape: {0: 'batch', 2: 'sequence'}
Found output output_2 with shape: {0: 'batch', 2: 'sequence'}
Found output output_3 with shape: {0: 'batch', 2: 'sequence'}
Found output output_3 with shape: {0: 'batch', 2: 'sequence'}
Found output output_3 with shape: {0: 'batch', 2: 'sequence'}
Found output output_3 with shape: {0: 'batch', 2: 'sequence'}
Found output output_4 with shape: {0: 'batch', 2: 'sequence'}
Found output output_4 with shape: {0: 'batch', 2: 'sequence'}
Found output output_4 with shape: {0: 'batch', 2: 'sequence'}
Found output output_4 with shape: {0: 'batch', 2: 'sequence'}
Found output output_5 with shape: {0: 'batch', 2: 'sequence'}
Found output output_5 with shape: {0: 'batch', 2: 'sequence'}
Found output output_5 with shape: {0: 'batch', 2: 'sequence'}
Found output output_5 with shape: {0: 'batch', 2: 'sequence'}
Found output output_6 with shape: {0: 'batch', 2: 'sequence'}
Found output output_6 with shape: {0: 'batch', 2: 'sequence'}
Found output output_6 with shape: {0: 'batch', 2: 'sequence'}
Found output output_6 with shape: {0: 'batch', 2: 'sequence'}
Found output output_7 with shape: {0: 'batch', 1: 'sequence'}
Ensuring inputs are in correct order
decoder_input_ids is not present in the generated input list.
Generated inputs order: ['input_ids', 'attention_mask']
/data/swook/miniconda3/envs/transformers/lib/python3.8/site-packages/torch/onnx/utils.py:1111: UserWarning: No names were found for specified dynamic axes of provided input.Automatically generated names will be applied to each dynamic axes of input output_1
  warnings.warn('No names were found for specified dynamic axes of provided input.'
Error while converting the model: The type of axis index is expected to be an integer
  • I've found relevant issues and pull requests:
  1. #9803 in huggingface/transformers
  2. #11786 in huggingface/transformers
  • It seems they fixed this issue in a later version
  • I.e. There is a dependency conflict:
  1. SKT-AI/KoBART requires transformers==4.3.3
  2. transformers should be >=4.9.0 to export BART to ONNX
  • Before we start, I'd like to make sure it can export BART to ONNX
  • Let's try to export some BART models to ONNX using transformers==4.12.5:
  1. facebook/bart-base
  2. ynie/bart-large-snli_mnli_fever_anli_R1_R2_R3-nli
$ python -m transformers.onnx \
    --model $MODEL_DIR \
    $ONNX_PATH
  • Got an error message like:
Traceback (most recent call last):
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1359, in from_pretrain
ed
    state_dict = torch.load(resolved_archive_file, map_location="cpu")
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/torch/serialization.py", line 595, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/torch/serialization.py", line 764, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/__main__.py", line 71, in <module>
    main()
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/__main__.py", line 51, in main
    model = FeaturesManager.get_model_from_feature(args.feature, args.model)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/features.py", line 125, in get_model_from_
feature
    return FeaturesManager._TASKS_TO_AUTOMODELS[task].from_pretrained(model)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 419, in from_pretrained
    return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1364, in from_pretrained
    raise OSError(
OSError: You seem to have cloned a repository without having git-lfs installed. Please install git-lfs and run `git lfs install` followed by `git lfs pull` in the folder you cloned.
  • We need to install git-lfs [details]
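  • As the error message itself instructs, run these inside the cloned model repository:
$ git lfs install
$ git lfs pull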
  • Got an error message like:
Traceback (most recent call last):
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/__main__.py", line 71, in <module>
    main()
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/__main__.py", line 62, in main
    onnx_inputs, onnx_outputs = export(tokenizer, model, onnx_config, args.opset, args.output)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/convert.py", line 90, in export
    raise AssertionError(f"Unsupported PyTorch version, minimum required is 1.8.0, got: {torch_version}")
AssertionError: Unsupported PyTorch version, minimum required is 1.8.0, got: 1.7.1
  • transformers v4.12.5 requires pytorch≥1.1.0
  • However, transformers.onnx requires pytorch≥1.8.0 💢
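  • So PyTorch has to be upgraded first, for example:
$ pip install --upgrade "torch>=1.8.0"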
  • Got multiple warnings and an error message like:
Using framework PyTorch: 1.10.0
Overriding 1 configuration item(s)
        - use_cache -> False
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/torch/onnx/utils.py:90: UserWarning: 'enable_onnx_checker' is deprecated and ignored. It will be removed in the next PyTorch release. To proceed despite ONNX checker failures, catch torch.onnx.ONNXCheckerError.
  warnings.warn("'enable_onnx_checker' is deprecated and ignored. It will be removed in "
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/torch/onnx/utils.py:103: UserWarning: `use_external_data_format' is deprecated and ignored. Will be removed in next PyTorch release. The code will work as it is False if models are not larger than 2GB, Otherwise set to False because of size limits imposed by Protocol Buffers.
  warnings.warn("`use_external_data_format' is deprecated and ignored. Will be removed in next "
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:215: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:221: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, tgt_len, src_len):
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:252: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:879: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[-1] > 1:
Traceback (most recent call last):
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/__main__.py", line 71, in <module>
    main()
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/__main__.py", line 64, in main
    validate_model_outputs(onnx_config, tokenizer, model, args.output, onnx_outputs, args.atol)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/convert.py", line 142, in validate_model_outputs
    from onnxruntime import InferenceSession, SessionOptions
ModuleNotFoundError: No module named 'onnxruntime'
  • We need to install onnxruntime or onnxruntime-gpu [details]
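  • For example (the GPU build needs a CUDA-capable environment):
$ pip install onnxruntime      # CPU-only build
$ pip install onnxruntime-gpu  # build with CUDA support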
  • Got a message like:
Using framework PyTorch: 1.10.0
Overriding 1 configuration item(s)
        - use_cache -> False
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/torch/onnx/utils.py:90: UserWarning: 'enable_onnx_checker' is deprecated and ignored. It will be removed in the next PyTorch release. To proceed despite ONNX checker failures, catch torch.onnx.ONNXCheckerError.
  warnings.warn("'enable_onnx_checker' is deprecated and ignored. It will be removed in "
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/torch/onnx/utils.py:103: UserWarning: `use_external_data_format' is deprecated and ignored. Will be removed in next PyTorch release. The code will work as it is False if models are not larger than 2GB, Otherwise set to False because of size limits imposed by Protocol Buffers.
  warnings.warn("`use_external_data_format' is deprecated and ignored. Will be removed in next "
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:215: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:221: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, tgt_len, src_len):
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:252: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:879: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[-1] > 1:
Validating ONNX model...
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:350: UserWarning: Deprecation warning. This ORT build has ['CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. The next release (ORT 1.10) will require explicitly setting the providers parameter (as opposed to the current behavior of providers getting set/registered by default based on the build flags) when instantiating InferenceSession.For example, onnxruntime.InferenceSession(..., providers=["CUDAExecutionProvider"], ...)
  warnings.warn("Deprecation warning. This ORT build has {} enabled. ".format(available_providers) +
        -[✓] ONNX model outputs' name match reference model ({'last_hidden_state', 'encoder_last_hidden_state'}
        - Validating ONNX Model output "last_hidden_state":
                -[✓] (2, 8, 768) matches (2, 8, 768)
                -[✓] all values close (atol: 0.0001)
        - Validating ONNX Model output "encoder_last_hidden_state":
                -[✓] (2, 8, 768) matches (2, 8, 768)
                -[✓] all values close (atol: 0.0001)
All good, model saved at: /data/swook/models/huggingface/facebook/bart-base/onnx/model.onnx
  1. 1st warning

    UserWarning: 'enable_onnx_checker' is deprecated and ignored. It will be removed in the next PyTorch release. To proceed despite ONNX checker failures, catch torch.onnx.ONNXCheckerError.
    • It seems we can safely disregard this warning
  2. 2nd warning

    UserWarning: `use_external_data_format' is deprecated and ignored. Will be removed in next PyTorch release. The code will work as it is False if models are not larger than 2GB, Otherwise set to False because of size limits imposed by Protocol Buffers.
    • It warns that we need to set use_external_data_format to False if the model is larger than 2GB
    • BART is smaller than 2GB
    • It seems we can safely disregard this warning
  3. 3rd warning

    /data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:215: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
    if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
  • The rest of the warnings are similar to the 3rd one
  • I compared the outputs of the PyTorch and ONNX models for 17 sentences (a rough sketch of the comparison is shown below)
  • Mean Absolute Percentage Error (MAPE) of the last hidden state: 0.03% 😂
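  • A minimal sketch of such a comparison (paths and sentences are placeholders; the input and output names follow the export log above; not the exact script used here):
import numpy as np
import torch
from onnxruntime import InferenceSession
from transformers import AutoModel, AutoTokenizer

model_dir = '$MODEL_DIR'                  # placeholder: directory of the PyTorch checkpoint
onnx_path = '$ONNX_PATH'                  # placeholder: path of the exported model.onnx
sentences = ['sentence 1', 'sentence 2']  # placeholder test sentences

tokenizer = AutoTokenizer.from_pretrained(model_dir)
pt_model = AutoModel.from_pretrained(model_dir).eval()
session = InferenceSession(onnx_path, providers=['CPUExecutionProvider'])

mapes = []
for sentence in sentences:
    encoded = tokenizer(sentence, return_tensors='pt')
    # Reference output from the PyTorch model
    with torch.no_grad():
        pt_hidden = pt_model(
            input_ids=encoded['input_ids'],
            attention_mask=encoded['attention_mask'],
        ).last_hidden_state.numpy()
    # Output from the exported ONNX model
    ort_hidden = session.run(
        ['last_hidden_state'],
        {
            'input_ids': encoded['input_ids'].numpy(),
            'attention_mask': encoded['attention_mask'].numpy(),
        },
    )[0]
    # Mean absolute percentage error over every element of the last hidden state
    mapes.append(np.mean(np.abs((pt_hidden - ort_hidden) / pt_hidden)) * 100)

print(f'MAPE of last hidden state: {np.mean(mapes):.2f}%')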
  • Recall the dependency conflict mentioned above:
  1. SKT-AI/KoBART requires transformers==4.3.3
  2. transformers>=4.9.0 supports exporting BART to ONNX
  • I compared the predictions of our classifier for 10 examples in two different versions of transformers:
  1. transformers==4.3.3
  2. transformers>=4.12.5
  • They are all the same
  • I.e. It seems that SKT-AI/KoBART works correctly for sequence classification in transformers>=4.12.5
  • Of course, we should slightly modify the example for NSMC
  • Refer to this commit for more details
  • An example of the command to export our KoBART for sequence classification to ONNX:
$ python -m transformers.onnx \
> --model=$PYTORCH_MODEL_DIR \
> --feature sequence-classification \
> $ONNX_MODEL_DIR
  • Got the error message below when trying to export KoBART to ONNX:
Traceback (most recent call last):
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/__main__.py", line 71, in <module>
    main()
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/__main__.py", line 52, in main
    model_kind, model_onnx_config = FeaturesManager.check_supported_model_or_raise(model, feature=args.feature)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/features.py", line 153, in check_supported_model_or_raise
    raise ValueError(
ValueError: bart doesn't support feature sequence-classification. Supported values are: ['default']
  • transformers.onnx currently does NOT support exporting BART for sequence classification
  • I need to consider other options (one possibility is sketched below):
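  • One option (not verified here; a minimal sketch assuming the fine-tuned classifier loads with AutoModelForSequenceClassification, and the classification head may hit its own tracing issues) is to call torch.onnx.export directly:
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_dir = '$PYTORCH_MODEL_DIR'  # placeholder: the fine-tuned KoBART classifier
model = AutoModelForSequenceClassification.from_pretrained(model_dir).eval()
model.config.use_cache = False    # same override the transformers exporter applies
model.config.return_dict = False  # make the traced model return a plain tuple

tokenizer = AutoTokenizer.from_pretrained(model_dir)
dummy = tokenizer('a dummy sentence for tracing', return_tensors='pt')

torch.onnx.export(
    model,
    (dummy['input_ids'], dummy['attention_mask']),
    '$ONNX_MODEL_DIR/model.onnx',  # placeholder output path
    input_names=['input_ids', 'attention_mask'],
    output_names=['logits'],
    dynamic_axes={
        'input_ids': {0: 'batch', 1: 'sequence'},
        'attention_mask': {0: 'batch', 1: 'sequence'},
        'logits': {0: 'batch'},
    },
    opset_version=11,
)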
  • Got the error message below when trying to export KoBART (--feature default) to ONNX:
Using framework PyTorch: 1.10.0
Overriding 1 configuration item(s)
        - use_cache -> False
Traceback (most recent call last):
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/swook/.vscode-server/extensions/ms-python.python-2021.11.1422169775/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/home/swook/.vscode-server/extensions/ms-python.python-2021.11.1422169775/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main
    run()
  File "/home/swook/.vscode-server/extensions/ms-python.python-2021.11.1422169775/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/swook/draft/kobart/export2onnx.py", line 74, in <module>
    main()
  File "/data/swook/draft/kobart/export2onnx.py", line 65, in main
    onnx_inputs, onnx_outputs = export(tokenizer, model, onnx_config, args.opset, args.output)
  File "/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/onnx/convert.py", line 111, in export
    raise ValueError("Model and config inputs doesn't match")
ValueError: Model and config inputs doesn't match
  • The ONNX config for BART defines two inputs:
  1. input_ids
  2. attention_mask
  • However, its tokenizer provides three inputs:
  1. input_ids
  2. attention_mask
  3. token_type_ids
  • I.e. Model and config inputs doesn't match
  • token_type_ids is used to identify two different sequences
  • Refer to #Token Type IDs in Glossary for more details about token_type_ids
  • Actually, Bart doesn’t use token_type_ids for sequence classification [details]
  • Recall that I succeeded in exporting facebook/bart-base to ONNX
  • Its tokenizer provides input_ids and attention_mask, not token_type_ids
  • How can we address this issue?
  • BartTokenizer returns appropriate inputs for BART
  • The error doesn't occur when trying to export gogamza/kobart-base-v2 (--feature default) to ONNX with BartTokenizer (a sketch of the export script is shown below)
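  • A rough sketch of such an export script (similar in spirit to the export2onnx.py in the traceback above; paths are placeholders and the actual script may differ):
from pathlib import Path

from transformers import AutoModel, BartTokenizer
from transformers.onnx import export
from transformers.onnx.features import FeaturesManager

model_dir = '$MODEL_DIR'                        # placeholder: gogamza/kobart-base-v2 or a local directory
onnx_path = Path('$ONNX_MODEL_DIR/model.onnx')  # placeholder output path

model = AutoModel.from_pretrained(model_dir)
# BartTokenizer produces only input_ids and attention_mask, matching the ONNX config
tokenizer = BartTokenizer.from_pretrained(model_dir)

_, onnx_config_builder = FeaturesManager.check_supported_model_or_raise(model, feature='default')
onnx_config = onnx_config_builder(model.config)

# Opset 11 is the version used throughout this issue
onnx_inputs, onnx_outputs = export(tokenizer, model, onnx_config, 11, onnx_path)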
  • But I got the warning below:
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'PreTrainedTokenizerFast'. 
The class this function is called from is 'BartTokenizer'.
  • The checkpoint's tokenizer config contains:
{...
"tokenizer_class": "PreTrainedTokenizerFast"
...}
  • tokenizer_class is PreTrainedTokenizerFast, not BartTokenizer 😵
  • And I got a new error message below:
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'PreTrainedTokenizerFast'. 
The class this function is called from is 'BartTokenizer'.
Using framework PyTorch: 1.10.0
Overriding 1 configuration item(s)
        - use_cache -> False
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/torch/onnx/utils.py:90: UserWarning: 'enable_onnx_checker' is deprecated and ignored. It will be removed in the next PyTorch release. To proceed despite ONNX checker failures, catch torch.onnx.ONNXCheckerError.
  warnings.warn("'enable_onnx_checker' is deprecated and ignored. It will be removed in "
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/torch/onnx/utils.py:103: UserWarning: `use_external_data_format' is deprecated and ignored. Will be removed in next PyTorch release. The code will work as it is False if models are not larger than 2GB, Otherwise set to False because of size limits imposed by Protocol Buffers.
  warnings.warn("`use_external_data_format' is deprecated and ignored. Will be removed in next "
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:215: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:221: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, tgt_len, src_len):
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:252: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py:879: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[-1] > 1:
Validating ONNX model...
/data/swook/miniconda3/envs/transformers-latest/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:350: UserWarning: Deprecation warning. This ORT build has ['CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. The next release (ORT 1.10) will require explicitly setting the providers parameter (as opposed to the current behavior of providers getting set/registered by default based on the build flags) when instantiating InferenceSession.For example, onnxruntime.InferenceSession(..., providers=["CUDAExecutionProvider"], ...)
  warnings.warn("Deprecation warning. This ORT build has {} enabled. ".format(available_providers) +
        -[✓] ONNX model outputs' name match reference model ({'encoder_last_hidden_state', 'last_hidden_state'}
        - Validating ONNX Model output "last_hidden_state":
                -[✓] (2, 8, 768) matches (2, 8, 768)
                -[x] values not close enough (atol: 0.0001)
  • I.e. the predictions are not close enough to the PyTorch model's after exporting to ONNX