microsoft/Olive

vae_encoder_gpu-dml_footprints.json file not found when converting stable diffusion xl base model

AshD opened this issue · 6 comments

AshD commented

Describe the bug
python stable_diffusion_xl.py --model_id=stabilityai/stable-diffusion-xl-base-1.0 --optimize
/home/ash/ai/lib/python3.12/site-packages/diffusers/models/transformers/transformer_2d.py:34: FutureWarning: Transformer2DModelOutput is deprecated and will be removed in version 1.0.0. Importing Transformer2DModelOutput from diffusers.models.transformer_2d is deprecated and this will be removed in a future version. Please use from diffusers.models.modeling_outputs import Transformer2DModelOutput, instead.
deprecate("Transformer2DModelOutput", "1.0.0", deprecation_message)
Download stable diffusion PyTorch pipeline...
Loading pipeline components...: 100%|█████████████████████████████████████████████████████| 7/7 [00:00<00:00, 9.47it/s]

Optimizing vae_encoder
[2024-06-18 20:35:41,419] [INFO] [run.py:138:run_engine] Running workflow default_workflow
[2024-06-18 20:35:41,422] [INFO] [engine.py:986:save_olive_config] Saved Olive config to cache/default_workflow/olive_config.json
[2024-06-18 20:35:41,425] [WARNING] [accelerator_creator.py:182:_check_execution_providers] The following execution providers are not supported: 'DmlExecutionProvider' by the device: 'gpu' and will be ignored. Please consider installing an onnxruntime build that contains the relevant execution providers.
[2024-06-18 20:35:41,425] [INFO] [accelerator_creator.py:224:create_accelerators] Running workflow on accelerator specs: gpu-cpu
[2024-06-18 20:35:41,425] [INFO] [engine.py:109:initialize] Using cache directory: cache/default_workflow
[2024-06-18 20:35:41,425] [INFO] [engine.py:265:run] Running Olive on accelerator: gpu-cpu
[2024-06-18 20:35:41,425] [INFO] [engine.py:1085:_create_system] Creating target system ...
[2024-06-18 20:35:41,425] [INFO] [engine.py:1088:_create_system] Target system created in 0.000057 seconds
[2024-06-18 20:35:41,425] [INFO] [engine.py:1097:_create_system] Creating host system ...
[2024-06-18 20:35:41,425] [INFO] [engine.py:1100:_create_system] Host system created in 0.000053 seconds
[2024-06-18 20:35:41,453] [INFO] [engine.py:867:_run_pass] Running pass convert:OnnxConversion
[2024-06-18 20:35:41,453] [INFO] [engine.py:901:_run_pass] Loaded model from cache: 3_OnnxConversion-45ce4523-e3495161 from cache/default_workflow/runs
[2024-06-18 20:35:41,453] [INFO] [engine.py:867:_run_pass] Running pass optimize:OrtTransformersOptimization
[2024-06-18 20:35:41,454] [INFO] [transformer_optimization.py:169:validate_search_point] CPUExecutionProvider does not support float16 very well, please avoid to use float16.
[2024-06-18 20:35:41,454] [WARNING] [engine.py:873:_run_pass] Invalid search point, prune
[2024-06-18 20:35:41,454] [WARNING] [engine.py:850:_run_passes] Skipping evaluation as model was pruned
[2024-06-18 20:35:41,454] [WARNING] [engine.py:437:run_no_search] Flow ['convert', 'optimize'] is pruned due to failed or invalid config for pass 'optimize'
[2024-06-18 20:35:41,454] [INFO] [engine.py:364:run_accelerator] Save footprint to footprints/vae_encoder_gpu-cpu_footprints.json.
[2024-06-18 20:35:41,454] [INFO] [engine.py:282:run] Run history for gpu-cpu:
[2024-06-18 20:35:41,457] [INFO] [engine.py:570:dump_run_history] run history:
+------------------------------------+-------------------+----------------+----------------+-----------+
| model_id | parent_model_id | from_pass | duration_sec | metrics |
+====================================+===================+================+================+===========+
| 45ce4523 | | | | |
+------------------------------------+-------------------+----------------+----------------+-----------+
| 3_OnnxConversion-45ce4523-e3495161 | 45ce4523 | OnnxConversion | 6.64365 | |
+------------------------------------+-------------------+----------------+----------------+-----------+
[2024-06-18 20:35:41,457] [INFO] [engine.py:297:run] No packaging config provided, skip packaging artifacts
Traceback (most recent call last):
File "/home/ash/ai/Olive/examples/directml/stable_diffusion_xl/stable_diffusion_xl.py", line 635, in <module>
main()
File "/home/ash/ai/Olive/examples/directml/stable_diffusion_xl/stable_diffusion_xl.py", line 601, in main
optimize(
File "/home/ash/ai/Olive/examples/directml/stable_diffusion_xl/stable_diffusion_xl.py", line 374, in optimize
with footprints_file_path.open("r") as footprint_file:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/pathlib.py", line 1015, in open
return io.open(self, mode, buffering, encoding, errors, newline)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/home/ash/ai/Olive/examples/directml/stable_diffusion_xl/footprints/vae_encoder_gpu-dml_footprints.json'
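
For context on this downstream FileNotFoundError: the footprints file is named after the accelerator the run actually used, and the log above shows the run fell back to gpu-cpu ("Save footprint to footprints/vae_encoder_gpu-cpu_footprints.json") while the script went looking for the gpu-dml file. A defensive sketch that surfaces this instead of a bare open() failure (check_footprints is a hypothetical helper, not part of the example script):

```python
from pathlib import Path

def check_footprints(footprints_file_path: Path) -> None:
    # Fail with a pointer to the likely cause instead of a bare
    # FileNotFoundError from footprints_file_path.open("r").
    if not footprints_file_path.exists():
        raise RuntimeError(
            f"{footprints_file_path} was not produced. Check the Olive log "
            "for the 'Save footprint to ...' line: if it names a different "
            "accelerator (e.g. gpu-cpu instead of gpu-dml), the requested "
            "execution provider was not available and the workflow fell back."
        )
```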

To Reproduce
Run python stable_diffusion_xl.py --model_id=stabilityai/stable-diffusion-xl-base-1.0 --optimize

Other information

  • OS: Ubuntu 22.04
    olive-ai 0.6.2
    onnx 1.16.1
    onnxruntime 1.18.0

Hi,

From the logs it appears that the DML workflow is being skipped, since you are running in a Linux environment without the DML EP. Because the workflow contains an evaluator, it checks for the presence of the DML EP and does not find it.

Can you try again after removing the "evaluator": "common_evaluator" entry from the config JSON?

AshD commented

Tried it.

Optimizing vae_encoder
Traceback (most recent call last):
File "/home/ash/ai/Olive/examples/directml/stable_diffusion_xl/stable_diffusion_xl.py", line 635, in <module>
main()
File "/home/ash/ai/Olive/examples/directml/stable_diffusion_xl/stable_diffusion_xl.py", line 601, in main
optimize(
File "/home/ash/ai/Olive/examples/directml/stable_diffusion_xl/stable_diffusion_xl.py", line 369, in optimize
olive_run(olive_config)
File "/home/ash/ai/lib/python3.12/site-packages/olive/workflows/run/run.py", line 284, in run
run_config = RunConfig.parse_file_or_obj(run_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ash/ai/lib/python3.12/site-packages/olive/common/config_utils.py", line 120, in parse_file_or_obj
return cls.parse_obj(file_or_obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ash/ai/lib/python3.12/site-packages/pydantic/v1/main.py", line 526, in parse_obj
return cls(**obj)
^^^^^^^^^^
File "/home/ash/ai/lib/python3.12/site-packages/pydantic/v1/main.py", line 341, in __init__
raise validation_error
pydantic.v1.error_wrappers.ValidationError: 4 validation errors for RunConfig
engine
Evaluator common_evaluator not found in evaluators (type=value_error)
passes -> convert
Invalid engine (type=value_error)
passes -> optimize
Invalid engine (type=value_error)
passes -> optimize_cuda
Invalid engine (type=value_error)

Looks like you only removed it from the "evaluators" section. Sorry, I was unclear: please remove the "evaluator" field under "engine".
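
In other words, the "engine" section still references the evaluator even after the "evaluators" section is deleted, which is what the four validation errors are complaining about. A minimal sketch of stripping both references before calling olive_run (assumes the config has been loaded as a plain dict; the key names follow the example's config JSON, so verify against yours):

```python
def strip_evaluator(olive_config: dict) -> dict:
    """Remove evaluator wiring so the Olive workflow skips evaluation."""
    # Drop the engine-level reference first; deleting only the
    # "evaluators" section while this reference remains triggers
    # "Evaluator common_evaluator not found in evaluators".
    olive_config.get("engine", {}).pop("evaluator", None)
    # The definitions section is then unused and can also be removed.
    olive_config.pop("evaluators", None)
    return olive_config
```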

I had this same problem. I ran pip install -r requirements.txt at the project's root, but there was another requirements.txt in C:\Users\Cole\olive\Olive\examples\stable_diffusion. I re-ran the command there, then reissued python stable_diffusion.py --optimize, and that seemed to run through.

I am also getting the same issue with python stable_diffusion.py --optimize:
"FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\HCKTest\Desktop\sanjeev\olive\stable_diffusion\footprints\vae_encoder_gpu-dml_footprints.json'"

The above issue is caused by:
"pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlCommandRecorder.cpp(371)\onnxruntime_pybind11_state.pyd!00007FFFF9E61070: (caller: 00007FFFF9E47F84) Exception(1) tid(1da0) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application."

The complete traceback is:

[2024-07-18 02:01:44,745] [WARNING] [engine.py:370:run_accelerator] Failed to run Olive on gpu-dml.
Traceback (most recent call last):
File "C:\Users\HCKTest\Desktop\sanjeev\python_env\stable_diffusion\lib\site-packages\olive\engine\engine.py", line 349, in run_accelerator
output_footprint = self.run_no_search(
File "C:\Users\HCKTest\Desktop\sanjeev\python_env\stable_diffusion\lib\site-packages\olive\engine\engine.py", line 441, in run_no_search
should_prune, signal, model_ids = self._run_passes(
File "C:\Users\HCKTest\Desktop\sanjeev\python_env\stable_diffusion\lib\site-packages\olive\engine\engine.py", line 856, in _run_passes
signal = self._evaluate_model(model_config, model_id, evaluator_config, accelerator_spec)
File "C:\Users\HCKTest\Desktop\sanjeev\python_env\stable_diffusion\lib\site-packages\olive\engine\engine.py", line 1078, in _evaluate_model
signal = self.target.evaluate_model(model_config, metrics, accelerator_spec)
File "C:\Users\HCKTest\Desktop\sanjeev\python_env\stable_diffusion\lib\site-packages\olive\systems\local.py", line 46, in evaluate_model
return evaluator.evaluate(model, metrics, device=device, execution_providers=execution_providers)
File "C:\Users\HCKTest\Desktop\sanjeev\python_env\stable_diffusion\lib\site-packages\olive\evaluator\olive_evaluator.py", line 193, in evaluate
metrics_res[metric.name] = self._evaluate_latency(
File "C:\Users\HCKTest\Desktop\sanjeev\python_env\stable_diffusion\lib\site-packages\olive\evaluator\olive_evaluator.py", line 118, in _evaluate_latency
latencies = self._evaluate_raw_latency(model, metric, dataloader, post_func, device, execution_providers)
File "C:\Users\HCKTest\Desktop\sanjeev\python_env\stable_diffusion\lib\site-packages\olive\evaluator\olive_evaluator.py", line 706, in _evaluate_raw_latency
return self._evaluate_onnx_latency(model, metric, dataloader, post_func, device, execution_providers)
File "C:\Users\HCKTest\Desktop\sanjeev\python_env\stable_diffusion\lib\site-packages\olive\evaluator\olive_evaluator.py", line 495, in _evaluate_onnx_latency
latencies = session.time_run(
File "C:\Users\HCKTest\Desktop\sanjeev\python_env\stable_diffusion\lib\site-packages\olive\common\ort_inference.py", line 334, in time_run
self.session.run(None, input_feed)
File "C:\Users\HCKTest\Desktop\sanjeev\python_env\stable_diffusion\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlCommandRecorder.cpp(371)\onnxruntime_pybind11_state.pyd!00007FFFF9E61070: (caller: 00007FFFF9E47F84) Exception(1) tid(1da0) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.

[2024-07-18 02:01:44,881] [INFO] [engine.py:290:run] Run history for gpu-dml:
[2024-07-18 02:01:44,896] [INFO] [engine.py:589:dump_run_history] run history:
+--------------------------------------------------+------------------------------------+-----------------------------+----------------+-----------+
| model_id | parent_model_id | from_pass | duration_sec | metrics |
+==================================================+====================================+=============================+================+===========+
| 8ddbdd91 | | | | |
+--------------------------------------------------+------------------------------------+-----------------------------+----------------+-----------+
| 0_OnnxConversion-8ddbdd91-076cfb73 | 8ddbdd91 | OnnxConversion | 34.4892 | |
+--------------------------------------------------+------------------------------------+-----------------------------+----------------+-----------+
| 1_OrtTransformersOptimization-0-0f55df8a-gpu-dml | 0_OnnxConversion-8ddbdd91-076cfb73 | OrtTransformersOptimization | 8.00802 | |
+--------------------------------------------------+------------------------------------+-----------------------------+----------------+-----------+
[2024-07-18 02:01:44,897] [INFO] [engine.py:305:run] No packaging config provided, skip packaging artifacts
Traceback (most recent call last):
File "C:\Users\HCKTest\Desktop\sanjeev\olive\stable_diffusion\stable_diffusion.py", line 457, in <module>
main()
File "C:\Users\HCKTest\Desktop\sanjeev\olive\stable_diffusion\stable_diffusion.py", line 389, in main
optimize(common_args.model_id, common_args.provider, unoptimized_model_dir, optimized_model_dir)
File "C:\Users\HCKTest\Desktop\sanjeev\olive\stable_diffusion\stable_diffusion.py", line 266, in optimize
save_optimized_onnx_submodel(submodel_name, provider, model_info)
File "C:\Users\HCKTest\Desktop\sanjeev\olive\stable_diffusion\sd_utils\ort.py", line 59, in save_optimized_onnx_submodel
with footprints_file_path.open("r") as footprint_file:
File "C:\Users\HCKTest\Desktop\sanjeev\python\py3_10_9\lib\pathlib.py", line 1119, in open
return self._accessor.open(self, mode, buffering, encoding, errors,
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\HCKTest\Desktop\sanjeev\olive\stable_diffusion\footprints\vae_encoder_gpu-dml_footprints.json'

How can I solve the above issue? I have also tried installing both requirements.txt files.

Looks like you only removed it from the "evaluators" section. Sorry, I was unclear: please remove the "evaluator" field under "engine".

This works for me. How about the protobuf 2GB issue?

Traceback (most recent call last):
File "G:\Olive\examples\directml\stable_diffusion_xl\stable_diffusion_xl.py", line 635, in <module>
main()
File "G:\Olive\examples\directml\stable_diffusion_xl\stable_diffusion_xl.py", line 601, in main
optimize(
File "G:\Olive\examples\directml\stable_diffusion_xl\stable_diffusion_xl.py", line 369, in optimize
olive_run(olive_config)
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\workflows\run\run.py", line 297, in run
return run_engine(package_config, run_config, data_root)
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\workflows\run\run.py", line 261, in run_engine
engine.run(
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\engine\engine.py", line 267, in run
run_result = self.run_accelerator(
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\engine\engine.py", line 339, in run_accelerator
output_footprint = self.run_no_search(
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\engine\engine.py", line 431, in run_no_search
should_prune, signal, model_ids = self._run_passes(
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\engine\engine.py", line 829, in _run_passes
model_config, model_id = self._run_pass(
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\engine\engine.py", line 937, in _run_pass
output_model_config = host.run_pass(p, input_model_config, data_root, output_model_path, pass_search_point)
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\systems\local.py", line 32, in run_pass
output_model = the_pass.run(model, data_root, output_model_path, point)
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\passes\olive_pass.py", line 224, in run
output_model = self._run_for_config(model, data_root, config, output_model_path)
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\passes\onnx\transformer_optimization.py", line 332, in _run_for_config
return model_proto_to_olive_model(optimizer.model, output_model_path, config)
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\passes\onnx\common.py", line 164, in model_proto_to_olive_model
has_external_data = model_proto_to_file(
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\passes\onnx\common.py", line 108, in model_proto_to_file
onnx.save_model(model, str(output_path))
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\onnx\__init__.py", line 327, in save_model
serialized = _get_serializer(format, model_filepath).serialize_proto(proto)
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\onnx\serialization.py", line 100, in serialize_proto
result = proto.SerializeToString()
ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 5136056262