Error when running spatial mapping search with Edge TPU example: The MAC level unit count is not the same for all operand
Hello Zigzag team,
I'm running spatial mapping search on the Edge TPU example using Zigzag, and I'm getting the following error:
2023-11-08 13:38:22,270 - generate_layer_node_for_conv +208 - INFO - Parsed Conv node
2023-11-08 13:38:22,270 - generate_layer_node_for_conv +208 - INFO - Parsed Conv node
2023-11-08 13:38:22,270 - generate_layer_node_for_conv +208 - INFO - Parsed Conv node
2023-11-08 13:38:22,270 - generate_layer_node_for_conv +208 - INFO - Parsed Conv node
2023-11-08 13:38:22,271 - generate_layer_node_for_conv +208 - INFO - Parsed Conv node
2023-11-08 13:38:22,271 - generate_layer_node_for_gemm +143 - INFO - Parsed Gemm node
2023-11-08 13:38:22,271 - generate_layer_node_for_gemm +143 - INFO - Parsed Gemm node
2023-11-08 13:38:22,271 - generate_layer_node_for_gemm +143 - INFO - Parsed Gemm node
2023-11-08 13:38:22,271 - parse_workload_from_onnx_model_and_mapping +111 - INFO - Created ONNXWorkload graph with 24 nodes and 23 edges.
2023-11-08 13:38:22,272 - parse_accelerator_from_path +52 - INFO - Parsed accelerator with cores [1].
2023-11-08 13:38:22,272 - run +29 - INFO - Processing layer 0...
2023-11-08 13:38:22,272 - run +97 - INFO - User-provided spatial mappings or hints not found. Auto-generating spatial_mapping_hint..
2023-11-08 13:38:22,272 - run +132 - INFO - Launching spatial mapping 1/16: {'D1': ('K', 8), 'D2': ('FX', 8), 'D3': ('OX', 4), 'D4': ('OX', 4)}.
2023-11-08 13:38:22,272 - run +72 - INFO - Running temporal mapping search engine...
100%|████████████████████████████████████████| 720/720 [00:00<00:00, 859.45it/s]
2023-11-08 13:38:23,111 - run +132 - INFO - Launching spatial mapping 2/16: {'D1': ('K', 8), 'D2': ('FX', 8), 'D3': ('OX', 4), 'D4': ('OY', 4)}.
2023-11-08 13:38:23,112 - run +72 - INFO - Running temporal mapping search engine...
100%|████████████████████████████████████████| 720/720 [00:00<00:00, 848.91it/s]
2023-11-08 13:38:23,960 - run +132 - INFO - Launching spatial mapping 3/16: {'D1': ('K', 8), 'D2': ('FX', 8), 'D3': ('OY', 4), 'D4': ('OX', 4)}.
2023-11-08 13:38:23,960 - run +72 - INFO - Running temporal mapping search engine...
100%|████████████████████████████████████████| 720/720 [00:00<00:00, 859.84it/s]
2023-11-08 13:38:24,798 - run +132 - INFO - Launching spatial mapping 4/16: {'D1': ('K', 8), 'D2': ('FX', 8), 'D3': ('OY', 4), 'D4': ('OY', 4)}.
2023-11-08 13:38:24,798 - run +72 - INFO - Running temporal mapping search engine...
100%|████████████████████████████████████████| 720/720 [00:00<00:00, 873.07it/s]
2023-11-08 13:38:25,623 - run +132 - INFO - Launching spatial mapping 5/16: {'D1': ('K', 8), 'D2': ('FY', 8), 'D3': ('OX', 4), 'D4': ('OX', 4)}.
2023-11-08 13:38:25,623 - run +72 - INFO - Running temporal mapping search engine...
100%|████████████████████████████████████████| 720/720 [00:00<00:00, 866.16it/s]
2023-11-08 13:38:26,455 - run +132 - INFO - Launching spatial mapping 6/16: {'D1': ('K', 8), 'D2': ('FY', 8), 'D3': ('OX', 4), 'D4': ('OY', 4)}.
2023-11-08 13:38:26,455 - run +72 - INFO - Running temporal mapping search engine...
100%|████████████████████████████████████████| 720/720 [00:00<00:00, 857.06it/s]
2023-11-08 13:38:27,296 - run +132 - INFO - Launching spatial mapping 7/16: {'D1': ('K', 8), 'D2': ('FY', 8), 'D3': ('OY', 4), 'D4': ('OX', 4)}.
2023-11-08 13:38:27,296 - run +72 - INFO - Running temporal mapping search engine...
100%|████████████████████████████████████████| 720/720 [00:00<00:00, 844.54it/s]
2023-11-08 13:38:28,149 - run +132 - INFO - Launching spatial mapping 8/16: {'D1': ('K', 8), 'D2': ('FY', 8), 'D3': ('OY', 4), 'D4': ('OY', 4)}.
2023-11-08 13:38:28,149 - run +72 - INFO - Running temporal mapping search engine...
100%|████████████████████████████████████████| 720/720 [00:00<00:00, 867.52it/s]
2023-11-08 13:38:28,979 - run +132 - INFO - Launching spatial mapping 9/16: {'D1': ('OX', 8), 'D2': ('FX', 8), 'D3': ('OX', 4), 'D4': ('OX', 4)}.
2023-11-08 13:38:28,980 - run +72 - INFO - Running temporal mapping search engine...
0%| | 0/720 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/kmdl/test/zigzag/run.py", line 15, in <module>
energy, latency, cme = get_hardware_performance_zigzag(workload=workload,
File "/home/kmdl/test/zigzag/zigzag/api.py", line 73, in get_hardware_performance_zigzag
answers = mainstage.run()
File "/home/kmdl/test/zigzag/zigzag/classes/stages/Stage.py", line 49, in run
for cme, extra_info in self.list_of_callables[0](
File "/home/kmdl/test/zigzag/zigzag/classes/stages/ONNXModelParserStage.py", line 28, in run
for cme, extra_info in sub_stage.run():
File "/home/kmdl/test/zigzag/zigzag/classes/stages/MainInputParserStages.py", line 21, in run
for cme, extra_info in sub_stage.run():
File "/home/kmdl/test/zigzag/zigzag/classes/stages/SaveStage.py", line 88, in run
for id, (cme, extra_info) in enumerate(substage.run()):
File "/home/kmdl/test/zigzag/zigzag/classes/stages/SaveStage.py", line 137, in run
for id, (cme, extra_info) in enumerate(substage.run()):
File "/home/kmdl/test/zigzag/zigzag/classes/stages/ReduceStages.py", line 122, in run
for cme, extra_info in substage.run():
File "/home/kmdl/test/zigzag/zigzag/classes/stages/WorkloadStage.py", line 31, in run
for cme, extra_info in sub_stage.run():
File "/home/kmdl/test/zigzag/zigzag/classes/stages/SaveStage.py", line 32, in run
for id, (cme, extra_info) in enumerate(substage.run()):
File "/home/kmdl/test/zigzag/zigzag/classes/stages/ReduceStages.py", line 97, in run
for cme, extra_info in substage.run():
File "/home/kmdl/test/zigzag/zigzag/classes/stages/SpatialMappingGeneratorStage.py", line 145, in run
for cme, extra_info in spatial_mapping_conversion_stage.run():
File "/home/kmdl/test/zigzag/zigzag/classes/stages/SpatialMappingConversionStage.py", line 82, in run
for cme, extra_info in sub_stage.run():
File "/home/kmdl/test/zigzag/zigzag/classes/stages/ReduceStages.py", line 97, in run
for cme, extra_info in substage.run():
File "/home/kmdl/test/zigzag/zigzag/classes/stages/LomaStage.py", line 52, in run
for cme, extra_info in sub_stage.run():
File "/home/kmdl/test/zigzag/zigzag/classes/stages/CostModelStage.py", line 54, in run
self.cme = CostModelEvaluation(
File "/home/kmdl/test/zigzag/zigzag/classes/cost_model/cost_model.py", line 261, in __init__
self.mapping_int = Mapping(
File "/home/kmdl/test/zigzag/zigzag/classes/mapping/combined_mapping.py", line 212, in __init__
self.spatial_mapping = SpatialMapping(spatial_mapping, layer_node)
File "/home/kmdl/test/zigzag/zigzag/classes/mapping/spatial/spatial_mapping.py", line 30, in __init__
self.calc_unit_count()
File "/home/kmdl/test/zigzag/zigzag/classes/mapping/spatial/spatial_mapping.py", line 122, in calc_unit_count
assert all(
AssertionError: The MAC level unit count is not the same for all operand [690, 720, 690], please correct the spatial mapping.
Here is the code to reproduce this run:
from zigzag.api import get_hardware_performance_zigzag

opt = "EDP"
model = "alexnet"
onnx_model_path = f"zigzag/inputs/examples/workload/{model}.onnx"
workload = onnx_model_path
hwarch = "Edge_TPU_like"
mapping = "zigzag.inputs.examples.mapping.default"
accelerator = f"zigzag.inputs.examples.hardware.{hwarch}"
dump_filename_pattern = f"outputs/{hwarch}-{model}-layer_?.json"
pickle_filename = f"outputs/{hwarch}-{model}-saved_list_of_cmes.pickle"

energy, latency, cme = get_hardware_performance_zigzag(workload=workload,
                                                       accelerator=accelerator,
                                                       mapping=mapping,
                                                       opt=opt,
                                                       dump_filename_pattern=dump_filename_pattern,
                                                       pickle_filename=pickle_filename)
print(f"Total network energy = {energy:.2e} pJ")
print(f"Total network latency = {latency:.2e} cycles")
print(f"Total edp = {energy*latency:.2e} pJ*cycles")
and the mapping file I used is simply:
mapping = {
    "default": {
        "core_allocation": 1,
        "memory_operand_links": {"O": "O", "W": "I2", "I": "I1"},
    },
    "Add": {
        "core_allocation": 1,
        "memory_operand_links": {"O": "O", "X": "I2", "Y": "I1"},
    },
    "Pooling": {
        "core_allocation": 1,
        "memory_operand_links": {"O": "O", "W": "I2", "I": "I1"},
    },
}
This seems to happen for all examples with 4 levels of MACs (Edge TPU, Tesla NPU, Meta and Ascend) running Alexnet/Resnet18/MBNetv2, but not for the TPU example with 2 levels. Also FWIW, this happened after I pulled from the Zigzag repo yesterday. I don't get this error with a local copy of Zigzag timestamped Sep 25th.
Am I doing something wrong? Any help is appreciated. Thanks!
Siyuan
Hi Siyuan,
Thank you for bringing up this issue.
We've identified that the problem arises when a layer dimension is mapped to multiple hardware dimensions. I'm pleased to inform you that we have addressed and resolved this issue in the latest update.
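For anyone still on an older checkout, a quick way to spot the affected candidates is to list which layer dimensions appear on more than one hardware dimension in a logged spatial mapping. This is only a diagnostic sketch: the helper name `layer_dims_on_multiple_hw_dims` is hypothetical, it is not part of ZigZag, and it does not reproduce ZigZag's internal unit-count check. The two example mappings are copied from the log in the report above.

```python
from collections import defaultdict


def layer_dims_on_multiple_hw_dims(spatial_mapping):
    """Return {layer_dim: [(hw_dim, unroll), ...]} for every layer dimension
    that is unrolled on more than one hardware dimension.

    Hypothetical helper for inspecting logged mappings; not a ZigZag API.
    """
    usage = defaultdict(list)
    for hw_dim, (layer_dim, unroll) in spatial_mapping.items():
        usage[layer_dim].append((hw_dim, unroll))
    return {dim: uses for dim, uses in usage.items() if len(uses) > 1}


# Spatial mapping 9/16 from the log (the one that hit the AssertionError).
failing = {"D1": ("OX", 8), "D2": ("FX", 8), "D3": ("OX", 4), "D4": ("OX", 4)}
# Spatial mapping 2/16 from the log (completed without error).
passing = {"D1": ("K", 8), "D2": ("FX", 8), "D3": ("OX", 4), "D4": ("OY", 4)}

repeated_in_failing = layer_dims_on_multiple_hw_dims(failing)
repeated_in_passing = layer_dims_on_multiple_hw_dims(passing)

print(repeated_in_failing)  # OX is unrolled on D1, D3 and D4
print(repeated_in_passing)  # no layer dimension is repeated
```

Note that a repeated layer dimension alone did not always trigger the assertion in the log (mapping 1/16 also places OX on two hardware dimensions and ran to completion), so treat the output as a pointer to candidates worth checking rather than a definitive test.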
Regarding the mapping file: I believe the ZigZag version released in September required the spatial_mapping dictionary to be provided (see inputs/mapping/default.py for an example). The current version relaxes this requirement and generates the spatial mapping automatically if the dictionary is omitted. In that case, it is advisable to provide the spatial_mapping_hint dictionary instead (see tests/main/test_with_mix_spatial_mapping/test_tesla_npu_like.py for an example). Omitting both dictionaries results in a fully flexible spatial mapping search space, which may not accurately represent a real hardware system.
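As a rough illustration, a mapping file with a hint could look like the sketch below. It assumes the hint maps each hardware dimension name to a list of layer dimensions that may be unrolled on it. The names D1 to D4 follow the log in the report above, and the chosen layer dimensions are purely illustrative and not copied from the referenced test file, so please check that file for the exact format expected by your ZigZag version.

```python
# Sketch of a mapping file that restricts the spatial mapping search with a hint.
# Assumption: "spatial_mapping_hint" maps a hardware dimension name to the list
# of layer dimensions allowed on it. The dimension choices are illustrative only.
mapping = {
    "default": {
        "core_allocation": 1,
        "spatial_mapping_hint": {
            "D1": ["K"],         # only K may be unrolled on D1
            "D2": ["C"],         # only C may be unrolled on D2
            "D3": ["OX", "OY"],  # either OX or OY on D3
            "D4": ["OX", "OY"],  # either OX or OY on D4
        },
        "memory_operand_links": {"O": "O", "W": "I2", "I": "I1"},
    },
}
```

With a hint like this, the generator only explores combinations drawn from the listed layer dimensions instead of the fully flexible search space.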
Please rerun your script with the latest ZigZag version and let us know if you encounter any further issues.
Best regards,
Jiacong
Hi Jiacong,
Thanks for your reply and for taking a look at the issue. I can verify that the bug has been fixed on my end too. I'm closing the issue.
Siyuan