## Quick Tutorial use `transform.py` to transform the workloads (in relay format) in `worklaods/` directory to akg info for subsequent codegen ``` mkdir infos python3 transform.py ``` use `auto_mindtricks.py` to generate mindtricks for fused conv/matmul operators' codegen ``` mkdir mindtricks python3 auto_mindtricks.py ``` copy the generated `infos/` and `mindtricks/` directories to akg repository ## Dataset Split If you want to split the workloads into trainset and testset then use `separate.py` to split infos into trainset and testset ``` mkdir trains mkdir tests python3 separate.py ``` ## Register New workloads If you want to test another workloads, we need the TVM repository. First, set PYTHONPATH, we need TVM repo to get the relay ir output ``` export PYTHONPATH=$TVM_HOME/python:$PYTHONPATH ``` use `to_script.py` to transform the model into relay, and apply necessary passes for inference situation, hint: you have to register new models in TVM's get_network function ``` mkdir -p workloads python3 to_script.py -w bert_large -bs 16 -t "cuda" > workloads/bert_large_log ``` use `count.py` to count kinds of ops then you can register new ops in op.py ``` python3 count.py ``` we have some assumption: - we will pad kernel for conv2d and matmul when their n-axis(output channel size for conv2d) or reduce-axis is not divisible by 32 for ease of codegen on tensorcore - further, we have one another constraint at now that we only pad kernels whose n-axis is not divisible by 32 but divisbile by 8, the filtered out cases could be handled by isl schedule tree's isolate, but are not considered recently.