## Quick Tutorial

Use `transform.py` to transform the workloads (in Relay format) in the `workloads/` directory into AKG info files for subsequent codegen:

```
mkdir infos
python3 transform.py
```

Use `auto_mindtricks.py` to generate mindtricks for the codegen of fused conv/matmul operators:

```
mkdir mindtricks
python3 auto_mindtricks.py
```

Copy the generated `infos/` and `mindtricks/` directories to the AKG repository.

## Dataset Split

If you want to split the workloads into a training set and a test set, use `separate.py` to split the infos:

```
mkdir trains
mkdir tests
python3 separate.py
```
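For reference, here is a minimal sketch of what such a split might look like, assuming each workload's info is a standalone file under `infos/` and an 80/20 random split; the actual selection logic in `separate.py` may differ.

```python
# Hypothetical 80/20 random split of info files into trains/ and tests/.
# The real separate.py may use a different ratio or a deterministic,
# per-network selection.
import os
import random
import shutil

random.seed(0)  # make the split reproducible
files = sorted(os.listdir("infos"))
random.shuffle(files)

cut = int(len(files) * 0.8)
for name in files[:cut]:
    shutil.copy(os.path.join("infos", name), os.path.join("trains", name))
for name in files[cut:]:
    shutil.copy(os.path.join("infos", name), os.path.join("tests", name))
```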

## Register New Workloads

If you want to test other workloads, you need the TVM repository to obtain the Relay IR output. First, set `PYTHONPATH`:

```
export PYTHONPATH=$TVM_HOME/python:$PYTHONPATH
```

Use `to_script.py` to transform the model into Relay and apply the passes needed for inference. Note that you have to register new models in TVM's `get_network` function first (see the sketch below):

```
mkdir -p workloads
python3 to_script.py -w bert_large -bs 16 -t "cuda" > workloads/bert_large_log
```
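As a rough guide, registering a new model usually means adding a branch to `get_network`. The sketch below follows the common TVM tutorial convention (`get_network(name, batch_size)` returning `mod, params, input_shape, output_shape`); the exact signature in this repo's `to_script.py`, the model name `my_new_model`, and the ONNX file path are assumptions.

```python
from tvm import relay
from tvm.relay import testing


def get_network(name, batch_size):
    # Shapes here assume an image model; adjust for other workloads.
    input_shape = (batch_size, 3, 224, 224)
    output_shape = (batch_size, 1000)

    if name == "resnet-50":
        mod, params = testing.resnet.get_workload(
            num_layers=50, batch_size=batch_size)
    elif name == "my_new_model":  # hypothetical new entry
        # Import from any frontend TVM supports, e.g. an ONNX file.
        import onnx
        onnx_model = onnx.load("my_new_model.onnx")  # hypothetical path
        mod, params = relay.frontend.from_onnx(
            onnx_model, shape={"input": input_shape})  # "input" name is an assumption
    else:
        raise ValueError("Unsupported network: " + name)

    return mod, params, input_shape, output_shape
```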

Use `count.py` to count the kinds of ops; you can then register any new ops in `op.py`:

```
python3 count.py
```
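Counting op kinds in a Relay module can be done with a post-order traversal, roughly as sketched below. How `count.py` actually loads the workloads (e.g. from the text dumps under `workloads/`) is an assumption, so treat this only as an illustration.

```python
# Illustrative counter of operator kinds in a Relay IRModule.
from collections import Counter

import tvm
from tvm import relay


def count_op_kinds(mod):
    counts = Counter()

    def visit(node):
        # Count only calls to primitive ops (e.g. nn.conv2d, nn.dense).
        if isinstance(node, relay.Call) and isinstance(node.op, tvm.ir.Op):
            counts[node.op.name] += 1

    relay.analysis.post_order_visit(mod["main"], visit)
    return counts
```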

We make some assumptions:
- The kernel of a conv2d or matmul is padded when its n-axis (output channel size for conv2d) or reduce-axis is not divisible by 32, to ease codegen on Tensor Cores.
- As a further constraint, we currently only pad kernels whose n-axis is not divisible by 32 but is divisible by 8; the filtered-out cases could be handled by the isl schedule tree's isolate, but are not considered for now. See the sketch below.
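To make the padding rule concrete, here is a minimal sketch of the divisibility check described above; the function name and return convention are illustrative, not taken from the codebase.

```python
# Illustrative padding rule: pad the n-axis (output channels for conv2d)
# or reduce-axis up to the next multiple of 32, but only for extents that
# are already divisible by 8.
def padded_extent(extent):
    if extent % 32 == 0:
        return extent                    # already aligned for Tensor Core tiling
    if extent % 8 == 0:
        return (extent + 31) // 32 * 32  # e.g. 40 -> 64, 24 -> 32
    return None                          # not handled; would need isl isolate


# Examples: 64 -> 64, 40 -> 64, 24 -> 32, 30 -> None (skipped)
```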