ONNC/onnc

Custom shape inference


Currently, I think the input and output tensor shapes are inferred through ONNX's InferShapes, but that is based on the ONNX operators' memory requirements.
However, the lowered target compute operators might have different memory requirements (input/output tensors with shapes different from what ONNX::InferShapes computed).

Is it possible to infer the memory requirements for specific target compute operators?
Is this feature currently implemented in any ONNC backend? (Sorry if this question is silly; I'm quite new to this project.)

Thank you.

I am not sure if you are talking about this:
https://github.com/ONNC/onnc/blob/master/lib/Target/NvDla/NvDlaMemInfoPass.cpp

@tigercosmos I'm referring to the tensor sizes that are later used for memory allocation, e.g.

memOpnd->setLength(alloc.size);

How are these tensor sizes computed? I assume each IR operation has a set of input tensors, a set of output tensors, and a rule for computing their sizes. So far I have found that the tensor sizes are computed within the ONNX graph (by using InferShapes).

My question is whether there is already a mechanism implemented to compute the tensor sizes for my target operators. For example, a convolution operator might add padding between the output maps, so the output tensor (and hence its memory requirement) will be larger than the tensor shape computed within the ONNX graph.
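
For illustration, here is a minimal sketch of the kind of size computation I mean, assuming a made-up hardware line size to which each output feature map is padded (the function name and the padding rule are hypothetical, not ONNC code):

#include <cstdint>
#include <vector>

// Hypothetical example: bytes needed for an N x C x H x W convolution output
// when each H x W output map must be padded up to a hardware line size.
uint64_t paddedConvOutputBytes(const std::vector<uint64_t>& dims, // {N, C, H, W}
                               uint64_t elemSize, uint64_t lineBytes)
{
  uint64_t mapBytes  = dims[2] * dims[3] * elemSize;                       // one output map
  uint64_t paddedMap = (mapBytes + lineBytes - 1) / lineBytes * lineBytes; // round up
  return dims[0] * dims[1] * paddedMap; // larger than the nominal ONNX shape implies
}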

@fahrenheitjo Is the padding just a memory constraint, or will it be involved in the following computation?

If it is just a memory constraint, there is a class called TargetMemInfo; you can check out X86TargetMemInfo:

MemSize X86TargetMemInfo::getTensorMemorySize(const Tensor& pVal)
{
  uint64_t align, size;
  switch (pVal.kind()) {
  case kUint8:
  case kInt8:
  case kBoolean:
    align = 16, size = 1;
    break;
  case kUint16:
  case kInt16:
  case kFloat16:
    align = 16, size = 2;
    break;
  case kFloat:
  case kInt32:
  case kUint32:
    align = 16, size = 4;
    break;
  case kInt64:
  case kUint64:
    align = 16, size = 8;
    break;
  default:
    assert(false && "Un-support value type.");
    return MemSize();
  }

  for (auto i : pVal.getDimensions())
    size *= i;

  return MemSize(align, size);
}

The LinearScanMemAlloc pass uses TargetMemInfo::getTensorMemorySize to ask the target backend for the real memory requirement:

MemSize m = m_TMI->getTensorMemorySize(*(Tensor*)v);
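
As a rough sketch (not existing code; the class name, the 64-byte stride, and the assumption that getTensorMemorySize is virtual are mine, based on the X86 definition above), a custom backend could express its own requirement like this:

// Hypothetical backend-specific TargetMemInfo that rounds every tensor up to
// an assumed 64-byte surface stride; only the pattern matters, not the numbers.
class FooTargetMemInfo : public TargetMemInfo {
public:
  MemSize getTensorMemorySize(const Tensor& pVal) override
  {
    uint64_t align = 64, size = 4; // assume 4-byte elements for brevity
    for (auto i : pVal.getDimensions())
      size *= i;
    size = (size + align - 1) / align * align; // round up to the stride
    return MemSize(align, size);
  }
};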

Is this suitable for you?

@a127a127 thank you for the response. getTensorMemorySize is too generic and doesn't take into account which operation the tensor will be used by.
I actually need a way to specify the memory requirements for specific operations. For example, I might have a 3x3 convolution operation optimized for various use cases, each with slightly different memory requirements.
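
Something like the following is what I have in mind (purely illustrative, not an existing ONNC interface; the struct and parameter names are hypothetical), where the operator that uses the tensor is available when the size is computed:

// Hypothetical interface: size a tensor in the context of the ComputeOperator
// that produces/consumes it, so e.g. an optimized 3x3 convolution can ask for
// extra padding. Tensor, MemSize and ComputeOperator are the ONNC IR types
// discussed above.
struct OpAwareMemInfo {
  virtual MemSize getTensorMemorySize(const Tensor& pVal,
                                      const ComputeOperator& pOp) const = 0;
  virtual ~OpAwareMemInfo() = default;
};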

Another question: where/how are the output tensor sizes computed for a specific operation?