Optimized distribution of package for inference only
dantetemplar opened this issue · 3 comments
dantetemplar commented
I think it would be great to be able to add kraken to a project quickly and easily, but this is hampered by the need to install large dependencies: torch, the CUDA stack, and others. Perhaps using ONNX would help a lot.
mittagessen commented
ONNX doesn't play well with the variable-shape tensors our text recognition models need, and neither do any of the other graph compilation/capture approaches for pytorch (torchscript, coreml, ...), so there is currently no technically feasible way to run lightweight deployments.
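A toy sketch of the shape-specialization problem described above (hypothetical code, not kraken's; the shapes and the pretend "tracer" are made up for illustration). Text line images are typically normalized to a fixed height but keep their natural width, so every input has a different shape, which defeats exporters that bake the example input's concrete sizes into the captured graph:

```python
# Hypothetical illustration: why shape-specialized graph capture breaks
# on variable-width text line inputs.

def capture_graph(example_shape):
    """Pretend 'tracer' that specializes on the example's concrete shape,
    the way naive tracing/export bakes sizes into the exported graph."""
    height, width = example_shape

    def compiled_forward(image_shape):
        if image_shape != (height, width):
            raise ValueError(
                f"graph compiled for {(height, width)}, got {image_shape}"
            )
        # stand-in for a conv/pool stack that downsamples by 4 in each axis
        return (height // 4, width // 4)

    return compiled_forward

# Capture the graph on one text line ...
forward = capture_graph((48, 320))
print(forward((48, 320)))        # works: (12, 80)

# ... but the next line on the page has a different width.
try:
    forward((48, 417))
except ValueError as e:
    print("export fails:", e)
```

Exporters that support symbolic/dynamic dimensions avoid this particular failure, but (per the comment above) data-dependent behaviour in the recognition models still trips up the available capture approaches.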
dantetemplar commented
> ONNX doesn't play well with variable shape tensors as are needed for our text recognition models. Nor do any of the other graph compilation/capture approaches for pytorch (torchscript, coreml, ...) so there isn't currently a technically feasible way to run lightweight deployments.
Thanks for the answer, I understood regarding ONNX. Do you know any alternatives?
mittagessen commented
On 24/06/17 10:18AM, dantetemplar wrote:
> Thanks for the answer, I understood regarding ONNX. Do you know any alternatives?
All of those deployment frameworks suffer from this to some extent, as far as I know. In theory you can often construct the models manually from their primitives, e.g. in CoreML, but this is tedious, and the primitives often behave differently from native pytorch layers or are missing layer types entirely, so it is rarely a simple conversion. Compilers that rely on introspection usually blow up on variable-size tensors.