iree-org/iree-turbine

(Feature) Load model parameters into Launchable runtime context directly from equivalent torch.device.


With #17, we can slot MLIR compilation and VMFB execution into PyTorch program execution without incurring device-to-host round trips on the input and output tensors.

This "playing nice" with PyTorch device memory management would be very helpful for model parameters as well. For applications like Stable Diffusion (SD), where weights are hijacked and manipulated right up until model invocation, it would help to have a smooth path for loading those weights into the IREE runtime context without another device-to-host round trip on the parameters during program execution.
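To make the request concrete, here is a minimal sketch of the difference between the round trip described above and a zero-copy hand-off. It uses only PyTorch and does not call any IREE or iree-turbine API. DLPack stands in for whatever import path the runtime context would actually use, and the helper names `host_roundtrip` and `zero_copy_handoff` are hypothetical.

```python
import torch
from torch.utils.dlpack import from_dlpack, to_dlpack


def host_roundtrip(param: torch.Tensor) -> torch.Tensor:
    """The pattern to avoid: copy to host memory, then back to the device."""
    host_copy = param.detach().to("cpu", copy=True)  # device -> host copy
    return host_copy.to(param.device)  # host -> device (no-op if already on CPU)


def zero_copy_handoff(param: torch.Tensor) -> torch.Tensor:
    """Share the existing buffer via DLPack instead of copying it.

    Assumption: the consumer (here just PyTorch itself) can import DLPack
    capsules. A real runtime hand-off would replace from_dlpack.
    """
    capsule = to_dlpack(param.detach())
    return from_dlpack(capsule)


# Falls back to CPU so the sketch runs without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
weight = torch.randn(4, 4, device=device)
weight.mul_(0.5)  # stand-in for last-minute weight manipulation

copied = host_roundtrip(weight)
shared = zero_copy_handoff(weight)

# The shared tensor aliases the original buffer; the round trip does not.
same_buffer = shared.data_ptr() == weight.data_ptr()
copied_buffer = copied.data_ptr() != weight.data_ptr()
```

Because `shared` aliases the original storage, any further in-place edits to `weight` before invocation are visible through it, which is the behavior the SD use case relies on.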