llvm/torch-mlir

[RFC] Adding Torch to MHLO conversion

ZihengJiang opened this issue · 21 comments

Hello everyone, we are from the AML (Applied Machine Learning) team at Bytedance. The TorchMLIR project provides a set of MLIR dialects for Torch that bridges the PyTorch and MLIR ecosystems. In our development, however, we have also observed interest in supporting the MHLO dialect as a lowering target, which is still missing from the project.
In this RFC, we propose adding a conversion from the Torch dialect to MHLO (Meta HLO Dialect). The MHLO dialect can be found here (https://github.com/tensorflow/mlir-hlo) and the corresponding operation definitions are here (https://www.tensorflow.org/mlir/hlo_ops). We would add this conversion as an independent pass, providing an extra path through which torch-mlir programs can be lowered into the MHLO dialect, thus enriching the applications of TorchMLIR.

Motivation:

The HLO (High Level Optimizer) IR offers a carefully selected, fixed list of operations. MHLO supports an HLO-like compilation pipeline using MLIR and provides a uniform interface to compile and execute these optimized HLO programs, with dynamic shape support.
While the MHLO dialect natively supports TensorFlow models, a conversion pass from the Torch dialect to MHLO would let us convert and lower models from these different frontends into a unified IR. This would reduce engineering effort and ease the subsequent backend work.
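To make the proposed conversion concrete, here is a hand-written sketch (not actual compiler output) of what lowering a single op, `torch.aten.tanh`, from the Torch dialect to MHLO could look like:

```mlir
// Before: Torch dialect (after the usual torch-mlir simplification passes).
func.func @forward(%arg0: !torch.vtensor<[2,3],f32>) -> !torch.vtensor<[2,3],f32> {
  %0 = torch.aten.tanh %arg0 : !torch.vtensor<[2,3],f32> -> !torch.vtensor<[2,3],f32>
  return %0 : !torch.vtensor<[2,3],f32>
}

// After: MHLO dialect, as the proposed TorchToMhlo pass might produce it
// (value tensors converted to builtin tensors, the op mapped 1:1 to mhlo.tanh).
func.func @forward(%arg0: tensor<2x3xf32>) -> tensor<2x3xf32> {
  %0 = "mhlo.tanh"(%arg0) : (tensor<2x3xf32>) -> tensor<2x3xf32>
  return %0 : tensor<2x3xf32>
}
```

Elementwise ops like this map one-to-one; ops with richer semantics (broadcasting, reductions, convolutions) need dedicated patterns.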

Proposal:

Benefits:

  • Bridging the Torch dialect with the MHLO dialect.
  • Enabling users to convert and lower models from different frontends (PyTorch, TensorFlow, etc.) into the unified MHLO IR, reducing engineering effort and easing the subsequent backend work.
  • Providing more accessibility of Torch to other MLIR-based projects via this extra path from Torch to MHLO.

Status:

We have already implemented the whole lowering pipeline and the operator conversions needed to cover BERT and ResNet inference. See the POC in PR #1025. Everyone interested is welcome to be part of this effort as well.

Looking forward to any feedback!

cc @silvasean @powderluv @byronyi @Vremold

Update (07/13/2022):

Following the offline meeting between Bytedance and Alibaba, we decided to break the proposal into several PRs:

  • Integrate MHLO into TorchMLIR Repo
  • Basic conversion pipeline
  • Operator conversions
  • ResNet and BERT examples
  • End-to-end unit tests

I also learned that @fortianyou is working on something similar. Discussion and collaboration are more than welcome.

Thanks. I'm glad to see that multiple parties are interested in this. It sounds like there might be some duplicated effort, so let's try to land this soon so we can collaborate. All the code should be shared, and we will need to reconcile the lowering patterns between Bytedance and Alibaba.

For reference, Alibaba's code is here: https://github.com/alibaba/BladeDISC/blob/main/pytorch_blade/src/torch-mlir/lib/Conversion/TorchToMhlo/TorchToMhlo.cpp

nit: let's call it TorchToMhlo for consistency (instead of TorchToMHLO)

Thank you for this effort.

I also learned that @fortianyou is working on something similar. Discussion and collaboration are more than welcome.

That's exciting! @ZihengJiang @byronyi

Since this proposal is suggesting MHLO as a unified IR, the ownership of MHLO dialect should be considered as well. Is it going to be MLIR-community owned?

bhack commented

Since this proposal is suggesting MHLO as a unified IR, the ownership of MHLO dialect should be considered as well. Is it going to be MLIR-community owned?

There was an interesting related thread at:
https://discourse.llvm.org/t/rfc-bring-mhlo-and-lhlo-dialects-into-mlir-upstream/3186

I've invited the lead of the MHLO efforts to the meeting, who can speak about governance. In general, our bar for this can be different than upstream MLIR.

It's nice that @silvasean has created the meeting invitation here: https://discourse.llvm.org/t/asia-friendly-developer-hour-mhlo-support/63625. FYI

Since this proposal is suggesting MHLO as a unified IR, the ownership of MHLO dialect should be considered as well. Is it going to be MLIR-community owned?

There was an interesting related thread at: https://discourse.llvm.org/t/rfc-bring-mhlo-and-lhlo-dialects-into-mlir-upstream/3186

I noticed that the community's main concern about upstreaming mlir/hlo is its relation to XLA/HLO and TPU. In practice, BladeDISC hasn't run into trouble on this, and we benefit from mlir/hlo a lot.

Also, I agree with @silvasean that torch-mlir has a different bar than upstream MLIR. Community folks might be interested in building a new backend based on torch-mlir and mlir/hlo. I think this is in line with the vision of the torch-mlir community.

There are various plans at Google to support these cases better, and Eugene can speak to some of that at tomorrow's meeting -- and get feedback from you on what would be useful.

When this came up in the past, the issue really hinged on development model disconnects and a lack of staffing to better support it in a broader construct -- no one was really ready to support it outside of the limited cases it was serving. There have been changes on both of those fronts, so hopefully this next conversation can be more productive than the previous ones.

(Disclosure: Sean and I work at Google but on a different team than MHLO -- we both have visibility into the plans and are trying to get the folks who own this part engaged here to work on it together. Hopefully that is what tomorrow's meeting will accomplish. This proposal kind of came unexpectedly, and we weren't quite ready to talk about everything yet but will do our best to be helpful)

Thanks for all the discussion. Looking forward to today's meeting.

FYI, here is the POC (#1025) for this RFC, in which we have implemented the BERT and ResNet conversions from Torch to MHLO. We are going to refine it in the coming days.

cc @fortianyou @silvasean @powderluv @stellaraccident @navahgar @bhack

Thanks @ZihengJiang for posting the PR! One thing I noticed was that Alibaba has dynamic shapes as a requirement, and some of the patterns that I see in the PR you linked are tied to static shapes. So we will want to use the Alibaba patterns when they are available. For example:

https://github.com/alibaba/BladeDISC/blob/9043010a4e33f2b809a6db9933398877e39a5660/pytorch_blade/src/torch-mlir/lib/Conversion/TorchToMhlo/MhloLegalizeUtils.cpp#L220
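To illustrate the difference, here is a hand-written sketch (not taken from either PR): a static-shape pattern can hard-code the broadcasted result type, while a dynamic-shape pattern must materialize the target shape at runtime, e.g. with `mhlo.dynamic_broadcast_in_dim`:

```mlir
// Static shapes: the broadcasted result type is fully known at compile time.
%0 = "mhlo.broadcast_in_dim"(%arg0) {broadcast_dimensions = dense<[1]> : tensor<1xi64>}
    : (tensor<4xf32>) -> tensor<3x4xf32>

// Dynamic shapes: the target shape is computed from the other operand at
// runtime and passed as an extent tensor.
%shape = shape.shape_of %arg1 : tensor<?x4xf32> -> tensor<2xindex>
%1 = "mhlo.dynamic_broadcast_in_dim"(%arg0, %shape) {broadcast_dimensions = dense<[1]> : tensor<1xi64>}
    : (tensor<4xf32>, tensor<2xindex>) -> tensor<?x4xf32>
```

Patterns written against the static form bake shapes into result types and fail on `?` dimensions, which is why the dynamic-shape-aware patterns are preferable where available.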

Hi @silvasean

Thanks for the suggestion! Dynamic shape support is a good point. We are happy to update the PR to support this or collaborate with @fortianyou to distribute those pattern tasks, if Alibaba plans to contribute their code. We can discuss those options in the meeting.

bhack commented

Any meeting log or summary on this topic?

Following the offline meeting between Bytedance and Alibaba, we decided to break the proposal into several PRs:

  • Integrate MHLO into TorchMLIR Repo
  • Basic conversion pipeline
  • Operator conversions
  • ResNet and BERT examples

Looking forward to working together and landing this. cc @fortianyou @silvasean

I would like the initial PR to add support for the e2e test suite too, even if only one test (like tanh) passes. That way, we can see a minimal working system in a single PR to make it easier to review and make sure that all the pieces are present.

@ZihengJiang I noticed in the community meeting slides today, it did not mention adding MHLO to our e2e test suite.

I think we have enough working now to add that -- it would be good if that was in the "add in the next month" list.

Now that the ResNet and BERT examples (from Torch to MHLO) have been merged into the main branch, we can say that we have delivered on the promise we made in this RFC. Thanks to everyone for making this happen! @fortianyou @Vremold @Yancey1989 @silvasean @powderluv

Amazing work folks!! I can't believe you did all this in < 2 months!

@ZihengJiang Congrats!!