fastmachinelearning/SonicCMS

Add FPGA support to SonicTriton

Opened this issue · 11 comments

The FaaST FPGA server uses Triton calls in order to be interoperable with the existing SonicTriton client. An explicit conversion from floating point to fixed point may be needed (as opposed to the direct reinterpret_cast currently used to handle data for the Triton GPU server).

Assigned to: @drankincms

Summary of latest discussion: try to revamp the FaaST server to do the floating point -> fixed point conversion on the server side. FaaST should also be updated to send dimension and data type information*, where the data type is what the client should provide (e.g. FP32) rather than the fixed point type needed by the FPGA.

If/when this works, a Docker or Singularity image for the FaaST server can be provided as another example in CMSSW. (A CPU emulation of the FPGA could allow people without FPGA access to run the FaaST example, but this may be slow and clunky.)

* Triton provides these types: https://github.com/triton-inference-server/server/blob/010334ac4b1aa35e7ca4f19680b3436d203284f1/src/core/model_config.cc#L39-L71

Updated plan: ability to do conversion on server is nice as an extra, but can reduce throughput (especially w/ multiple clients pinging one server).

We may consider adding a "conversion factory" to allow runtime decisions about how to convert floating point to fixed point, and easy addition of new kinds of conversions. An example of the kinds of conversions required is here: https://github.com/hls-fpga-machine-learning/SonicCMS/blob/97ae4c4fa5a791d501f76e1bce7072cca8d8a791/TensorRT/plugins/HcalProducerFPGA.cc#L37-L42

An example of CMSSW factories:

(The best location and usage of the proposed conversion factory is to be determined.)

More notes:

I have been trying to construct a conversion factory but Im not quite sure how to handle the conversion generically enough. My understanding is that the factory base should have a virtual function we call inside toServer around here: https://github.com/cms-sw/cmssw/blob/8ca7de64fccbe0da0572424fac930e179a32f3b1/HeterogeneousCore/SonicTriton/src/TritonData.cc#L125 (and also in fromServer potentially). Then there would be a dummy converter class that inherits from the base which basically doesn't do anything, to match things currently, and then a converter class specifically for certain types like ap_fixed. However, I can't figure out how to make the convert function work generically for the current setup without using templates like we use already, which don't work with virtual functions as I understand it. Is there some std::any type magic that I am missing? Or perhaps I don't understand the flexibility of the factory.

With the factory approach, for this particular problem, the goal is to have a conversion class where the specific ap_fixed type only exists inside the class. The signatures of all public-facing class methods must be identical (corresponding to the base class interface). Therefore, the conversion function should ultimately return uint8_t*. In this setup, the default conversion operation would just be reinterpret_cast, as we use for GPU.

Ok, to document things a bit: I have a first pass at this here: https://github.com/drankincms/cmssw/tree/triton_converter_v1
(its built off Jeff's Facile work so a useful diff is here: https://github.com/jeffkrupa/cmssw/compare/hcalreco-facile...drankincms:triton_converter_v1?expand=1)
Right now it contains only the basic converters for float and int64_t which just call reinterpret_cast, but assuming a couple tests work then I can start adding converters for ap_fixed types in the same way

Ok, things are now fully working on the same branch I used above (https://github.com/drankincms/cmssw/tree/triton_converter_v1). Right now this is still just the basic converters. Should I submit a PR here and then we can proceed to CMSSW after that? Or just go straight to CMSSW?

@drankincms fully working including ap_fixed conversions?

I would propose the following:

  1. rebase your branch to CMSSW_11_2_0_pre8 (removing Jeff's commits)
  2. run scram b code-checks and scram b code-format
  3. make a PR to https://github.com/hls-fpga-machine-learning/cmssw so I can review it for code rules etc.

(I may not be able to review it right away, and will also be considering the fully templated TritonData redesign we had discussed.)

@kpedro88 Now it includes an example ap_fixed conversion, which took a bit longer than expected.

I have run scram b code-checks and scram b code-format, and rebased to 11_2_0_pre9, but an attempt to make a PR to https://github.com/hls-fpga-machine-learning/cmssw on master or CMSSW_11_2_X gives a lot of differences, so I assume there's a better branch with which to make a PR?

@drankincms I have to update the master branch in the fastml CMSSW fork manually before you make the PR. I've done that now, so you should be able to make the PR.

The internal review PR is: fastmachinelearning/cmssw#2