Add FPGA support to SonicTriton
Opened this issue · 11 comments
The FaaST FPGA server uses Triton calls in order to be interoperable with the existing SonicTriton client. An explicit conversion from floating point to fixed point may be needed (as opposed to the direct reinterpret_cast
currently used to handle data for the Triton GPU server).
Assigned to: @drankincms
Summary of latest discussion: try to revamp the FaaST server to do the floating point -> fixed point conversion on the server side. FaaST should also be updated to send dimension and data type information*, where the data type is what the client should provide (e.g. FP32) rather than the fixed point type needed by the FPGA.
If/when this works, a Docker or Singularity image for the FaaST server can be provided as another example in CMSSW. (A CPU emulation of the FPGA could allow people without FPGA access to run the FaaST example, but this may be slow and clunky.)
* Triton provides these types: https://github.com/triton-inference-server/server/blob/010334ac4b1aa35e7ca4f19680b3436d203284f1/src/core/model_config.cc#L39-L71
Updated plan: ability to do conversion on server is nice as an extra, but can reduce throughput (especially w/ multiple clients pinging one server).
We may consider adding a "conversion factory" to allow runtime decisions about how to convert floating point to fixed point, and easy addition of new kinds of conversions. An example of the kinds of conversions required is here: https://github.com/hls-fpga-machine-learning/SonicCMS/blob/97ae4c4fa5a791d501f76e1bce7072cca8d8a791/TensorRT/plugins/HcalProducerFPGA.cc#L37-L42
An example of CMSSW factories:
- base class: https://github.com/cms-sw/cmssw/blob/ee2cdd586b1007a3e6059a7392108848f3953a71/RecoParticleFlow/PFProducer/interface/BlockElementLinkerBase.h
- factory: https://github.com/cms-sw/cmssw/blob/ee2cdd586b1007a3e6059a7392108848f3953a71/RecoParticleFlow/PFProducer/src/pluginFactories.cc
- derived class: https://github.com/cms-sw/cmssw/blob/ec4c535880057c2e4d4d8b603f872bc9ea9eba43/RecoParticleFlow/PFProducer/plugins/linkers/ECALAndBREMLinker.cc
- underlying code and macros: https://github.com/cms-sw/cmssw/blob/master/FWCore/PluginManager/interface/PluginFactory.h
(The best location and usage of the proposed conversion factory is to be determined.)
More notes:
- FACILE producer now tested with server-side floating point to fixed point conversion
- need to re-test w/ Triton v2 API
- next: try to put conversion factory in https://github.com/cms-sw/cmssw/blob/8ca7de64fccbe0da0572424fac930e179a32f3b1/HeterogeneousCore/SonicTriton/src/TritonData.cc#L107
I have been trying to construct a conversion factory but Im not quite sure how to handle the conversion generically enough. My understanding is that the factory base should have a virtual function we call inside toServer
around here: https://github.com/cms-sw/cmssw/blob/8ca7de64fccbe0da0572424fac930e179a32f3b1/HeterogeneousCore/SonicTriton/src/TritonData.cc#L125 (and also in fromServer
potentially). Then there would be a dummy converter class that inherits from the base which basically doesn't do anything, to match things currently, and then a converter class specifically for certain types like ap_fixed
. However, I can't figure out how to make the convert function work generically for the current setup without using templates like we use already, which don't work with virtual functions as I understand it. Is there some std::any
type magic that I am missing? Or perhaps I don't understand the flexibility of the factory.
With the factory approach, for this particular problem, the goal is to have a conversion class where the specific ap_fixed
type only exists inside the class. The signatures of all public-facing class methods must be identical (corresponding to the base class interface). Therefore, the conversion function should ultimately return uint8_t*
. In this setup, the default conversion operation would just be reinterpret_cast
, as we use for GPU.
Ok, to document things a bit: I have a first pass at this here: https://github.com/drankincms/cmssw/tree/triton_converter_v1
(its built off Jeff's Facile work so a useful diff is here: https://github.com/jeffkrupa/cmssw/compare/hcalreco-facile...drankincms:triton_converter_v1?expand=1)
Right now it contains only the basic converters for float
and int64_t
which just call reinterpret_cast
, but assuming a couple tests work then I can start adding converters for ap_fixed
types in the same way
Ok, things are now fully working on the same branch I used above (https://github.com/drankincms/cmssw/tree/triton_converter_v1). Right now this is still just the basic converters. Should I submit a PR here and then we can proceed to CMSSW after that? Or just go straight to CMSSW?
@drankincms fully working including ap_fixed
conversions?
I would propose the following:
- rebase your branch to CMSSW_11_2_0_pre8 (removing Jeff's commits)
- run
scram b code-checks
andscram b code-format
- make a PR to https://github.com/hls-fpga-machine-learning/cmssw so I can review it for code rules etc.
(I may not be able to review it right away, and will also be considering the fully templated TritonData redesign we had discussed.)
@kpedro88 Now it includes an example ap_fixed conversion, which took a bit longer than expected.
I have run scram b code-checks
and scram b code-format
, and rebased to 11_2_0_pre9, but an attempt to make a PR to https://github.com/hls-fpga-machine-learning/cmssw on master or CMSSW_11_2_X gives a lot of differences, so I assume there's a better branch with which to make a PR?
@drankincms I have to update the master branch in the fastml CMSSW fork manually before you make the PR. I've done that now, so you should be able to make the PR.
The internal review PR is: fastmachinelearning/cmssw#2