conspack/cl-conspack

Increasing performance for special cases?

Hello,

I am using cl-conspack via cl-mpi in a distributed-memory parallel environment, and I observe that it accounts for a significant share of my communication overhead. I can tailor my communications so that I transmit only a single vector with a specialized element type, either 'double-float or (unsigned-byte 32); I need both kinds. Would it be possible to speed up conspack encoding/decoding for these special kinds of vectors? Or is this an impossible request, because it would become too implementation-dependent?
Thank you,
Nicolas

rpav commented

Speedups in conspack (and fast-io) are certainly possible, mostly revolving around cases like this with large homogeneous vectors. The main killer right now is that it still has to encode element by element. Fixing this would likely require a couple of things:

  • Implementing endian switching, because I made the horrible mistake of using network byte order to start with. This is probably not really hard (two additional codes), and it should be mostly transparent to use.
  • Direct dumps to and from arrays, probably using static-vectors, to speed up copying. This will probably require slightly more intervention from the end user, but in cases such as yours it would be highly beneficial, and possibly already in place.
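
To make the two points above concrete, here is a minimal sketch of the difference between element-by-element encoding in network byte order and a bulk dump with an optional byte swap. It is written in Python purely as a language-neutral illustration of the byte layout; it is not conspack's or fast-io's code, and the function names (`encode_elementwise`, `encode_bulk`, `decode_bulk`) are hypothetical.

```python
import struct
import sys
from array import array

def encode_elementwise(values):
    """Pack each double separately in network (big-endian) byte order."""
    return b"".join(struct.pack(">d", v) for v in values)

def _needs_swap(big_endian):
    # A swap is needed only when the requested order differs from the host's.
    return big_endian != (sys.byteorder == "big")

def encode_bulk(values, big_endian=False):
    """Dump the whole buffer at once, swapping bytes only if required."""
    buf = array("d", values)      # contiguous buffer of 8-byte doubles (a copy)
    if _needs_swap(big_endian):
        buf.byteswap()            # swaps every element of the copy in place
    return buf.tobytes()

def decode_bulk(data, big_endian=False):
    """Inverse of encode_bulk: load the bytes in one step, then swap if needed."""
    buf = array("d")
    buf.frombytes(data)
    if _needs_swap(big_endian):
        buf.byteswap()
    return buf

values = [1.0, 2.5, -3.25]
# The bulk big-endian dump is byte-for-byte identical to the per-element encoding.
assert encode_bulk(values, big_endian=True) == encode_elementwise(values)
# With native byte order, no per-element work is needed on either side.
assert list(decode_bulk(encode_bulk(values))) == values
```

When sender and receiver share a byte order, as is typical within one MPI job, the native-order path skips the swap entirely, which is the case an endianness code in the format would enable. The same pattern would apply to 32-bit unsigned integers, provided the buffer's element size is checked to be 4 bytes on the platform in question.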

I probably don't have the time to implement this myself, but I'd be available to point out where things should likely go should someone want to work on it. Overall it's a fair but probably not large amount of work.