uber/neuropod

TensorFlow backend in IPE mode reports warning "expects data to be 64 byte aligned" on every inference


Bug

  • Using OPE: no
  • Neuropod backend: TensorFlow, CPU

During benchmarking we found that in IPE mode the TF backend reports a warning on every request, which itself affects performance:

```cpp
// neuropod/backends/tensorflow/tf_tensor.cc:140
SPDLOG_WARN("In order to wrap data, TensorFlow expects data to be 64 byte aligned! Making a copy...");
```

We implemented an allocator (in Java) that guarantees 64-byte buffer alignment, so this isn't a blocker (an equivalent C++ sketch of the workaround is below). Still, we wanted to discuss the following suggestions:
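For illustration, a minimal C++ equivalent of our Java-side workaround: allocate tensor buffers 64-byte aligned so the TF backend can wrap them without copying. The helper name `alloc_aligned_64` is a hypothetical, not part of any Neuropod API.

```cpp
#include <stdlib.h>  // posix_memalign (POSIX)

// Sketch only: allocate a buffer aligned the way the warning in
// tf_tensor.cc asks for, so the TF backend can wrap it zero-copy.
void* alloc_aligned_64(size_t size_bytes) {
    void* ptr = nullptr;
    if (posix_memalign(&ptr, 64, size_bytes) != 0) {
        return nullptr;  // allocation failed
    }
    return ptr;  // release with free()
}
```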

Expected behavior / suggestions:

Problem №1

In high-performance systems we should "guard" the logger so it cannot itself become a threat to system health (read/write I/O, a busy or full disk), and info/warning/error messages should not have a visible performance impact under high load.

Options:

  1. The logger should support "suppress" behavior when the same message is logged more than N times. That probably means the logger needs to compare each message with the previous one. To mitigate the cost, we could have a special flag or call to enable suppression, and only suppress once a "flood" is detected (ideally find some "standard" algorithm with options that can be used; see the first sketch after this list).

  2. Implement the logger with a ring buffer, using a multiple-producers -> single-consumer pattern: concurrent LOG calls are the producers, and a dedicated consumer thread reads from the ring buffer and then writes to the log file (or wherever). Depending on policy, producers may overwrite "old" entries or drop their own messages if the buffer is full. The consumer can also detect duplicates and avoid spam in the log, writing e.g. "log message ... suppressed another 10K occurrences". Vivek, do you know any OSS C++ logger like that? (I searched for a C++ version of the LMAX Disruptor, e.g. https://github.com/fsaintjacques/disruptor--, that could be used with the current Neuropod logger.) Such an approach could be a plus for Neuropod under high load, and for further development it would help control the extra "cost" of logging (see the second sketch below).
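To make suggestion 1 concrete, here is a minimal sketch of duplicate-message suppression. The class name, threshold, and stderr sink are illustrative assumptions, not the Neuropod logger:

```cpp
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <string>

// Sketch only: suppress repeats of the same message, emitting one summary
// line per kFloodThreshold duplicates instead of one write per call.
class SuppressingLogger {
public:
    void warn(const std::string& msg) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (msg == last_msg_) {
            // Flood detected: count instead of writing.
            if (++suppressed_ % kFloodThreshold == 0) {
                std::fprintf(stderr, "%s (suppressed another %zu occurrences)\n",
                             msg.c_str(), kFloodThreshold);
            }
            return;
        }
        if (suppressed_ > 0) {
            std::fprintf(stderr, "(previous message repeated %zu more times)\n",
                         suppressed_);
            suppressed_ = 0;
        }
        last_msg_ = msg;
        std::fprintf(stderr, "%s\n", msg.c_str());
    }

private:
    static constexpr std::size_t kFloodThreshold = 10000;
    std::mutex mutex_;
    std::string last_msg_;
    std::size_t suppressed_ = 0;
};
```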

(My experience: I used a similar approach for the logger in a C++ algo-trading service.)
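And a minimal sketch of suggestion 2: a multiple-producer / single-consumer ring with a "drop when full" producer policy, so logging never blocks the hot path. This is illustrative only (a mutex-based ring, not a lock-free Disruptor); `RingLogger`, the capacity, and the stderr sink are assumptions:

```cpp
#include <array>
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <string>
#include <thread>

class RingLogger {
public:
    RingLogger() : consumer_([this] { drain(); }) {}
    ~RingLogger() {
        done_.store(true);
        cv_.notify_one();
        consumer_.join();
    }

    // Called concurrently by producers; drops the message if the ring
    // is full instead of blocking the caller.
    bool log(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (count_ == kCapacity) return false;  // drop policy
            buffer_[(head_ + count_) % kCapacity] = std::move(msg);
            ++count_;
        }
        cv_.notify_one();
        return true;
    }

private:
    // Single consumer: drains the ring and does the slow I/O off the hot path.
    void drain() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!done_.load() || count_ > 0) {
            cv_.wait_for(lock, std::chrono::milliseconds(10),
                         [this] { return count_ > 0 || done_.load(); });
            while (count_ > 0) {
                std::string msg = std::move(buffer_[head_]);
                head_ = (head_ + 1) % kCapacity;
                --count_;
                lock.unlock();  // write outside the lock
                std::fprintf(stderr, "%s\n", msg.c_str());
                lock.lock();
            }
        }
    }

    static constexpr std::size_t kCapacity = 1024;
    std::array<std::string, kCapacity> buffer_;
    std::size_t head_ = 0, count_ = 0;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::atomic<bool> done_{false};
    std::thread consumer_;  // declared last so it starts after other members
};
```

The consumer could additionally run the duplicate-suppression logic from the first sketch before writing.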

Problem №2

I investigated the TensorFlow code and it seems to be related to https://github.com/tensorflow/runtime/blob/4adbd9585428dea3fed2f69e721b7369410a27d0/third_party/eigen/BUILD#L47
Looking at the current TF repo code and https://eigen.tuxfamily.org/dox/TopicPreprocessorDirectives.html, I am not sure 64-byte alignment is really necessary to avoid the copy. Vivek, can you at least audit it and, if it is still the best way to guarantee "no copy", leave a comment at the warning? Ideally, Neuropod would report the warning (with suppression) and "send" the data to TF as-is, letting TF decide. In that case the caller/user could experiment with alignments and benchmark them. Maybe expose it as an option: whether the user wants Neuropod to do "smart" things, or to be a thin layer that gives the user/caller more control (a sketch of that decision point is below).
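For reference, a hedged sketch of the wrap-vs-copy decision point being discussed. `wrap_or_copy`, the `user_forces_wrap` option, and the constant are assumptions for illustration, not the actual code in tf_tensor.cc:

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

constexpr std::size_t kTfExpectedAlignment = 64;  // what the warning enforces today

bool is_aligned(const void* data, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(data) % alignment == 0;
}

// Aligned buffers are wrapped zero-copy; unaligned ones are copied today.
// The suggestion: audit whether 64 is still required, and/or let the user
// opt out of the copy and pass the pointer to TF as-is.
void* wrap_or_copy(void* data, std::size_t size_bytes, bool user_forces_wrap) {
    if (user_forces_wrap || is_aligned(data, kTfExpectedAlignment)) {
        return data;  // zero-copy: hand the pointer to TF unchanged
    }
    // std::aligned_alloc requires size to be a multiple of the alignment.
    const std::size_t padded =
        (size_bytes + kTfExpectedAlignment - 1) & ~(kTfExpectedAlignment - 1);
    void* aligned = std::aligned_alloc(kTfExpectedAlignment, padded);
    if (aligned != nullptr) {
        std::memcpy(aligned, data, size_bytes);
    }
    return aligned;  // caller owns the copy (free())
}
```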

I put both problems together for now; we can split them into separate issues if we find them "valid" first.