abseil/abseil-cpp

[Bug]: Parameter pack workaround fails with nvcc

Closed this issue · 17 comments

Describe the issue

I was looking into failures with abseil 20240116.1 and tensorflow today, which fail with:

# Execution platform: @local_execution_config_platform//:platform
$PREFIX/include/absl/strings/str_cat.h: In function 'std::enable_if_t<(sizeof... (T) > 1), typename std::common_type<typename std::conditional<true, void, absl::lts_20240116::strings_internal::EnableIfFastCase<T, typename std::enable_if<((std::is_arithmetic<T>::value && (! std::is_same<T, char>::value)) && (! std::is_same<T, bool>::value)), void>::type> >::type ...>::type> absl::lts_20240116::StrAppend(absl::Nonnull<T*>, T ...)':
$PREFIX/include/absl/strings/str_cat.h:606:316: error: expected ';' before '...' token
  606 |   for (const SomeTrivialEmptyType& dummy2 :
      |                                           ^  

$PREFIX/include/absl/strings/str_cat.h:606:43: error: parameter packs not expanded with '...': 
  606 |   for (const SomeTrivialEmptyType& dummy2 :
      |                                           ^                         

The relevant chunk of code, introduced recently in d5a2cec:

for (const SomeTrivialEmptyType& dummy2 :
     {(/* Comma expressions are poor man's C++17 fold expression for C++14 */
       (void)(n = lengths[i]),
       (void)(n < 0 ? (void)(*pos++ = '-'), (n = ~n) : 0),
       (void)absl::numbers_internal::FastIntToBufferBackward(
           absl::numbers_internal::UnsignedAbsoluteValue(std::move(args)),
           pos += n, static_cast<uint32_t>(n)),
       (void)++i, dummy1)...}) {
  (void)dummy2;  // Remove & migrate to fold expressions in C++17
}
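
For context: the braced-init-list of comma expressions is, as the inline comment says, a C++14 stand-in for a C++17 fold expression. A stripped-down sketch of the two equivalent patterns (illustrative only, not the actual Abseil code):

#include <cstdio>
#include <initializer_list>

// C++14: run a side effect once per pack element by expanding a
// braced-init-list of comma expressions, then ranging over the dummies.
template <class... T>
void PrintEachCpp14(T... args) {
  int dummy1 = 0;
  for (int dummy2 : {((void)std::printf("%d\n", args), dummy1)...}) {
    (void)dummy2;
  }
}

// C++17: the same per-element side effect as a fold over the comma operator.
template <class... T>
void PrintEachCpp17(T... args) {
  ((void)std::printf("%d\n", args), ...);
}

int main() {
  PrintEachCpp14(1, 2, 3);
  PrintEachCpp17(4, 5, 6);
}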

As it turns out, there was already a comment on that commit noting exactly this problem:

@georgthegreat: @derekmauro, this pattern breaks nvcc (tested with nvcc11.4 + c++14, nvcc11.4 + c++17, nvcc12.1 + c++17 modes) with the following error:

absl/strings/str_cat.h:609:316: error: expected ';' after expression
for (const SomeTrivialEmptyType &dummy2 : ({((((((void)(n = (lengths[i]))), ((void)((n < (0)) ? ((void)((*(pos++)) = '-')), (n = (~n)) : (0)))), ((void)numbers_internal::FastIntToBufferBackward(numbers_internal::UnsignedAbsoluteValue(std::move(args)), pos += n, static_cast< uint32_t>(n)))), ((void)(++i))), dummy1)...}))

Can we do anything about this? I do not quite understand the meaning of this code, so I cannot refactor it myself to make it compile.

I do understand that nvcc is quite weird when it comes to such complex C++ code, so I need a workaround of some kind.

@derekmauro: @higher-performance - Can you take a look at this?

I think it effectively boils down to a compiler shortcoming of nvcc, but it would be good to be able to use abseil with CUDA.

Steps to reproduce the problem

Build CUDA-enabled tensorflow with newest abseil.

What version of Abseil are you using?

20240116.1

What operating system and version are you using?

Linux

What compiler and version are you using?

GCC + nvcc

What build system are you using?

bazel

Additional context

No response

Thanks for following up; for some reason it seems I never received the notification for the initial comment. I'll take a look.

I seem to be unable to reproduce the compile error with a minimal example. Could you please do the following?

  1. Specify what compiler version & flags you are using

  2. Try to reproduce the compile error with a minimal example on Godbolt

Thanks!

I am not sure that the nvc++ provided by Godbolt is actually the NVIDIA CUDA compiler.

It looks like nvc++ is part of the HPC SDK, which is a totally different product:
https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-ref-guide/

It looks like Godbolt does not provide the nvcc compiler at all.

Ah. I tried this locally with nvcc, though, and still didn't see the issue. So I'm still stuck.

Interestingly, I've also failed to reproduce the error my collaborator encountered (I tried the Godbolt example from above with both nvcc 11.2 and 12.0, with --std=c++17).

@georgthegreat, can you reproduce the issue on your end?

OK, it has now been reproduced on our end as well, but so far only in the context of building tensorflow. The full log can be found here.

Thanks for the log. Unfortunately this isn't enough for me to investigate. I really need a reproduction I can run locally without significant effort -- which means (a) a compiler I can access easily (i.e. something on Godbolt, or available in a typical OS's package manager), and (b) a minimal reproduction (1 file) rather than an entire Docker image. Setting up a build environment, running Docker, etc. (even if I knew the command lines for doing so, which I don't right now) is not something I can realistically do in the foreseeable future. So if that's all we have available, you may need to investigate this on your end.

Here's what I can help you with, though. I believe I've seen this error before. It arose from an expression like std::forward<T>(args)..., where the (buggy) compiler didn't like the fact that args and T were two seemingly unrelated parameter packs. In that case, using std::forward<decltype(args)>(args)... fixed the problem.
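
A minimal sketch of that pattern, for illustration (the names here are hypothetical, not from the Abseil sources):

#include <utility>

void Sink(int, double) {}  // hypothetical callee

template <class... T>
void Wrapper(T&&... args) {
  // The buggy compiler rejected std::forward<T>(args)... because T and
  // args look like two unrelated parameter packs; spelling the type as
  // decltype(args) derives everything from the single pack `args`.
  Sink(std::forward<decltype(args)>(args)...);
}

int main() { Wrapper(1, 2.0); }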

Unfortunately, we don't have that here... we just have std::move(args), which shouldn't have this problem, I would think, but perhaps it somehow still does for nvcc?

Since I have no way to reproduce this, I have no way to test it. But if you can reproduce it on your end, I would suggest trying variations, such as using std::forward<decltype(args)>(args)... in lieu of std::move(args), and seeing if that works. You could also try passing explicit template arguments to the other functions and see whether that makes any difference.

Ultimately the compiler is buggy, so perhaps the answer ought to be "please upgrade your compiler", but it's hard to even suggest that without being able to test whether newer versions of nvcc still break on this.

Hi @higher-performance

A repro with Godbolt is here.

Awesome, thanks a lot! So it looks like nvcc can't parse a comma operator inside a pack expansion:

template<class... T>
void StrAppend(T... args) {
  for (int _ : {((void)args, 0)...}) {
  }
}

I could probably find a workaround, but it would hurt code readability even more... let me look into it and see if I can do anything. I would suggest reporting this upstream to NVIDIA in any case.

Ah, here's a workaround. Apparently this works; I'll take care of it.
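
(The linked workaround isn't quoted in the thread. As a sketch of the general shape such a fix can take -- my assumption, not necessarily the patch that landed -- one can hoist the comma expression into a helper function, so the pack expansion contains nothing but a plain call:

#include <initializer_list>

// Hypothetical helper: does the per-argument work and returns a dummy
// value, so no comma operator appears inside the pack expansion itself.
template <class T>
int ConsumeOne(T&& arg) {
  (void)arg;  // real code would do the per-argument processing here
  return 0;
}

template <class... T>
void StrAppend(T... args) {
  for (int dummy : {ConsumeOne(args)...}) {
    (void)dummy;
  }
}

Since the expansion is now just ConsumeOne(args)..., the parser never sees a top-level comma operator inside the braced-init-list.)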

Thank you very much! :)

Thanks a lot!

Yup, thanks for the help!

@derekmauro, is it possible to backport this into 20240116 release branch?

If there is another patch release (and there likely will be), then I will include this change.

Thanks a lot!