LuisaGroup/LuisaRender

Windows build fails with type inference errors in aov.cpp and gpt.cpp due to return type of camera->film()->node()->exposure();

NeedsMoar opened this issue · 5 comments

Greetings, just tried to build trunk and got some error spew with lots of "No matching argument type found for exp2(exposure) in aov.cpp / gpt.cpp"

I'm targeting Windows 10 x64, Visual Studio Version 17.9.5 CUDA toolkit 12.4 rev 1 with CUDA & DX backends enabled, I think I managed to get vulkan in there too for gui this time, was having trouble with that before. Cleaned cmake cache and build tree as usual.

After digging around, it looks like both are directly traceable to the same function calls directly previous:
auto exposure = camera->film()->node()->exposure();
which returns a luisa:float3
For whatever reason (maybe intentional) this cannot be auto-inferred as the luisa:compute:dsl:Float3 (luisa:compute:dsl:Vector<Float, 3>) type exp2 wants, and fails the is_dsl_v constraint check on the luisa:compute:dsl:exp2 template, which causes a bunch of extraneous error spew from following auto-deduced types that can't decide what they are.

I was able to get it to build by explicitly specifying the type of "exposure" as luisa:compute:dsl:Float3 so I'd assume the two are compatible and there was just no implicit operator, but since one is in the DSL namespace and wasn't selected and I haven't dug into how that is determined I'm concerned that just returning that type might not be safe or I'd make a pull request myself but if it's fine to convert between those two types in that context from a DSL standpoint the fix should probably just be changing the return type on the exposure() method unless the issue is that exposure() is supposed to be DSL-wrapped code.

In my experience GCC (and Clang on linux for compatibility reasons) have always had somewhat "sloppier" type matching for auto that's more willing to hop namespaces looking for matches so I'm guessing this wouldn't show up on non-Windows builds (or might be a warning not shown since the default build isn't -Wall AFAICT).

I'm going to test the build and see if anything explodes at runtime but hopefully this is enough info for you to take a look at it in the meantime

Hi @NeedsMoar

Thanks for reporting this issue!

We just added a host-side exp2(float3) in LuisaCompute (a submodule of LuisaRender) that didn’t exist before. So maybe the type mismatch was because the submodules in your clone are not updated?

Could you try

git submodule update --recursive --init

and see if it fixes the build?

I had just updated everything before I built, it's called from within a Kernel2D though so it looks like it's not considering the host-side version, it's trying this one in compute/include/luisa/func.h instead.

Edit: So I already built with that change in place if that wasn't clear.

template<typename T>
    requires is_dsl_v<T> && is_floating_point_or_vector_expr_v<T>
[[nodiscard]] inline auto exp2(T &&x) noexcept {
    return detail::make_vector_call<float>(
        CallOp::EXP2, std::forward<T>(x));
}

I just noticed host side closure callables were added to that file 4 months ago though, which brings it back to being an issue with the return type of exposure() not being implicitly convertible to one of the compute namespaced types that satisfies the is_dsl_v and is_floating_point_or_vector_expr_v concepts, or maybe the type inference rules just not allowing it to look outside of the luisa:: namespace since assigning the luisa:float3 that was returned to a luisa:compute:dsl:Float3 instead of using auto didn't throw any warnings. I think there's an operator= assigned for the conversion but not an implicit ctor. Either this doesn't happen anywhere else or it was fixed already.

Anway I'll let you know if I find anything else. All of the integrators seemed to work fine (or as well as they're supposed to, I've never been able to get good results from pssmlt or megapm and megawave is extremely slow) until I tried to use the zsobol sampler with something and it created a call in the temp .cu file to a runtime function that didn't exist, which has been preventing luisa from generating anything else since so it just sends zero bytes and optix crashes. The directx backend still works so I'm not sure what's going on there. I'll have to try it again after I've run houdini or something else that reinitializes the driver cleanly.

Thanks for the findings! I'll try to track down the type mismatches ASAP.

Anway I'll let you know if I find anything else. All of the integrators seemed to work fine (or as well as they're supposed to, I've never been able to get good results from pssmlt or megapm and megawave is extremely slow) until I tried to use the zsobol sampler with something and it created a call in the temp .cu file to a runtime function that didn't exist, which has been preventing luisa from generating anything else since so it just sends zero bytes and optix crashes. The directx backend still works so I'm not sure what's going on there. I'll have to try it again after I've run houdini or something else that reinitializes the driver cleanly.

This seems like another issue in the CUDA backend implementation. I'll look into it too.

Hi @NeedsMoar

I believe the type mismatch issue is fixed now. For some reasons the compiler didn't select the correct overloading, so now I just explicitly specify the namespace as luisa::exp2 to ensure the overloading.

For the second integrator issue, mega-wave depends on thread-block sync functions that are unavailable with OptiX so it's expected to fail on the CUDA backend. I added an error message in the integrator to inform users.

Sorry it took a bit. I wiped out my local changes and updated and everything built with no errors. Things seem to work ok fine so I'll go ahead and close this.