vsg-dev/vsgExamples

Android Example issues

geefr opened this issue · 10 comments

geefr commented

Tracking outstanding work for the Android example:

  • main.cpp needs updates for vsg 1.0, currently uses older classes, doesn't make use of Viewer
  • Should probably package some models into the app's resources, rather than loading them from device Downloads folder
  • Android Window initialisation currently broken (in vsgExamples project, and any new ones I've tried)

The window initialisation is a little odd - The native window handle is passed down to vsg okay, but hits a bad any_cast when read.

The code is definitely correct, so I think it's an issue with mismatched C++ STLs. I've seen that a few times on Android, but in this case main.cpp & vsg are linked statically, so they should be sharing a single STL variant.

geefr commented

Okay, it's not a mismatched STL, but it's a similarly awkward linking / symbol duplication issue. Here's the best summary I can give; I'm not sure I understand all of it myself.

Notably:

  • vsg requires RTTI for core/type_name.h
  • vsg requires RTTI to work across the library boundary, for WindowTraits::nativeWindow, and the accompanying any_cast in Android_Window.cpp
  • Relevant Android NDK bug: android/ndk#533 (comment)

The behaviour is that the example's main.cpp and Android_Window.cpp end up with different entries for the typeid of ANativeWindow. This causes the std::bad_any_cast exception when reading the window handle in vsg.

Interestingly, despite the LLVM/libc++ bug reports saying they now use a string comparison, the typeid comparison I'm getting is based on pointers. If for some reason there are two typeid entries, that cast will fail, and we can't initialise the window.

Adding an extra any_cast in main.cpp before passing the handle into vsg avoids this, but perhaps only on some architectures / in some situations. I think that's because it changes which typeid entry appears first in the table.
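One way to tell duplicated typeinfo apart from a genuine type mismatch is to check the stored type both through any_cast and by comparing type names as strings. A minimal sketch, assuming a forward-declared `OpaqueWindow` standing in for ANativeWindow; `sameTypeByName` and `readHandle` are hypothetical helpers and not part of vsg:

```cpp
#include <any>
#include <cstring>
#include <typeinfo>

// Forward declaration only, standing in for ANativeWindow (assumption:
// application code never sees the full definition either).
struct OpaqueWindow;

// Hypothetical diagnostic: true when the stored type's name matches T's name,
// regardless of whether both sides share the same type_info object.
template<typename T>
bool sameTypeByName(const std::any& value)
{
    return std::strcmp(value.type().name(), typeid(T).name()) == 0;
}

// Hypothetical reader: the pointer form of any_cast returns nullptr on a
// mismatch instead of throwing std::bad_any_cast.
inline OpaqueWindow* readHandle(const std::any& value)
{
    if (const auto* handle = std::any_cast<OpaqueWindow*>(&value)) return *handle;
    return nullptr;
}
```

If `sameTypeByName<OpaqueWindow*>(value)` is true while `readHandle(value)` returns nullptr, the two sides are most likely seeing different typeinfo entries for the same type. Within a single binary the two checks should agree.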

It may be an idea to consider a vsg::any implementation, or encapsulating the native window handles in a vsg::WindowHandle/WindowHandleAndroidNative class hierarchy; Something that would remove the need for rtti to be working across the library boundary.

I'm afraid Android has always been a little strange in this area, so that may be the simpler approach. In theory, however, the latest version of the NDK should have fixed this, or there should be some combination of flags we can set to get the desired behaviour.
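As a rough illustration of the handle-wrapper idea, the sketch below tags the stored pointer explicitly so that no typeid comparison is needed on either side of the library boundary. `NativeHandle`, `HandleKind` and `OpaqueWindow` are hypothetical names for illustration only; this is not an existing vsg API:

```cpp
#include <cstdint>

// Forward declaration only, standing in for ANativeWindow.
struct OpaqueWindow;

// Hypothetical tag describing what a NativeHandle points at.
enum class HandleKind : std::uint8_t { None, AndroidNativeWindow };

// Hypothetical wrapper: the type is checked through an explicit tag, so the
// reader never relies on RTTI for the opaque pointer type.
class NativeHandle
{
public:
    NativeHandle() = default;

    explicit NativeHandle(OpaqueWindow* window) :
        _kind(HandleKind::AndroidNativeWindow), _pointer(window) {}

    HandleKind kind() const { return _kind; }

    // Returns the window pointer, or nullptr if the handle holds something else.
    OpaqueWindow* androidWindow() const
    {
        return _kind == HandleKind::AndroidNativeWindow ? static_cast<OpaqueWindow*>(_pointer) : nullptr;
    }

private:
    HandleKind _kind = HandleKind::None;
    void* _pointer = nullptr;
};
```

The trade-off is that each platform needs its own tag and accessor, whereas std::any accepts any handle type without changes to the wrapper.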

I'm not sure why this didn't appear earlier, but the initial version of the example was using quite an old NDK / android toolchain version.

I'll have to come back to this, I'll work around it with some local hacks to vsg for my immediate needs.

geefr commented

Thanks, sounds like the simplest option for model loading.

Another very strange crash I'm seeing: a segfault within getQueue, down inside Vulkan.

    uint32_t transferQueueFamily = window->getOrCreatePhysicalDevice()->getQueueFamily(VK_QUEUE_TRANSFER_BIT);
    auto q = window->getOrCreateDevice()->getQueue(transferQueueFamily);

I haven't tried many things yet, but I suspect it's either the same issue as the window init or something Android-version-specific (since we've noted that the headers sometimes don't match the Vulkan runtime that's actually available).

geefr commented

Some further debugging on the ANativeWindow/std::any casting issues. Most relevant issue link seems to be Samsung/ONE#4157

Made some minimal test examples, and while std::any and std::any_cast are functional in most cases, for the ANativeWindow* we need it's not. I believe this is because the type we see is only ever an opaque pointer to a forward-declared struct ANativeWindow, along with some accessor functions. The linking for the vsg app is never going to see the real definition, so the requirement for it to have a key function (to satisfy typeid(foo) == typeid(foo) across compilation units) cannot be met.

I also think this wasn't an issue previously, or it only worked by chance. Older versions of Android may not have used RTLD_LOCAL when loading the library, or may have used a different linker etc. (there's always rapid change here, and Android native tools can be picky as a result).

@robertosfield Locally I've tried with a new Android_WindowTraits class to store the window handle, and I think ignoring questions of backwards compatibility that's the way to go. Instead of messing with std::any we init traits as you'd expect. In Android_Window we can keep a fallback to the std::any handle on WindowTraits, but it sounds like creating platform-specific classes is the way to go if Android is unable to support typeinfo/casting properly.

    auto traits = vsgAndroid::Android_WindowTraits::create( awkwardNativeWindowPointer );
    // other WindowTraits setup, as usual
    auto window = vsgAndroid::Android_Window::create(traits);

The second issue, the segfault when getting the transfer queue, persists. I've had various Vulkan samples running on my phone though, so I think that's a separate issue from any typeinfo / linker shenanigans.

geefr commented

Ah, I forgot vsg had that, and yes that can work with no API rework to deal with.

Thanks, no real guess on timeline but I think this can be resolved, vulkan itself definitely works on Android.

geefr commented

Okay, I think I've worked it out: vsg is proven working on a phone.

The 2nd crash was due to how my phone reports vulkan queues (Adreno 610) - It has a single queue for GRAPHICS | COMPUTE, but it doesn't specifically mention TRANSFER in the families. See http://vulkan.gpuinfo.org/displayreport.php?id=17515

So in Viewer::assignRecordAndSubmitTaskAndPresentation we get (uint32_t)-1, which is then passed to later vulkan functions.

    uint32_t transferQueueFamily = device->getPhysicalDevice()->getQueueFamily(VK_QUEUE_TRANSFER_BIT);

Changing it to the following fixes the problem:

    auto transferQueueFamily = device->getPhysicalDevice()->getQueueFamily(VK_QUEUE_TRANSFER_BIT);
    if( transferQueueFamily == -1 )
    {
        // No family reports TRANSFER explicitly, so fall back to the graphics family
        transferQueueFamily = device->getPhysicalDevice()->getQueueFamily(VK_QUEUE_GRAPHICS_BIT);
    }

My understanding is that any graphics queue must also support transfer, even if it's not reported, so maybe the fix should be in getQueueFamily instead?

Either way, the basic vsg display seems functional on Android. I'll raise a vsg PR once I've tidied a few other things up.

Thanks, great news.

It's end of day here, so tomorrow I'll look into generalizing getQueueFamily() so it falls back to VK_QUEUE_GRAPHICS_BIT if a VK_QUEUE_TRANSFER_BIT family isn't available.
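For illustration, a generalized fallback could look roughly like the sketch below. It relies on the Vulkan rule that reporting VK_QUEUE_TRANSFER_BIT is optional for families that already report graphics or compute support. The flag constants are redefined locally (with the same values as the Vulkan ones) so the sketch builds without the Vulkan headers, and `findQueueFamily` / `findTransferQueueFamily` are hypothetical names, not the vsg implementation:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Same values as VK_QUEUE_GRAPHICS_BIT, VK_QUEUE_COMPUTE_BIT and
// VK_QUEUE_TRANSFER_BIT, redefined so the sketch is self-contained.
constexpr std::uint32_t QUEUE_GRAPHICS_BIT = 0x1;
constexpr std::uint32_t QUEUE_COMPUTE_BIT = 0x2;
constexpr std::uint32_t QUEUE_TRANSFER_BIT = 0x4;

// Hypothetical lookup: index of the first family whose flags contain every
// bit in `wanted`, or -1 when no family matches.
inline int findQueueFamily(const std::vector<std::uint32_t>& familyFlags, std::uint32_t wanted)
{
    for (std::size_t i = 0; i < familyFlags.size(); ++i)
    {
        if ((familyFlags[i] & wanted) == wanted) return static_cast<int>(i);
    }
    return -1;
}

// Hypothetical transfer lookup: prefer a family that reports TRANSFER, then
// fall back to graphics, then compute, since both imply transfer support.
inline int findTransferQueueFamily(const std::vector<std::uint32_t>& familyFlags)
{
    int family = findQueueFamily(familyFlags, QUEUE_TRANSFER_BIT);
    if (family < 0) family = findQueueFamily(familyFlags, QUEUE_GRAPHICS_BIT);
    if (family < 0) family = findQueueFamily(familyFlags, QUEUE_COMPUTE_BIT);
    return family;
}
```

Keeping the result as a signed int until it has been checked matters here: storing -1 in a uint32_t turns it into 4294967295, which then looks like a valid (but nonexistent) family index to later Vulkan calls.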

I have just checked in the PhysicalDevice::getQueueFamily() fallback I mentioned above:

vsg-dev/VulkanSceneGraph@00d6bf0

I'll now review the PRs you've just posted.

I have merged all the changes with vsgExamples master + VSG master, then merged with the respective 1.0 branches, and have tagged 1.0.1-rc1 on each. So... I think it's safe to close this Issue :-)