lupoglaz/GodotAIGym

boost::interprocess_exception::library_error

Closed this issue · 11 comments

Hello,

I am getting a lot of boost::interprocess_exception::library_error errors with my custom environment. I've read a little bit about the boost managed_shared_memory class, but I don't quite see how to determine exactly what the problem is or how to work around it.

The error seems to get thrown at random. Sometimes I can go 100 iterations without issue, sometimes it comes up right away. I am writing a whole image to the semaphore using a 4096 int array. That does work, except sending a large array seems to make the problem more likely to occur. If instead of sending a 4096 byte array I send one byte, the problem doesn't usually come up.

I added a few more print statements and found that the error happens either when sending the action data or when reading the observation data. The program prints these exception messages and then either segfaults or hangs.

I wonder if there is some issue where both processes try to access the object in memory at the same time? That might be less likely with the examples, which all send state as a few bytes.
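To make the concurrent-access hypothesis concrete, here is a minimal sketch of strict turn-taking over a shared 4096-element buffer. It is not GodotAIGym's boost-based code: `fake_env`, `run_handshake`, `FRAME_SIZE`, and `STEPS` are hypothetical names, the buffer is a plain `multiprocessing` `RawArray`, and the `"fork"` start method assumes a POSIX host. Each side touches the buffer only after the other side has released a semaphore, so a read can never overlap a write.

```python
import multiprocessing as mp

FRAME_SIZE = 4096  # payload size mentioned in the report above
STEPS = 50         # hypothetical number of exchanges for the demo


def fake_env(buf, action_ready, obs_ready):
    """Stand-in for the environment process: reads a frame, replies with frame + 1."""
    for _ in range(STEPS):
        action_ready.acquire()               # wait until the agent has finished writing
        frame = buf[:]                       # safe: the agent is blocked on obs_ready
        buf[:] = [(b + 1) % 256 for b in frame]
        obs_ready.release()                  # hand the buffer back to the agent


def run_handshake():
    """Run the exchange and return how many replies were torn or wrong."""
    ctx = mp.get_context("fork")             # assumption: POSIX host
    buf = ctx.RawArray("B", FRAME_SIZE)      # shared, unsynchronized byte buffer
    action_ready = ctx.Semaphore(0)
    obs_ready = ctx.Semaphore(0)
    env = ctx.Process(target=fake_env, args=(buf, action_ready, obs_ready))
    env.start()

    bad_replies = 0
    for step in range(STEPS):
        buf[:] = [step % 256] * FRAME_SIZE   # write the whole "action" frame
        action_ready.release()               # signal: frame is complete
        obs_ready.acquire()                  # wait for the complete "observation"
        if buf[:] != [(step + 1) % 256] * FRAME_SIZE:
            bad_replies += 1
    env.join()
    return bad_replies
```

If either `acquire()` call were dropped, the two processes could read and write the buffer at the same time, and larger payloads would widen that window. That would be consistent with the symptom described above, though it is only one possible explanation.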

I will share my code for training a 3D vehicle using Google Dreamer V2 once I get it working. If anyone has any insight it would be appreciated. Thank you!

I'll have to modify the code to print errors in the *.err file. Assigned.

Great! I have been making progress debugging this. I will upload my current code tonight so you can see it. I found a few things that improved the situation, but I do still have crashes preventing me from training.

Question: can we see debug output in the godot plugin C++ code? As far as I can tell print statements there do not show up anywhere.

Here's my debug code which I've added on branch "debugging"
tlalexander@a56f0a0

Please let me know what you think! My DreamerV2 code seems to work based on as much of it as I have been able to run so far. My 3D environment will be open source once this is working so I'd be happy to contribute it as an example for this project.

Here is a backtrace from a typical crash. Thanks again!

[1] /lib/x86_64-linux-gnu/libc.so.6(+0x46210) [0x7fa71c436210] (??:0)
[2] /home/taylor/Software/GodotAIGym/Tutorials/RoverExample/RoverAI/RoverAI.x86_64() [0x8069fd] (/usr/include/boost/interprocess/offset_ptr.hpp:343)
[3] /home/taylor/Software/GodotAIGym/Tutorials/RoverExample/RoverAI/RoverAI.x86_64() [0x80ce45] (/home/taylor/Software/godot/./core/method_bind.gen.inc:961 (discriminator 9))
[4] /home/taylor/Software/GodotAIGym/Tutorials/RoverExample/RoverAI/RoverAI.x86_64() [0x1f8c5f7] (/home/taylor/Software/godot/core/object.cpp:921 (discriminator 1))
[5] /home/taylor/Software/GodotAIGym/Tutorials/RoverExample/RoverAI/RoverAI.x86_64() [0x20326f9] (/home/taylor/Software/godot/core/variant_call.cpp:1112 (discriminator 1))
[6] /home/taylor/Software/GodotAIGym/Tutorials/RoverExample/RoverAI/RoverAI.x86_64() [0x91bc82] (/home/taylor/Software/godot/modules/gdscript/gdscript_function.cpp:1091)
[7] /home/taylor/Software/GodotAIGym/Tutorials/RoverExample/RoverAI/RoverAI.x86_64() [0x8d0710] (/home/taylor/Software/godot/./core/variant.h:418)
[8] /home/taylor/Software/GodotAIGym/Tutorials/RoverExample/RoverAI/RoverAI.x86_64() [0xf34e6f] (/home/taylor/Software/godot/./core/variant.h:418)
[9] /home/taylor/Software/GodotAIGym/Tutorials/RoverExample/RoverAI/RoverAI.x86_64() [0x138a97a] (/home/taylor/Software/godot/scene/3d/spatial.h:54)
[10] /home/taylor/Software/GodotAIGym/Tutorials/RoverExample/RoverAI/RoverAI.x86_64() [0x1f86628] (/home/taylor/Software/godot/core/object.cpp:933)
[11] /home/taylor/Software/GodotAIGym/Tutorials/RoverExample/RoverAI/RoverAI.x86_64() [0xf58614] (/home/taylor/Software/godot/scene/main/scene_tree.cpp:985)
[12] /home/taylor/Software/GodotAIGym/Tutorials/RoverExample/RoverAI/RoverAI.x86_64() [0xf66a92] (/home/taylor/Software/godot/scene/main/scene_tree.cpp:481 (discriminator 2))
[13] /home/taylor/Software/GodotAIGym/Tutorials/RoverExample/RoverAI/RoverAI.x86_64() [0x645024] (/home/taylor/Software/godot/main/main.cpp:2005)
[14] /home/taylor/Software/GodotAIGym/Tutorials/RoverExample/RoverAI/RoverAI.x86_64() [0x62af59] (/home/taylor/Software/godot/platform/x11/os_x11.cpp:3257)
[15] /home/taylor/Software/GodotAIGym/Tutorials/RoverExample/RoverAI/RoverAI.x86_64(main+0xf1) [0x619251] (/home/taylor/Software/godot/platform/x11/godot_x11.cpp:56)
[16] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fa71c4170b3] (??:0)
[17] /home/taylor/Software/GodotAIGym/Tutorials/RoverExample/RoverAI/RoverAI.x86_64() [0x61b54e] (??:?)
-- END OF BACKTRACE --

Question: can we see debug output in the godot plugin C++ code? As far as I can tell print statements there do not show up anywhere.

If your gym class launches the environment in this way:

with open("stdout.txt", "wb") as out, open("stderr.txt", "wb") as err:
    # Build the Godot command line; skip the render loop when not rendering.
    cmd = [exec_path, "--path", os.path.abspath(env_path)]
    if not render:
        cmd.append("--disable-render-loop")
    cmd += ["--handle", self.handle]
    self.process = subprocess.Popen(cmd, stdout=out, stderr=err)

then, when you launch training, you'll find two files, stdout.txt and stderr.txt, in the directory you launched training from. They contain the output of the Godot console and the error stream.

Any progress here?

I have not made progress, though this continues to be an important thing for me to solve. I’ve just set it aside for a time.

@lupoglaz I do launch with the output sent to a text file. I believe there are some places in the code where print statements do not show up in those text files, though I still want to double-check that.
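One possible reason for missing prints (an assumption, not something confirmed in this thread) is output buffering: when stdout is redirected to a file it is typically block-buffered, so anything still sitting in the buffer is lost if the process dies abruptly, as it would on a segfault. The sketch below reproduces that effect with a hypothetical Python child standing in for the Godot binary; `CHILD_CODE` and `run_child_and_read_log` are illustrative names only.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child script standing in for the crashing binary.
CHILD_CODE = (
    "import os\n"
    "print('flushed before crash', flush=True)\n"
    "print('still in the buffer')\n"
    "os._exit(1)  # abrupt exit: pending buffers are never flushed\n"
)


def run_child_and_read_log():
    """Run the child with stdout redirected to a file and return the file's text."""
    # Drop PYTHONUNBUFFERED so the child uses its default (buffered) stdout.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "stdout.txt")
        with open(log_path, "wb") as out:
            proc = subprocess.Popen(
                [sys.executable, "-c", CHILD_CODE], stdout=out, env=env
            )
            proc.wait()
        with open(log_path, "r") as log_file:
            return log_file.read()
```

Only the explicitly flushed line reaches the file. In C++ code the analogous step would be flushing after each debug print, for example with `std::cout << std::flush` or `fflush(stdout)`, or writing to stderr, which is unbuffered by default in C.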

@schwartazi can you share what you are working on and how you have run into the issue? It may help us solve this. Thanks!

@tlalexander I will try to share some reference code.
I see that you found a way to bypass this in DreamerV2. Can you explain how?
I don't understand what the difference is between the inverted pendulum example, which works, and other projects.

This problem seems to go away with the current implementation. Please check the current master branch.

Great, thanks! I will test this. I'm on a new OS, so I have to set everything up. I will report back one way or the other when I get it all going.

Thank you. By the way, I am making a more in-depth tutorial on how to do everything from scratch: https://youtu.be/qolRDx1Q_TQ
There will be 6 parts in total; 4 are done, and I hope to finish the last 2 by next Sunday. I hope they help.