openai/procgen

Segfault when attempting to add information to the info dictionary

TheNeeloy opened this issue · 11 comments

Hello!

I have installed procgen from source on Ubuntu 20.04, and I am able to build and run the original environments.

I am attempting to follow the steps in the "Add information to the info dictionary" section of the procgen README, where the number of keys the player holds is counted and displayed in the heist environment.

I have added this code to the end of the VecGame constructor in procgen/src/vecgame.cpp:

{
        struct libenv_tensortype s;
        strcpy(s.name, "heist_key_count");
        s.scalar_type = LIBENV_SCALAR_TYPE_DISCRETE;
        s.dtype = LIBENV_DTYPE_INT32;
        s.ndim = 0;
        s.low.int32 = 0;
        s.high.int32 = INT32_MAX;
        info_types.push_back(s);
}

Then I added this code at the end of the public section of the HeistGame class in procgen/src/games/heist.cpp:

    void observe() override {
        std::cout << "in observe";
        Game::observe();
        int32_t key_count = 0;
        for (const auto& has_key : has_keys) {
            if (has_key) {
                key_count++;
            }
        }
        *(int32_t *)(info_bufs[info_name_to_offset.at("heist_key_count")]) = key_count;
    }
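As an aside, the counting loop in observe() can also be written with std::count from <algorithm>. The sketch below is a standalone illustration that assumes has_keys is a std::vector<bool>, which the range-for loop suggests; count_keys is a hypothetical helper, not part of procgen.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: counts how many entries of a has_keys-style vector are true.
// Assumes has_keys is a std::vector<bool>.
int32_t count_keys(const std::vector<bool>& has_keys) {
    // std::count compares every element against `true`; its result is a
    // difference_type, so narrow it explicitly to the int32 info slot type.
    return static_cast<int32_t>(std::count(has_keys.begin(), has_keys.end(), true));
}
```

Inside observe() this would replace the loop with `int32_t key_count = count_keys(has_keys);`. The behavior is the same, so this is purely a readability choice.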

I then ran this command:
python -m procgen.interactive --env-name heist

This is the output and error I got:

building procgen...done
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
Aborted (core dumped)

As a side note, whenever I remove the added code from heist.cpp, the environment builds and runs without error and displays a heist_key_count stat, which stays at 0 as expected.

Help regarding this matter would be much appreciated. Please let me know if I can provide any further details from my end as well.

Thanks!

This looks more like an abort than a segfault. Can you print out the contents of info_name_to_offset here: https://github.com/openai/procgen/blob/master/procgen/src/vecgame.cpp#L306? It would be good to make sure that the info name to offset mapping is being populated correctly.
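To make the abort-vs-segfault distinction concrete, here is a minimal standalone sketch (at_throws is a hypothetical helper). std::map::at throws std::out_of_range when the key is missing. If that exception escapes a thread's function uncaught, as it would from a stepping worker thread, the runtime calls std::terminate, which aborts the process with SIGABRT. That is what the "terminate called after throwing an instance of 'std::out_of_range'" message means. A segfault, by contrast, is a SIGSEGV from an invalid memory access.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true if looking up `name` with std::map::at throws.
// std::map::at throws std::out_of_range for a missing key; left uncaught,
// that exception leads to std::terminate and then abort (SIGABRT).
bool at_throws(const std::map<std::string, int>& m, const std::string& name) {
    try {
        (void)m.at(name);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

For a plain non-const std::map, operator[] would not throw on a missing key. Instead it would silently insert the key with a value-initialized offset of 0, so a write through it would land in the wrong buffer. That makes the loud failure from at() the safer choice here.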

I added print statements before, during, and after the for loop, but the code never seems to reach them, because I get exactly the same output.

This is what the code looks like now:

    std::cout << "BEFORE" << std::endl;
    std::map<std::string, int> info_name_to_offset;
    for (size_t i = 0; i < info_types.size(); i++) {
        info_name_to_offset[info_types[i].name] = i;
        std::cout << info_name_to_offset[info_types[i].name] << std::endl;
    }
    std::cout << "AFTER" << std::endl;
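Printing each name next to its offset makes it obvious which info types are registered. The following is a minimal sketch; dump_offsets is a hypothetical helper that takes the map and an output stream by reference, so it can be pointed at std::cout or at a string stream for testing.

```cpp
#include <cassert>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes one "name : offset" line per map entry.
// Note that std::map iterates in key order (alphabetical for std::string),
// not in the order the info types were registered.
void dump_offsets(std::ostream& os, const std::map<std::string, int>& offsets) {
    for (const auto& entry : offsets) {
        // std::endl flushes after each line, so the text is written out
        // even if the process aborts shortly afterwards.
        os << entry.first << " : " << entry.second << std::endl;
    }
}
```

In the constructor this would be called as `dump_offsets(std::cout, info_name_to_offset);` right after the loop. Flushing matters when debugging crashes: text still sitting in a stream buffer may never appear if the process is aborted.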

I ran this command:
python -m procgen.interactive --env-name heist

This is the output I got:

building procgen...done
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
Aborted (core dumped)

Hmm, it never gets to the constructor of VecGame? That makes me think something else is going on. Can you run it under a debugger to see where it is crashing?

Hi,
I apologize for the delayed response.
Since the print statements in the constructor weren't showing up, I reinstalled from source, and now they do.
I am still getting the map::at exception, so here is the output of running the program under gdb:

(procgen) neeloy@neeloy-VirtualBox:~/projects/heist-marl$ gdb -ex r --args python -m procgen.interactive --env-name heist
GNU gdb (Ubuntu 9.1-0ubuntu1) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...
Starting program: /home/neeloy/anaconda3/envs/procgen/bin/python -m procgen.interactive --env-name heist
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
building procgen...[Detaching after fork from child process 4630]
[Detaching after fork from child process 4636]
[Detaching after fork from child process 4638]
[Detaching after fork from child process 4639]
done
IN libenv_make
START
[New Thread 0x7fffdb8f8700 (LWP 4651)]
[New Thread 0x7fffdb0f7700 (LWP 4652)]
[New Thread 0x7fffda8f6700 (LWP 4653)]
[New Thread 0x7fffda0f5700 (LWP 4654)]
WAY BEFORE
BEFORE
0
1
2
3
AFTER
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at

Thread 2 "python" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffdb8f8700 (LWP 4651)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7dc8859 in __GI_abort () at abort.c:79
#2  0x00007ffff087881c in __gnu_cxx::__verbose_terminate_handler ()
    at /home/conda/feedstock_root/build_artifacts/ctng-compilers_1578638331887/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007ffff0876f19 in __cxxabiv1::__terminate (handler=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/ctng-compilers_1578638331887/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:47
#4  0x00007ffff0876f4f in std::terminate ()
    at /home/conda/feedstock_root/build_artifacts/ctng-compilers_1578638331887/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:57
#5  0x00007ffff087712c in __cxxabiv1::__cxa_throw (obj=obj@entry=0x7fffd400a0b0, tinfo=0x7ffff0930728 <typeinfo for std::out_of_range>, dest=0x7ffff088317e <std::out_of_range::~out_of_range()>)
    at /home/conda/feedstock_root/build_artifacts/ctng-compilers_1578638331887/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:95
#6  0x00007ffff0874057 in std::__throw_out_of_range (__s=__s@entry=0x7ffff120184d "map::at")
    at /home/conda/feedstock_root/build_artifacts/ctng-compilers_1578638331887/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/src/c++11/functexcept.cc:82
#7  0x00007ffff11d8d6c in std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > >::at (__k=..., this=0x555555e984f8)
    at /usr/include/c++/9/bits/char_traits.h:325
#8  HeistGame::observe (this=0x555555e984d0) at /home/neeloy/projects/heist-marl/procgen/src/games/heist.cpp:233
#9  0x00007ffff11fbd31 in stepping_worker (stepping_thread_mutex=..., pending_games=..., pending_games_added=..., pending_game_complete=..., time_to_die=@0x555555b9af38: false)
    at /home/neeloy/projects/heist-marl/procgen/src/vecgame.cpp:134
#10 0x00007ffff0893163 in std::execute_native_thread_routine (__p=0x5555587d4040)
    at /home/conda/feedstock_root/build_artifacts/ctng-compilers_1578638331887/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/src/c++11/thread.cc:80
#11 0x00007ffff7f9e609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007ffff7ec5103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) 

The exception is thrown from this line, which I added to the observe function in heist.cpp (heist.cpp:233 in frame #8 of the backtrace):
*(int32_t *)(info_bufs[info_name_to_offset.at("heist_key_count")]) = key_count;
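A lookup that names the missing key would have made this error self-explanatory. The following is a minimal sketch; offset_of is a hypothetical helper, not part of procgen. It behaves like std::map::at, but the exception message says which info name was never registered instead of just "map::at".

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the offset for `name`, or throws
// std::out_of_range with a message that includes the missing name.
int offset_of(const std::map<std::string, int>& offsets, const std::string& name) {
    auto it = offsets.find(name);
    if (it == offsets.end()) {
        throw std::out_of_range("info name not registered: " + name);
    }
    return it->second;
}
```

Assuming info_name_to_offset is a std::map<std::string, int> as in the constructor snippet, the failing line would become `*(int32_t *)(info_bufs[offset_of(info_name_to_offset, "heist_key_count")]) = key_count;`. The terminate message would then read "info name not registered: heist_key_count".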

Yup, here is the output when I also print the keys:

(procgen) neeloy@neeloy-VirtualBox:~/projects/heist-marl$ python -m procgen.interactive --env-name heist
building procgen...done
IN libenv_make
START
WAY BEFORE
BEFORE
prev_level_seed : 0
prev_level_complete : 1
level_seed : 2
rgb : 3
AFTER
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
Aborted (core dumped)

When I comment out the observe code in heist.cpp to check what the interactive game looks like without error, I do see the heist_key_count variable in the game window.


Wow, that was the whole problem. I didn't realize heist_key_count wasn't in the map by the time observe() in heist.cpp ran, because I had added it after the code in vecgame.cpp that populates the map. Thanks for the debugging help. I guess I need to brush up on the sequential logic of C++ lol.

To others who run into the same problem in the future: make sure you add any additional info_types entries in the VecGame constructor in vecgame.cpp before the loop that fills in the info_name_to_offset map.
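The ordering issue can be reproduced outside procgen. In the simplified sketch below, InfoType and build_offsets are hypothetical stand-ins for libenv_tensortype and the map-population loop in the VecGame constructor. The map is only a snapshot of info_types at the moment it is built, so any entry pushed afterwards is missing from it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for libenv_tensortype: only the name matters here.
struct InfoType {
    std::string name;
};

// Mirrors the constructor loop: builds name -> offset from the info types
// that exist *right now*. Entries added to info_types later are not included.
std::map<std::string, int> build_offsets(const std::vector<InfoType>& info_types) {
    std::map<std::string, int> offsets;
    for (std::size_t i = 0; i < info_types.size(); i++) {
        offsets[info_types[i].name] = static_cast<int>(i);
    }
    return offsets;
}
```

In the wrong order (build the map, then push heist_key_count), the key never appears in the map and a later at() lookup throws. In the right order (push heist_key_count, then build the map), the key is present with the next free offset.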