zzd1992/GBDTMO

Segmentation faults on set_data.

Closed this issue · 11 comments

The snippet linked below contains what I've tried so far, excluding the GBDTMO-EX library.
All of my approaches ended in a segmentation fault on the set_data lines.
The exact output is: Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

https://gist.github.com/punoqun/f1cf23e06721f8d3c3b4ac013d6053d0

I'm running a fresh install of Ubuntu 20.04 and a newly created conda environment with Python 3.6. The GBDTMO library was compiled with gcc 9.3.

Sorry, this is my mistake: y_train and y_valid should be 2D arrays.
You can change the code to:

import numpy as np

inp_dim, out_dim = 5, 10
# Targets are 2D: one column per output dimension
x_train, y_train = np.random.rand(10000, inp_dim), np.random.rand(10000, out_dim)
x_valid, y_valid = np.random.rand(10000, inp_dim), np.random.rand(10000, out_dim)

and try again.

If it works, please make sure the input and output arrays are contiguous and of type float64 when using the mse loss.
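The two requirements above (2D targets, contiguous float64 arrays for mse) can be checked up front before calling set_data. This is a minimal sketch; as_gbdtmo_arrays is a hypothetical helper written for this thread, not part of GBDTMO, and it only encodes the constraints stated by the maintainer.

```python
import numpy as np


def as_gbdtmo_arrays(x, y, out_dim):
    """Return (x, y) as 2D, C-contiguous float64 arrays, or raise ValueError.

    Hypothetical helper: mirrors the advice in this thread (2D targets,
    contiguous float64 for the mse loss). It is not part of GBDTMO.
    """
    # Copies only when needed (wrong dtype or non-contiguous memory layout)
    x = np.ascontiguousarray(x, dtype=np.float64)
    y = np.ascontiguousarray(y, dtype=np.float64)
    if x.ndim != 2:
        raise ValueError(f"x must be 2D, got shape {x.shape}")
    if y.ndim == 1 and out_dim == 1:
        # A single-output target given as 1D becomes an (n, 1) column
        y = y.reshape(-1, 1)
    if y.ndim != 2 or y.shape[1] != out_dim:
        raise ValueError(f"y must have shape (n, {out_dim}), got {y.shape}")
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of rows")
    return x, y
```

Raising on a 1D target with out_dim > 1, instead of reshaping it silently, surfaces exactly the mistake from the original snippet.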

To give it the best possible chance, I've updated the code as below, but the segmentation fault sadly persists.

import numpy as np
# GBDTMulti and LIB are assumed to be imported/loaded beforehand (not shown here)

out_dim = 10
inp_dim = 5
params = {"max_depth": 5, "lr": 0.1, "loss": b"mse"}
booster = GBDTMulti(LIB, out_dim=out_dim, params=params)
x_train, y_train = np.random.rand(10000, inp_dim), np.random.rand(10000, out_dim)
x_valid, y_valid = np.random.rand(10000, inp_dim), np.random.rand(10000, out_dim)
# np.random.rand already returns C-contiguous float64 arrays; these calls make that explicit
x_train = np.ascontiguousarray(x_train, dtype=np.float64)
x_valid = np.ascontiguousarray(x_valid, dtype=np.float64)
y_train = np.ascontiguousarray(y_train, dtype=np.float64)
y_valid = np.ascontiguousarray(y_valid, dtype=np.float64)
booster.set_data((x_train, y_train), (x_valid, y_valid))
booster.train(30)

That's strange: I can run this code successfully on my machine.
Can you show me the full traceback?
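A segfault inside a compiled extension normally kills the process without a Python traceback. One way to get one is the standard library's faulthandler module, which dumps the Python stack of every thread when the process receives a fatal signal such as SIGSEGV. A minimal sketch, placed at the top of the script before GBDTMO is used:

```python
import faulthandler
import sys

# On SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL, print the Python stack of
# all threads to stderr before the process dies.
faulthandler.enable(file=sys.stderr, all_threads=True)
```

The same effect is available without editing the script by running python -X faulthandler main.py, or by setting the environment variable PYTHONFAULTHANDLER=1.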

I have tested this library on Ubuntu 14, 16, and 18. I am not sure whether it fails in newer environments.

Apparently it's caused by the multithreading in the _set_bin method. Is there any way to limit the number of threads the code can use?
I'm running this on a 12-thread machine; if that's the cause, maybe limiting the thread count to 8 would work around it.

Fatal Python error: Segmentation fault

Thread 0x00007fe29f0b0700 (most recent call first):
File "/home/peepo/anaconda3/envs/gbdtmo/lib/python3.6/multiprocessing/connection.py", line 379 in _recv
File "/home/peepo/anaconda3/envs/gbdtmo/lib/python3.6/multiprocessing/connection.py", line 407 in _recv_bytes
File "/home/peepo/anaconda3/envs/gbdtmo/lib/python3.6/multiprocessing/connection.py", line 250 in recv
File "/home/peepo/anaconda3/envs/gbdtmo/lib/python3.6/multiprocessing/pool.py", line 463 in _handle_results
File "/home/peepo/anaconda3/envs/gbdtmo/lib/python3.6/threading.py", line 864 in run
File "/home/peepo/anaconda3/envs/gbdtmo/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/home/peepo/anaconda3/envs/gbdtmo/lib/python3.6/threading.py", line 884 in _bootstrap

Thread 0x00007fe29a8af700 (most recent call first):
File "/home/peepo/anaconda3/envs/gbdtmo/lib/python3.6/threading.py", line 295 in wait
File "/home/peepo/anaconda3/envs/gbdtmo/lib/python3.6/queue.py", line 164 in get
File "/home/peepo/anaconda3/envs/gbdtmo/lib/python3.6/multiprocessing/pool.py", line 415 in _handle_tasks
File "/home/peepo/anaconda3/envs/gbdtmo/lib/python3.6/threading.py", line 864 in run
File "/home/peepo/anaconda3/envs/gbdtmo/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/home/peepo/anaconda3/envs/gbdtmo/lib/python3.6/threading.py", line 884 in _bootstrap

Thread 0x00007fe2980ae700 (most recent call first):
File "/home/peepo/anaconda3/envs/gbdtmo/lib/python3.6/multiprocessing/pool.py", line 406 in _handle_workers
File "/home/peepo/anaconda3/envs/gbdtmo/lib/python3.6/threading.py", line 864 in run
File "/home/peepo/anaconda3/envs/gbdtmo/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/home/peepo/anaconda3/envs/gbdtmo/lib/python3.6/threading.py", line 884 in _bootstrap

Current thread 0x00007fe2cd61b740 (most recent call first):
File "/home/peepo/PycharmProjects/GBDTMO/gbdtmo/gbdtmo.py", line 22 in _set_bin
File "/home/peepo/PycharmProjects/GBDTMO/gbdtmo/gbdtmo.py", line 165 in set_data
File "/home/peepo/PycharmProjects/GBDTMO-trials/main.py", line 22 in gbdtmo_example
File "/home/peepo/PycharmProjects/GBDTMO-trials/main.py", line 60 in <module>

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

Please set num_threads to 1 in params (the default is 2). See the documentation for more details.
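Applied to the earlier snippet, this means adding a "num_threads" entry to the params dict before constructing the booster. A minimal sketch: the key name comes from the reply above, the other keys are copied from the snippet in this thread, and with_num_threads is a hypothetical convenience function. The GBDTMulti call is left as a comment because it needs the compiled library.

```python
def with_num_threads(params: dict, num_threads: int = 1) -> dict:
    """Hypothetical helper: return a copy of params with "num_threads" set."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    # Build a new dict so the caller's params are left unchanged
    return {**params, "num_threads": num_threads}


base_params = {"max_depth": 5, "lr": 0.1, "loss": b"mse"}
params = with_num_threads(base_params)  # single-threaded binning
# booster = GBDTMulti(LIB, out_dim=out_dim, params=params)  # requires the compiled GBDTMO library
```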

Avoiding Anaconda is another potential solution, because its threading is problematic.

Here is the code in gbdtmo/histogram.py that uses multiple workers (via multiprocessing.Pool):

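import numpy as np
from functools import partial
from multiprocessing import Pool

def get_bins_maps(x: np.ndarray, max_bins: int, threads: int = 1) -> (list, np.ndarray):
    out = []
    if threads == 1:
        # Serial path: compute bins/maps one feature column at a time
        for i in range(x.shape[-1]):
            out.append(_get_bins_maps(x[:, i], max_bins))
    else:
        # Parallel path: split x into feature columns and map them over a worker pool
        x = list(np.transpose(x))
        pool = Pool(threads)
        f = partial(_get_bins_maps, max_bins=max_bins)
        out = pool.map(f, x)
        pool.close()
        pool.join()
    # ... (rest of the function not shown)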

You can also modify it and reinstall the package.
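If you do modify histogram.py, one possible change is to clamp the worker count to the CPUs actually available and skip the pool entirely when only one worker is requested. The sketch below is self-contained, so it swaps the real _get_bins_maps (whose body is not shown in this thread) for a hypothetical _column_edges stand-in that computes quantile bin edges; capped_workers and get_bins_per_column are likewise hypothetical names, not part of GBDTMO.

```python
import os
from functools import partial
from multiprocessing import Pool

import numpy as np


def _column_edges(col: np.ndarray, max_bins: int) -> np.ndarray:
    # Hypothetical stand-in for _get_bins_maps: quantile bin edges for one column.
    # Must live at module level so worker processes can unpickle it.
    return np.quantile(col, np.linspace(0.0, 1.0, max_bins + 1))


def capped_workers(requested: int) -> int:
    """Clamp the requested worker count to the range [1, number of CPUs]."""
    return max(1, min(requested, os.cpu_count() or 1))


def get_bins_per_column(x: np.ndarray, max_bins: int, threads: int = 1,
                        bin_fn=_column_edges) -> list:
    workers = capped_workers(threads)
    columns = list(np.transpose(x))
    if workers == 1:
        # Serial path: no worker processes are started at all
        return [bin_fn(col, max_bins) for col in columns]
    f = partial(bin_fn, max_bins=max_bins)
    # map() blocks until every column is processed; the context manager
    # then shuts the pool down
    with Pool(workers) as pool:
        return pool.map(f, columns)
```

Keeping a pure serial path means threads=1 never touches multiprocessing, which isolates whether the crash comes from the worker pool.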

It was apparently caused by conda; I got it working. Thank you so much for the help.

I'm not using Anaconda, but I still get an error: "segmentation fault /usr/local/bin/python3.9".

Your numba version may be incompatible.
On my machine it is 0.42.0, which is old.
Upgrading or downgrading numba may solve the problem.

I plan to remove the use of numba in the future.

I used numba 0.53.1 and got the same segmentation fault.
I then downgraded numba and got the error again.
The decorators do not work in version 0.42.0.

@zzd1992 Could you also please check your llvmlite version?
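Each numba release supports only a specific range of llvmlite versions, so reporting both (plus numpy) makes environments easier to compare. A minimal sketch using the standard library's importlib.metadata, which requires Python 3.8+ (on Python 3.6, import numba and print numba.__version__ instead); installed_versions is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version string, or None if absent."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(["numba", "llvmlite", "numpy"]).items():
        print(f"{name}: {ver or 'not installed'}")
```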