mwydmuch/napkinXC

Segmentation Fault

JubJub623D opened this issue · 2 comments

I'm using napkinXC on Linux with a custom dataset and am encountering segfaults when trying to fit a PLT on the data. Attempting to print backtraces of the segfault results in unknown symbols as soon as the code in _napkin.cpp (calling fit on a CPPModel) is invoked, so I'm having trouble determining the true source of the issue.
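For reference, one way to at least see which Python call was active when the crash happens is the standard-library `faulthandler` module. This is only a minimal sketch: the helper name and the log file name are hypothetical, and it dumps the Python-level traceback only, so it will not resolve the unknown C++ symbols.

```python
import faulthandler


def enable_crash_traceback(log_path="napkinxc_crash.log"):
    """Dump the Python traceback of all threads to log_path on SIGSEGV and similar fatal signals.

    The returned file object must stay open for as long as faulthandler is enabled.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling `enable_crash_traceback()` once at the top of the script, before `.fit()`, should leave a traceback in the log file if the process segfaults.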

My first guess is that something went wrong with the installation of napkinXC.
The first thing I tried was installing napkinxc via pip by name, then via the .git link when that failed.
When that also failed, I tried downloading the git repository and running setup.py, but that didn't work either.
The default C++ standard for gcc in the environment is C++14, but the environment also supports C++ as a non-default option.
The gcc version is 9.4.0, and the CMake version is 3.16.3.

My second guess is that, because of an issue with the custom dataset, the .cpp code attempts to access an out-of-bounds location and segfaults.
As input to .fit(), I'm using a numpy matrix of embedding vectors and a list of lists of numerical ground truth labels, both generated from reading csv files.

import numpy as np
import pandas as pd

# Feature matrix: one embedding vector per row
X_data = pd.read_csv('embeddings_new_test.csv').to_numpy().astype(np.float32)
# Label lists: strip brackets and spaces from each row, then split on commas
Y_data_str = [label.replace('[', '').replace(']','').replace(' ','').split(',') for label in pd.read_csv('labels_new_test.csv').to_string(header=False, index=False).split('\n')]
Y_data = [[int(num) for num in data_list] for data_list in Y_data_str]

Sample embedding vector before np matrix conversion (dimensionality: (1, 44)):

1.1853809,1.8049561,-0.21211958,-4.1932855,-0.33534464,-2.9588652,-3.864022,-5.564808,1.8993871,4.2785244,4.9306583,3.9468246,-1.4078596,2.48531,1.8727794,0.7343951,-2.820231,0.28361112,2.3047895,2.7313123,1.7561926,4.286616,1.871469,-1.2939689,3.575691,1.7148826,2.4899118,-3.9518876,2.0022254,2.736418,-4.215009,-3.3079152,-1.2123864,-1.5709529,-0.20246193,-2.4258933,-2.386864,-2.19349,-2.4682508,1.5998758,-2.934224,-2.6331096,-3.2446184,2.9059627

Sample label list:
[110, 132, 143, 125, 167]
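To help rule out the dataset as the cause, the inputs could be sanity-checked before they reach `.fit()`. The following is a rough sketch only: `validate_fit_inputs` and its `max_label` parameter are hypothetical names, not part of napkinXC, and the checks reflect generic assumptions (matching row counts, finite features, non-negative integer labels) rather than napkinXC's documented requirements.

```python
import math
import numbers


def validate_fit_inputs(X, Y, max_label=None):
    """Return a list of human-readable problems found in X (rows of floats) and Y (lists of labels)."""
    problems = []
    # Every feature row needs a matching label list.
    if len(X) != len(Y):
        problems.append(f"row count mismatch: {len(X)} feature rows vs {len(Y)} label lists")
    # All feature rows should have the same dimensionality.
    widths = {len(row) for row in X}
    if len(widths) > 1:
        problems.append(f"inconsistent feature dimensionality: {sorted(widths)}")
    # NaN or inf values often come from empty or malformed csv cells.
    for i, row in enumerate(X):
        if not all(math.isfinite(v) for v in row):
            problems.append(f"row {i}: non-finite feature value (NaN or inf)")
    # Labels are assumed to be non-negative integers, optionally bounded by max_label.
    for i, labels in enumerate(Y):
        for label in labels:
            is_int = isinstance(label, numbers.Integral) and not isinstance(label, bool)
            if not is_int or label < 0:
                problems.append(f"row {i}: invalid label {label!r}")
            elif max_label is not None and label > max_label:
                problems.append(f"row {i}: label {label} exceeds max_label {max_label}")
    return problems
```

With the variables above, `validate_fit_inputs(X_data, Y_data)` returns an empty list when none of these checks fail. Passing `max_label` would also flag implausibly large labels, for example ones produced by a parsing mistake that concatenates several numbers.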

I'd be happy to provide any additional details if I missed something important. Thanks for your time.

Hi @JubJub623D, thank you for reporting the problem. I'm sorry for the slow response; I was on vacation over Christmas and New Year. I will be happy to debug and fix it. I will try to replicate this soon, but it would help a lot if you could provide a runnable script that causes the issue.

Hi @mwydmuch, many apologies for the very late response. I've been trying to replicate this issue, but over the past few weeks I've been able to run the code without any segfaults, despite using the same dataset and making no changes to the data formatting. I'm not sure what caused it to start working, or what caused the segfault in the first place, but I'll let you know if I find anything reproducible in the future. Thanks for your time.