elbamos/largeVis

caught segfault : 'memory not mapped'

Closed this issue · 30 comments

I got the following segfault with largeVis. Since the 'bench' branch was not available, I recompiled from github/master without OpenMP, as suggested here.

To compile without OpenMP, I wrote the Makevars file as follows:

PKG_LIBS = $(FLIBS) $(LAPACK_LIBS) $(BLAS_LIBS)
PKG_CXXFLAGS = -DARMA_64BIT_WORD -DNDEBUG
CXX_STD=CXX11
LDFLAGS = $(LDFLAGS)

and compiled with:

R-3.3.1 CMD INSTALL largeVis-master
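For context on the "compiled without OpenMP support" message: in an R package, OpenMP is normally switched on by adding $(SHLIB_OPENMP_CXXFLAGS) to PKG_CXXFLAGS and PKG_LIBS in Makevars, and the compiler then predefines the standard _OPENMP macro. The Makevars above leaves that out, so _OPENMP stays undefined. Below is a minimal C++ sketch of how package code commonly branches on that macro. The function names are hypothetical, and this is an assumption about the general pattern, not largeVis's actual source.

```cpp
// Hypothetical sketch (not largeVis source): branching on OpenMP availability.
#include <cassert>

#ifdef _OPENMP
#include <omp.h>  // only present/needed when compiling with e.g. gcc -fopenmp
#endif

// _OPENMP is predefined by the compiler only when OpenMP is enabled,
// so this answer is fixed at compile time.
constexpr bool built_with_openmp() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// Threads the OpenMP runtime would use by default; exactly 1 in a serial build.
inline int available_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Map a requested thread count (0 = "use the default") onto what the build
// supports. In a serial build available_threads() is 1, so every request
// collapses to a single thread.
inline int effective_threads(int requested) {
    assert(requested >= 0 && "thread count must be non-negative");
    const int available = available_threads();
    if (requested == 0 || requested > available) {
        return available;
    }
    return requested;
}
```

In a serial build, a call such as effective_threads(8) resolves to 1. As the rest of this thread shows, building without OpenMP did not avoid the crash; it was eventually traced to the gcc version instead.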

The error message:

> library(largeVis)
Loading required package: Rcpp
Loading required package: Matrix

Attaching package: ‘Matrix’

The following object is masked from ‘package:tidyr’:

    expand

largeVis was compiled without OpenMP support.
> neig<-randomProjectionTreeSearch(t(dat.small.matrix), K=10, tree_threshold = 100, max_iter = 15, n_trees = 10)

 *** caught segfault ***
address 0x75a8, cause 'memory not mapped'

Traceback:
 1: .Call("largeVis_searchTrees", PACKAGE = "largeVis", threshold,     n_trees, K, maxIter, data, distMethod, seed, threads, verbose)
 2: searchTrees(threshold = as.integer(tree_threshold), n_trees = as.integer(n_trees),     K = as.integer(K), maxIter = as.integer(max_iter), data = x,     distMethod = as.character(distance_method), seed = seed,     threads = threads, verbose = as.logical(verbose))
 3: randomProjectionTreeSearch.matrix(t(dat.small.matrix), K = 10,     tree_threshold = 100, max_iter = 15, n_trees = 10)
 4: randomProjectionTreeSearch(t(dat.small.matrix), K = 10, tree_threshold = 100,     max_iter = 15, n_trees = 10)

Maybe I didn't compile it properly, since the error still occurs in the 'multiprocessing step'.

Here is some random sample data; it also crashes with this data. The original data ran perfectly on Mac OS, but replicating the run on Linux triggered the crash.

dat<-data.frame(x=rnorm(n=10000, mean=0.5, sd=0.1), y=rnorm(n=1000, mean=0.5, 0.1)) %>% 
  rbind(data.frame(x=rnorm(n=1000, mean=0.25, sd=0.05), y=rnorm(n=1000, mean=0.25, 0.05))) %>% 
  rbind(data.frame(x=rnorm(n=1000, mean=0.15, sd=0.05), y=rnorm(n=1000, mean=0.5, 0.05)))

dat.small.matrix <- dat %>% as.matrix.noquote() %>% apply(2, as.numeric)

System information (lscpu output):
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             8
NUMA node(s):          8
Vendor ID:             AuthenticAMD
CPU family:            16
Model:                 4
Model name:            Quad-Core AMD Opteron(tm) Processor 8384
Stepping:              2
CPU MHz:               2693.060
BogoMIPS:              5385.73
Virtualization:        AMD-V
L1d cache:             64K
L1i cache:             64K
L2 cache:              512K
L3 cache:              6144K
NUMA node0 CPU(s):     0-3
NUMA node1 CPU(s):     4-7
NUMA node2 CPU(s):     8-11
NUMA node3 CPU(s):     12-15
NUMA node4 CPU(s):     16-19
NUMA node5 CPU(s):     20-23
NUMA node6 CPU(s):     24-27
NUMA node7 CPU(s):     28-31

OK, I tested the sample data before posting it here and, as mentioned, it also failed. Here is the original data set, along with the R session info.

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: openSUSE 13.1 (Bottle) (x86_64)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] largeVis_0.2    Matrix_1.2-7.1  Rcpp_0.12.10    dplyr_0.5.0     purrr_0.2.2     readr_1.1.0     tidyr_0.6.0     tibble_1.2     
 [9] ggplot2_2.2.1   tidyverse_1.1.1

loaded via a namespace (and not attached):
 [1] xml2_1.1.1       magrittr_1.5     hms_0.3          rvest_0.3.2      mnormt_1.5-5     munsell_0.4.3    colorspace_1.3-2 lattice_0.20-34 
 [9] R6_2.2.0         httr_1.2.1       stringr_1.1.0    plyr_1.8.4       tools_3.3.1      parallel_3.3.1   grid_3.3.1       broom_0.4.2     
[17] nlme_3.1-128     gtable_0.2.0     psych_1.7.3.21   DBI_0.6          modelr_0.1.0     readxl_0.1.1     lazyeval_0.2.0   assertthat_0.1  
[25] reshape2_1.4.2   haven_1.0.0      stringi_1.1.2    forcats_0.2.0    scales_0.4.1     lubridate_1.6.0  jsonlite_1.3     foreign_0.8-67  

idroz commented

I encountered a similar issue on Ubuntu Trusty:

> c <- largeVis(t(data.matrix(iris[,1:4])))
*** caught segfault ***
address 0x7f8, cause 'memory not mapped'
Traceback:
1: .Call("largeVis_searchTrees", PACKAGE = "largeVis", threshold,     n_trees, K, maxIter, data, distMethod, seed, threads, verbose)
2: searchTrees(threshold = as.integer(tree_threshold), n_trees = as.integer(n_trees),     K = as.integer(K), maxIter = as.integer(max_iter), data = x,     distMethod = as.character(distance_method), seed = seed,     threads = threads, verbose = as.logical(verbose))
3: randomProjectionTreeSearch.matrix(x, n_trees = n_trees, tree_threshold = tree_threshold, K = K, max_iter = max_iter, distance_method = distance_method,     threads, verbose = verbose)
4: randomProjectionTreeSearch(x, n_trees = n_trees, tree_threshold = tree_threshold,     K = K, max_iter = max_iter, distance_method = distance_method,     threads, verbose = verbose)
5: largeVis(t(data.matrix(iris[, 1:4])))

Tested with and without the OpenMP flag; the error persists. It seems to be Linux-specific, as it works perfectly fine on a Mac.

And sessionInfo():

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] largeVis_0.2  Matrix_1.2-10 Rcpp_0.12.10

loaded via a namespace (and not attached):
 [1] colorspace_1.3-2 scales_0.4.1     compiler_3.4.0   lazyeval_0.2.0
 [5] plyr_1.8.4       gtable_0.2.0     tibble_1.3.0     ggplot2_2.2.1
 [9] grid_3.4.0       munsell_0.4.3    lattice_0.20-35

@idroz Thanks for that. Could I ask you for a couple of small things?

First is, can you try with R 3.3 on the same system and see if you see the error?

Second is, can you try with the branch that's up here as release/0.2.1?

R 3.4 has changed a lot of things in how packages with C++ code need to integrate with R, and it's really complicated to test for release.

idroz commented

Thanks for that. I tried R 3.3.1 with release/0.2.1 as well as the master branch; unfortunately, the segfault persists. I'll play around with Makevars to see whether any flags contribute to this issue.

Thanks, @idroz. I'm working on it and appreciate the report.

I'm able to reproduce. I'll try to push an update out soon. It probably is tied to a compiler setting; if you make any progress on that front let me know.

Very strange... I can reproduce this on my AWS box but not on my Linux box at home, both running 16.04.

idroz commented

I managed to get it working on my 14.04 box. I had to upgrade gcc/g++/gfortran to version 5 and recompile the package; I had version 4.9 running before that. @elbamos, I wonder if you have a similar situation on your AWS box vs. your home Linux box?

I made the following changes to my /etc/R/Makeconf file:

CC = gcc-5 -std=gnu99
CXX = g++-5
CXX1X = g++-5

To make gcc and g++ version 5 the defaults, run:

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 60 --slave /usr/bin/g++ g++ /usr/bin/g++-5

idroz commented

I'm not seeing any test failures. It works perfectly fine from a fresh CRAN install on Ubuntu 16.04 with the default gcc/g++ version 5.4.0.

@idroz I was able to reproduce this on my AWS box once, but no longer. Are you able to isolate the issue to gcc 4.9?

idroz commented

I'm getting consistent 'memory not mapped' with gcc/g++ 4.8.4.

≥4.9 is OK.

idroz commented

It was g++

idroz commented

4.8.4

idroz commented

Yup - that's the one. 4.9 and 5 are fine. Haven't tested on 6 and up.

@NagaComBio Can you confirm that you were compiling with gcc before 4.8 also?

@elbamos No, my default version is gcc (SUSE Linux) 4.8.1 20130909. And I just recompiled largeVis/master with gcc (GCC) 6.2.0 and it worked without the segmentation fault.

idroz commented

@elbamos - sounds like a good way around the issue. Thanks a lot for looking into it.

I agree as well. Thank you @elbamos @idroz.

Thanks guys!

I have a version up in the develop branch. It should fail to compile on gcc < 4.9 but work fine for you otherwise. Please give it a try; if it works, I'll close this issue.
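A compile-time guard like the one described here can be written with gcc's predefined version macros (__GNUC__, __GNUC_MINOR__). The sketch below is hypothetical and is not the actual largeVis.h. It also shows why such a guard has to exclude clang: clang defines __GNUC__ too and typically reports itself as gcc 4.2, so a naive check would reject llvm, the very alternative the error message recommends. The helper version_at_least is an illustrative name.

```cpp
// Hypothetical sketch (not the actual largeVis.h): refuse to build with gcc < 4.9.

// Exclude clang explicitly, because it also defines __GNUC__ (usually as 4.2).
#if defined(__GNUC__) && !defined(__clang__)
#  if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 9)
#    error "This code is incompatible with gcc < 4.9. Upgrade gcc or use llvm."
#  endif
#endif

// The same (major, minor) >= (req_major, req_minor) comparison as a constexpr
// function, so the boundary cases reported in this thread can be checked at
// compile time with static_assert.
constexpr bool version_at_least(int major, int minor, int req_major, int req_minor) {
    return major > req_major || (major == req_major && minor >= req_minor);
}
```

Because the check runs in the preprocessor, an affected system fails at install time with a readable message instead of segfaulting at run time. Under this comparison, the gcc 4.8.1 and 4.8.4 builds reported above are rejected, while 4.9, 5.4 and 6.2 pass.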

I just tested the 'develop' branch: with gcc < 4.9 I got the error message below, and the same version worked fine with gcc 6.2.0. Cheers.

In file included from checkfunctions.cpp:1:0:
largeVis.h:7:2: error: #error largeVis is incompatible with gcc < 4.9. Upgrade gcc or use llvm.
 #error largeVis is incompatible with gcc < 4.9. Upgrade gcc or use llvm.
  ^
make: *** [checkfunctions.o] Error 1
ERROR: compilation failed for package ‘largeVis’

idroz commented

Works great 👍