Segfault when calling `read_tsv()` on an HPC cluster
Hi,
I'm having an issue with `read_tsv()` which appears to be the segfault mentioned here: #510
I'm calling the function inside a conda environment on an HPC. Running it interactively on the file in the conda environment on the head node works fine, but when it runs as a job on the cluster I get a segfault every time. Debugging this is well beyond my skill level.
The error I see in my log files is:
*** caught segfault ***
address (nil), cause 'memory not mapped'
Traceback:
1: vroom_(file, delim = delim %||% col_types$delim, col_names = col_names, col_types = col_types, id = id, skip = skip, col_select = col_select, name_repair = .name_repair, na = na, quote = quote, trim_ws = trim_ws, escape_double = escape_double, escape_backslash = escape_backslash, comment = comment, skip_empty_rows = skip_empty_rows, locale = locale, guess_max = guess_max, n_max = n_max, altrep = vroom_altrep(altrep), num_threads = num_threads, progress = progress)
2: vroom::vroom(file, delim = "\t", col_names = col_names, col_types = col_types, col_select = { { col_select } }, id = id, .name_repair = name_repair, skip = skip, n_max = n_max, na = na, quote = quote, comment = comment, skip_empty_rows = skip_empty_rows, trim_ws = trim_ws, escape_double = TRUE, escape_backslash = FALSE, locale = locale, guess_max = guess_max, show_col_types = show_col_types, progress = progress, altrep = lazy, num_threads = num_threads)
3: fn(x)
4: FUN(X[[i]], ...)
5: lapply(rna_files, function(x) { ln <- readLines(x, 1) fn <- paste0("read_", ifelse(grepl("\\t", ln), "tsv", "csv")) fn <- match.fun(fn) df <- fn(x) gn_col <- intersect(c("gene_id", "Geneid"), names(df))[[1]] fc_col <- intersect(c("logFC", "logfc"), names(df))[[1]] fdr_col <- intersect(c("fdr", "FDR", "adjP", "adj_p"), names(df))[[1]] dplyr::select(df, gene_id = !!sym(gn_col), logFC = !!sym(fc_col), FDR = !!sym(fdr_col))})
6: lapply(rna_files, function(x) { ln <- readLines(x, 1) fn <- paste0("read_", ifelse(grepl("\\t", ln), "tsv", "csv")) fn <- match.fun(fn) df <- fn(x) gn_col <- intersect(c("gene_id", "Geneid"), names(df))[[1]] fc_col <- intersect(c("logFC", "logfc"), names(df))[[1]] fdr_col <- intersect(c("fdr", "FDR", "adjP", "adj_p"), names(df))[[1]] dplyr::select(df, gene_id = !!sym(gn_col), logFC = !!sym(fc_col), FDR = !!sym(fdr_col))})
Is that `vroom` release mentioned in the above issue able to be released soon? I notice it's still at v1.6.5.
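In case it helps narrow things down, here is a minimal sketch of a check that could be run inside the cluster job. It assumes (without confirmation) that the crash is related to multi-threaded or lazy reading, and it uses the `num_threads` and `lazy` arguments that appear in the traceback. The helper name `read_tsv_conservative()` and the small demo file are hypothetical, and this is not known to avoid the segfault.

```r
library(readr)

# Hypothetical helper: read a TSV with a single thread and without lazy
# (ALTREP) reading. Assumption: threading or lazy reading is involved in
# the crash; this has not been confirmed.
read_tsv_conservative <- function(path) {
  read_tsv(path, num_threads = 1, lazy = FALSE, show_col_types = FALSE)
}

# Small stand-in file using the column names from the traceback
tmp <- tempfile(fileext = ".tsv")
writeLines(c("gene_id\tlogFC\tFDR", "g1\t1.5\t0.01"), tmp)

df <- read_tsv_conservative(tmp)
print(df)
```

If the job still crashes with these settings, that would suggest the cause lies elsewhere.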
Relevant package versions and the HPC OS are below; however, this is from the head node. When I look at other files where I've printed a `sessionInfo()` when running on the cluster, I don't seem to get the `Running under: Red Hat Enterprise Linux 8.4 (Ootpa)` and `Matrix products: default BLAS/LAPACK: /hpcfs/users/******/envs/f4994948c5b33369acc304940a5fa825_/lib/libopenblasp-r0.3.26.so; LAPACK version 3.12.0` lines. I'm not sure if that's helpful information or not though.
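To compare the head node and the compute node directly, something like the following could be run in both places. It is only a sketch using base R: the file name pattern and the choice of fields are my own assumptions, and in a real job the output would need to go to a shared path rather than `tempdir()`.

```r
# Collect details about the node this R process is actually running on
host <- Sys.info()[["nodename"]]

node_report <- c(
  paste("host:", host),
  paste("interactive:", interactive()),
  paste("detected cores:", parallel::detectCores()),
  paste("OMP_NUM_THREADS:", Sys.getenv("OMP_NUM_THREADS", unset = "<unset>")),
  capture.output(sessionInfo())
)

# Hypothetical output location; one file per host so the two can be diffed
report_file <- file.path(tempdir(), paste0("node_report_", host, ".txt"))
writeLines(node_report, report_file)
cat(node_report, sep = "\n")
```

Diffing the two files should show whether the missing `Running under:` and BLAS/LAPACK lines are the only difference between the environments.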
sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.4 (Ootpa)
Matrix products: default
BLAS/LAPACK: /hpcfs/users/******/envs/f4994948c5b33369acc304940a5fa825_/lib/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
[5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
[7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
time zone: Australia/Adelaide
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] vroom_1.6.5 readr_2.1.5
loaded via a namespace (and not attached):
[1] utf8_1.2.4 R6_2.5.1 tidyselect_1.2.0 bit_4.0.5
[5] tzdb_0.4.0 magrittr_2.0.3 glue_1.7.0 tibble_3.2.1
[9] pkgconfig_2.0.3 bit64_4.0.5 lifecycle_1.0.4 cli_3.6.2
[13] fansi_1.0.6 vctrs_0.6.5 compiler_4.3.3 hms_1.1.3
[17] pillar_1.9.0 crayon_1.5.2 rlang_1.1.3