rui314/mold

Potential fd leak with clang LTO causes "Too many open files"

karuboniru opened this issue · 0 comments

My Environment

$ ld --version     
mold 2.34.1 (compatible with GNU ld)

$ clang++ --version                                                                                                                   
clang version 20.0.0pre20241012.gb1746894deebe3 (Fedora 20.0.0~pre20241012.gb1746894deebe3-2.fc41)
Target: x86_64-redhat-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Configuration file: /etc/clang/x86_64-redhat-linux-gnu-clang++.cfg

$ ulimit -a 
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8192
-c: core file size (blocks)         unlimited
-m: resident set size (kbytes)      unlimited
-u: processes                       511419
-n: file descriptors                1024
-l: locked-in-memory size (kbytes)  8192
-v: address space (kbytes)          unlimited
-x: file locks                      unlimited
-i: pending signals                 511419
-q: bytes in POSIX msg queues       819200
-e: max nice                        0
-r: max rt priority                 0
-N 15: rt cpu time (microseconds)   unlimited

  • Project being built: https://github.com/Geant4/geant4/releases/tag/v11.2.2
  • Error message:
    /usr/bin/clang++ -fPIC -W -Wall -pedantic -Wno-non-virtual-dtor -Wno-long-long -Wwrite-strings -Wpointer-arith -Woverloaded-virtual -Wno-variadic-macros -Wshadow -pipe -Qunused-arguments -DGL_SILENCE_DEPRECATION -pthread  -O2 -g -DNDEBUG -flto=thin   -shared -Wl,-soname,libG4processes.so -o BuildProducts/lib64/libG4processes.so @CMakeFiles/G4processes.rsp 
    mold: fatal: opening source/CMakeFiles/G4processes.dir/processes/hadronic/models/im_r_matrix/src/G4CollisionNNToDeltaDelta.cc.o failed: Too many open files
    clang++: error: linker command failed with exit code 1 (use -v to see invocation)
    
  • Related input to clang (one object file per line):
    $ cat CMakeFiles/G4processes.rsp |wc -l              
    1710
    

Only reproducible when:

  • LTO enabled
  • mold is used
    I am not reporting this to the LLVM project because lld and bfd can both complete the link with the same ulimit.
  • Raising ulimit -n works around the problem
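For reference, the ulimit -n workaround corresponds to raising the soft RLIMIT_NOFILE limit of the process. The following is a minimal sketch of that call sequence, not code taken from mold; the helper name raise_fd_limit is hypothetical, and whether a linker should do this itself is a separate question.

```cpp
#include <sys/resource.h>

// Hypothetical helper (not part of mold): make sure the soft limit on open
// file descriptors is at least `wanted`. Returns true on success.
bool raise_fd_limit(rlim_t wanted) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return false;

    // Nothing to do if the soft limit is already high enough.
    if (lim.rlim_cur >= wanted)
        return true;

    // An unprivileged process cannot raise the soft limit above the hard limit.
    if (wanted > lim.rlim_max)
        return false;

    lim.rlim_cur = wanted;
    return setrlimit(RLIMIT_NOFILE, &lim) == 0;
}
```

With the 1024 soft limit shown above and 1710 input objects, a call such as raise_fd_limit(4096) would be enough for this link, provided the hard limit allows it.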

Excerpt from strace -f (full strace result (gzipped)):

[pid 320263] openat(AT_FDCWD, "source/CMakeFiles/G4processes.dir/processes/hadronic/models/im_r_matrix/src/G4CollisionMesonBaryon.cc.o", O_RDONLY) = 1014
[pid 320263] openat(AT_FDCWD, "source/CMakeFiles/G4processes.dir/processes/hadronic/models/im_r_matrix/src/G4XNDeltaTable.cc.o", O_RDONLY) = 1015
[pid 320263] fstat(1015, {st_mode=S_IFREG|0644, st_size=54628, ...}) = 0
[pid 320263] mmap(NULL, 54628, PROT_READ|PROT_WRITE, MAP_PRIVATE, 1015, 0) = 0x7f08aec9b000
[pid 320263] close(1015)                = 0
[pid 320263] openat(AT_FDCWD, "source/CMakeFiles/G4processes.dir/processes/hadronic/models/im_r_matrix/src/G4XNDeltaTable.cc.o", O_RDONLY) = 1015
[pid 320263] openat(AT_FDCWD, "source/CMakeFiles/G4processes.dir/processes/hadronic/models/im_r_matrix/src/G4CollisionMesonBaryonElastic.cc.o", O_RDONLY) = 1016
[pid 320263] fstat(1016, {st_mode=S_IFREG|0644, st_size=174592, ...}) = 0
[pid 320263] mmap(NULL, 174592, PROT_READ|PROT_WRITE, MAP_PRIVATE, 1016, 0) = 0x7f08aec70000
[pid 320263] close(1016)                = 0
[pid 320263] openat(AT_FDCWD, "source/CMakeFiles/G4processes.dir/processes/hadronic/models/im_r_matrix/src/G4CollisionMesonBaryonElastic.cc.o", O_RDONLY) = 1016
[pid 320263] openat(AT_FDCWD, "source/CMakeFiles/G4processes.dir/processes/hadronic/models/im_r_matrix/src/G4XNNElastic.cc.o", O_RDONLY) = 1017
[pid 320263] fstat(1017, {st_mode=S_IFREG|0644, st_size=161368, ...}) = 0
[pid 320263] mmap(NULL, 161368, PROT_READ|PROT_WRITE, MAP_PRIVATE, 1017, 0) = 0x7f08aec48000
[pid 320263] close(1017)                = 0

This follows a pattern: [open an object file] -> [fstat the file] -> [mmap the file] -> [close the file] (at this point I would expect it to move on to the next file) -> [open the same file again] (this descriptor is never closed, so it leaks)
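The sequence seen in the trace can be mimicked with a few system calls. This is a minimal sketch for illustration only, not mold's code; load_one and load_all are hypothetical names, and the sketch assumes a POSIX system. It shows that each input file ends up holding exactly one descriptor, so 1710 inputs exceed a limit of 1024.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <string>
#include <vector>

// Hypothetical helper: repeat the per-file syscall sequence from the trace.
// Returns the descriptor that is still open afterwards, or -1 on error.
int load_one(const std::string &path) {
    int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }

    if (st.st_size > 0) {
        void *p = mmap(nullptr, st.st_size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        // A real linker would keep this mapping. The sketch releases it
        // because only the descriptor count matters here.
        munmap(p, st.st_size);
    }

    // The first descriptor is closed, as in the trace.
    close(fd);

    // The same file is opened again and this descriptor is never closed.
    return open(path.c_str(), O_RDONLY);
}

// Hypothetical helper: process every input and collect the descriptors
// that remain open, one per file.
std::vector<int> load_all(const std::vector<std::string> &paths) {
    std::vector<int> held;
    for (const std::string &path : paths) {
        int fd = load_one(path);
        if (fd >= 0)
            held.push_back(fd);
    }
    return held;
}
```

The size of the vector returned by load_all grows linearly with the number of inputs, which matches the steadily increasing descriptor numbers (1014, 1015, 1016, 1017) in the excerpt.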


After attaching gdb and reading the code, it appears that the files are reopened in mold::create_plugin_input_file, and the comment at

mold/src/lto-unix.cc

Lines 640 to 646 in 0841ffc

// It looks like GCC doesn't need fd after claim_file_hook() while
// LLVM needs it and takes the ownership of fd. To prevent "too many
// open files" issue, we close fd only for GCC. This is ugly, though.
if (!is_llvm(ctx)) {
  MappedFile *mf2 = mf->parent ? mf->parent : mf;
  mf2->close_fd();
}
explains why the opened files should not be closed for LLVM LTO. 🤔