google/stenographer

Non-pretty results from output file open failure

jpvlsmv opened this issue · 3 comments

I have been working to resolve an immediate constant crash from stenograper (containerized from Security Onion distribution, but not installed from CD)

The underlying cause is opening the output (pcap) file with O_DIRECT, and getting back -EINVAL because my particular filesystem (ZFS from Ubuntu 18.04) does not like that flag to be set, as seen in this log excerpt:

2021-03-10T21:42:53.973386Z T:beb0f7 [aio.cc:190] Opening packet file /tmp/stenographer958450565/PKT0/.1615412573973285: -1
2021-03-10T21:42:53.973401Z T:beb0f7 [stenotype.cc:478] CHECK(SUCCEEDED(__check_success_error__)) output.Rotate(file_dirname, micros, flag_preallocate_file_mb << 20): Invalid argument <- open
ABORTABORTABORT
/usr/bin/stenotype() [0x4081f8]
/usr/bin/stenotype() [0x430568]
/lib64/libstdc++.so.6(+0xb5330) [0x7fc43febf330]
/lib64/libpthread.so.0(+0x7ea5) [0x7fc440560ea5]
/lib64/libc.so.6(clone+0x6d) [0x7fc43f6229fd]
2021/03/10 21:42:54 Stenotype stopped after 759.732891ms: stenotype wait failed: signal: aborted (core dumped)

While a SIGABRT and core dump does effectively end the process, and probably will get someone's attention, it can be a bit distracting from the actual issue at hand, and send a seasoned sysadmin looking into strangely short stack traces and other depths of madness (the "interesting" interplay between containers, systemd, and the kernel/core_pattern sysctl)

  • Is there a more friendly way to report "File open failed: Invalid argument" and exit the process?

Digging in to stenotype, aio.cc seems to handle the situation correctly-- (although arguably in the wrong order) opening the file, and logging its operation. When the open fails, -1 is passed to util.h's Errno() getting back a util.h typedef Error wrapping ERROR("Invalid argument"), implicitly coerced from that std::string... which becomes the first argument for the RETURN_IF_ERROR macro, and after a macro expansion of SUCCEEDED(status), since its std::unique_ptr<>.get() does not come back with NULL, we return that string of "Invalid argument <- open". This *string comes up to stenotype.cc into the CHECK_SUCCESS macro, which hands the work off to another trip through SUCCEEDED and then the CHECK macro expands to a LOG(FATAL) as we see in the log file.

But what is the state of *this at this point? The call to Rotate has returned without setting its io::SingleFile* current_ nor updating its set<SingleFile*> files_.

In my instance (first output file ever to be opened), current_ is whatever it was initialized to by the runtime loader (apparently not == NULL which would give Error("no file")), and when the much-later Output::Write() attempts to call current_->Write() the operating system steps in with the observed SIGABRT. (if the core file existed, it might show garbage in current_, but alas I never found the core)

The proper fix involves maintaining the Output class invariant that current_ will be a valid (or null) pointer even if Rotate() can't open a new file. This of course lead to the questions of

  • what should the behavior be if a new .pcap file can not be opened?
  • Should it be different in case of temporary problems e.g. ENOSPC or EAGAIN than in permanent problems like EINVAL?

IMHO, the current behavior of ABORTABORTABORT (core dumped) for any error should at least be replaced by a clean exit(1);

I see that I missed the initialization of current_ to NULL in Output's constructor; and that within Output::Rotate, before the call to open(), current_ has been set to NULL again. So my initial theory of current_ containing garbage is not right. When RETURN_IF_ERROR is called, current_ is properly NULL, indicating that there is no currently-open file. Must be some other pointer that's causing the abort.

@jpvlsmv Did you ever make any progress in getting Stenographer to play nice with ZFS?

Answered my own question:

add-apt-repository ppa:jonathonf/zfs
apt-get install zfs-dkms

Bionic includes ZFS 0.7.5, while Direct I/O support wasn't added until 0.8.0.