bytedance/g3

feature request: support SSLKeyLogFile

GlenDC opened this issue · 8 comments

I got a prototype running where I enabled g3 with SSL (PSK) logging enabled. With this in place and working I would ideally like to move to a more clean solution aligned with your vision. If you do not want anything to do with this, do tell me of course and I'll close, no harm there.

In the rest of this description I'll lay out my proposal and would like to align with you on the idea and direction.

  • by default acceptors and connectors would respect the common env variable SSLKEYLOGFILE, meaning they would add callbacks to their context to log the PSK's as part of their handshakes and other exchanges;
  • within the config one can also choose to explicitly disable key logging for a tls acceptor or connector, or explicitly define a different path

This could be represented as an enum:

enum KeyLogIntent {
     #[default]
     env, // logs to the path defined at `SSLKEYLOGFILE` if the env var is defined
     disabled,
     path(PathBuf),
}

This intent can be added to configs such as file:///Users/glendc/code/github.com/witness-ai/g3-clean/g3proxy/doc/_build/html/configuration/values/tls.html#conf-value-rustls-server-config (e.g. key_log). It would be within the (ssl) server and client that we use then this config to define the callback if key logging is indeed defined.

All this is fairly harmless I think and easy enough to add, and is in line with a lot of common software out there.

A point of bigger contention is how exactly the callback is implemented.
At the moment I implemented this as follows:

use crossbeam_channel::{unbounded, Sender};
use openssl::ssl::SslRef;
use std::{fs, io::Write, path::Path, sync::OnceLock, thread};

static GLOBAL_KEY_LOG_LINE_SENDER: OnceLock<Option<Sender<String>>> = OnceLock::new();

/// return a key log callback if one was requested using the SSLKEYLOGFILE
///
/// # Panics
///
/// This function panics in case the opening of this file failed for whatever reason.
/// It is desired to panic upon such failure as it means the user's desire
/// to keylog the SSL traffic could not be fulfilled, which is a critical failure.
pub fn key_log_callback_if_requested() -> Option<impl Fn(&SslRef, &str) + 'static + Sync + Send> {
    GLOBAL_KEY_LOG_LINE_SENDER
        .get_or_init(|| {
            std::env::var("SSLKEYLOGFILE").ok().map(|path_str| {
                let path = Path::new(&path_str);
                if let Some(parent) = path.parent() {
                    fs::create_dir_all(parent).expect("parent dir of SSLKEYLOGFILE to be created");
                }

                let mut file = std::fs::OpenOptions::new()
                    .create(true)
                    .append(true)
                    .open(path)
                    .expect("open SSLKEYLOGFILE in append mode");

                let (s, r) = unbounded();

                log::info!("opened SSLKEYLOGFILE @ {path_str}");

                thread::spawn(move || {
                    for line in r {
                        writeln!(file, "{}", line)
                            .expect("SSLKEYLOGFILE: failed to write key log line");
                    }
                });

                s
            })
        })
        .clone()
        .map(|sender| {
            move |_ssl_ref: &SslRef, line: &str| {
                sender
                    .send(line.to_owned())
                    .expect("send line to write it to SSLKEYLOGFILE");
            }
        })
}

I understand however that this might not be ideal given:

  • we do not really need our own atomic guarantees as our writes are atomic on their own according to the assumptions that we can make on the linux platforms (I think)
  • in case of slow disk I/O this channel can ever grow

An easier solution would be to use a Mutex, but that would have a big performance penalty on such a hot path.

In codebases such as Chromium I noticed that they seem to be using a single global file handle, as can be seen at https://source.chromium.org/chromium/chromium/src/+/main:third_party/boringssl/src/tool/client.cc?q=SSLKEYLOGFILE&ss=chromium. Perhaps we could also try to do that instead of a channel, it would however require unsafe code AFAIK given we would need a static file handle that we need to mutate.

I believe the spirit of this proposal to be something you can stand behind, I am however not certain how you see the implementations. Curious to hear your opinion and feedback on that matter. In worst case we just close this issue, but in best case we can get aligned on your vision so I can prepare a PR for you at a later point of time.

Would this be used for debug purpose, or for production?

I imagine most will use it for debug purposes, but once it lands in the g3 codebase it is not unthinkable that some might use it for production use. As such it should be as of little impact to the performance and stability as possible.

Initially we can make the code bebind a debug cfg flag, and automatically enable it by just checking env vars. I'm OK with channel or mutex lock here if the code can only take effect in debug build.
For production usage, a local key log file might won't work in my view. We can add config options later after someone show the needs for it.

ok if I make an explicit feature for it, like sslkeylog or something like that? You could then still in future add a feature debug which includes features such as sslkeylog. As it still leaves open the option for people to enable this behaviour without possibly enabling other debug stuff that might not be desired.

For my approach I used crossbeam channel btw, would that be ok for you or you have a strong preference here?

OK. I prefer to use flume. crossbeam-channelcan also be used if it has better performance.

I have no opinion here, I'll use flume, not an issue :)
Will work on a patch later. Thx!

@zh-jq just wanted to let you know that @GlenDC works for me. We are finally getting some cycles to contribute back.

This issue can be closed, as we will not really need it if we cannot define granular options via config.
A simple on/off switch won't work for us, given we might have custom desires, e.g. disabling it for specific acceptors/connectors, as well as limiting the log file size, etc.

I understand your position and desire to keep it simple, as such fine for me to close for now. In case someone does come with a different use case that warranties it a reason to be this can be re-opened or replaced by another issue.