afitzek/hasura-metric-adapter

Thread blocking

Closed this issue · 2 comments

Hey,

I've been experimenting with the app and encountered the following issue:

Although the app is mostly async, there are several blocking calls:

match termination_rx.recv_timeout(std::time::Duration::from_millis(cfg.collect_interval)) {

match termination_rx.recv_timeout(Duration::from_millis(sleep_time)) {

match watch_receiver.recv_timeout(Duration::from_millis(sleep_time)) {

When execution reaches any of these calls, it blocks the executor thread for the full timeout.
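
To make the stall concrete, here is a minimal, self-contained sketch (illustrative code, not from this repo): tokio::try_join! polls all of its branches within a single task, so while the blocking recv_timeout parks the thread, the ticker branch cannot be polled at all.

use std::sync::mpsc;
use std::time::Duration;

#[tokio::main]
async fn main() {
    let (_tx, rx) = mpsc::channel::<()>();

    let blocker = async {
        for _ in 0..3 {
            // Synchronous wait: parks the OS thread for two full seconds,
            // even though we are inside an async block.
            let _ = rx.recv_timeout(Duration::from_secs(2));
            tokio::task::yield_now().await; // stand-in for the collector's async work
        }
        Ok::<_, ()>(())
    };

    let ticker = async {
        for _ in 0..30 {
            tokio::time::sleep(Duration::from_millis(200)).await;
            // Should print every 200ms, but stalls while `blocker` parks the thread.
            println!("tick");
        }
        Ok::<_, ()>(())
    };

    let _ = tokio::try_join!(blocker, ticker);
}

Running this, the ticks only arrive in bursts until blocker finishes, after which they resume at the regular 200ms cadence.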


Let's take a look at the first example (inside the run_metadata_collector function):

  1. This stage is reached after every iteration of the metadata collector.
  2. The thread is blocked for the duration of cfg.collect_interval, which is 15s by default.
  3. run_metadata_collector is joined with the read_file async branch:
    let res = tokio::try_join!(
        webserver(&config),
        logreader::read_file(&config.log_file, &metric_obj, config.sleep_time, &terminate_rx),
        collectors::run_metadata_collector(&config, &metric_obj, &terminate_rx)
    );
  4. Now read_file is unable to continue until run_metadata_collector unblocks the thread: tokio::try_join! polls all of its branches within a single task, so a blocking call in one branch prevents the other branches from being polled.
  5. This results in logs being processed by read_file in short bursts every 15s instead of as soon as they arrive (a non-blocking alternative is sketched after this list).
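
As mentioned in point 5, here is a sketch of a non-blocking alternative; the names (wait_for_termination, rx, interval_ms) are illustrative, and this is not necessarily what the actual fix looks like. Replacing the std channel with tokio::sync::watch and wrapping the wait in tokio::time::timeout yields back to the scheduler instead of parking the thread, so the sibling try_join! branches keep running:

use std::time::Duration;
use tokio::sync::watch;
use tokio::time::timeout;

// Returns true when the collector should shut down, false when the
// collect interval elapsed and the next iteration should run.
async fn wait_for_termination(rx: &mut watch::Receiver<bool>, interval_ms: u64) -> bool {
    match timeout(Duration::from_millis(interval_ms), rx.changed()).await {
        Ok(Ok(())) => *rx.borrow(), // termination flag was updated
        Ok(Err(_)) => true,         // sender dropped: shut down
        Err(_) => false,            // timeout elapsed: run the next iteration
    }
}

With this shape, each collector iteration awaits instead of blocking, so read_file gets polled whenever log lines arrive.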

The effect this has on the app's behavior when hasura generates lots of logs depends on where the logs are redirected:

  • regular file - the metric adapter can't keep up and reports the log metrics with a delay, giving an incorrect representation of the system state.
  • named pipe - as soon as the pipe's buffer fills up, the writer process (hasura) is blocked until the reader process (the metric adapter) frees the buffer by reading more lines; see the sketch below.
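
For what it's worth, the named pipe backpressure is easy to reproduce in isolation. A minimal sketch, assuming a FIFO was created beforehand with mkfifo /tmp/demo.fifo (the path is illustrative): with no reader draining the pipe, the write blocks as soon as the kernel's pipe buffer (typically 64 KiB on Linux) is full.

use std::fs::OpenOptions;
use std::io::Write;

fn main() -> std::io::Result<()> {
    // Opening a FIFO for writing blocks until some process opens it for reading.
    let mut pipe = OpenOptions::new().write(true).open("/tmp/demo.fifo")?;
    for i in 0u64.. {
        // With no reader draining the FIFO, this write blocks once the
        // kernel's pipe buffer fills up: the writer is throttled.
        writeln!(pipe, "log line {i}")?;
    }
    Ok(())
}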

The named pipe case is the more troublesome of the two, as it results in hasura requests being throttled. I think it is the cause of issue #31.


Reproduction case:

  1. Clone the repo.
  2. Start docker compose.
  3. Go to the Data tab in the hasura console (http://localhost:8080/console/data/).
  4. Create an empty table (e.g. a table test with a single id column).
  5. Go to the API tab in the hasura console (http://localhost:8080/console/api/api-explorer).
  6. Prepare a simple request to the table created above in GraphiQL, e.g.:
query MyQuery {
  test {
    id
  }
}
  7. Open the browser network tab and start sending requests continuously with Ctrl+Enter.
  8. After a short while the requests stop resolving in the network tab, as the named pipe buffer fills up with unprocessed logs while the metric adapter's thread is blocked.
  9. At this point the whole hasura console stops working correctly and only returns to normal once the metric adapter catches up with the accumulated logs.

@afitzek I've created a PR with the proposed fix; please take a look when you have time: #38

@dddenis Thank you very much for the very detailed explanation and the nice PR!