grafana/xk6-disruptor

Disruptor test exceeding time limit when combined with a browser test

Closed this issue · 5 comments

mdcruz commented

During last week's k6 Office Hours, we demonstrated how to do hybrid testing with k6 browser and xk6-disruptor. One of the issues found was the disruptor test was exceeding the time limit specified. After the browser test has been completed, the disruptor scenario is still in progress.

Script used:

import { LoadAndCheck } from "../foundations/lib/frontend/basic.js";
import { ServiceDisruptor } from "k6/x/disruptor";

const BASE_URL = __ENV.BASE_URL || "http://localhost:3333";

const scenarios = {
  disrupt: {
    executor: "shared-iterations",
    iterations: 1,
    vus: 1,
    exec: "disrupt",
  },
  browser: {
    executor: "constant-vus",
    vus: 1,
    duration: "10s",
    startTime: "10s",
    exec: "browser"
  },
};

if (__ENV.DISABLE_DISRUPT) {
  delete scenarios["disrupt"];
}

export const options = {
  scenarios,
};

const fault = {
  averageDelay: "1000ms",
  // errorRate: 0.1,
  // errorCode: 500,
}

export function disrupt() {
  const disruptor = new ServiceDisruptor("pizza-info", "pizza-ns");
  const targets = disruptor.targets();
  if (targets.length == 0) {
    throw new Error("expected list to have one target");
  }

  disruptor.injectHTTPFaults(fault, "20s");
}

export async function browser() {
  await LoadAndCheck(BASE_URL, false);
}

The script can also be accessed here.

Expected result:
The test should have been completed after 30 seconds.

Actual result:
The test was completed in 01m16.5s
Please take a look at the video clip below.

k6-browser-and-disruptor.mov

One possibility is that creating the disruptor and injecting the agent in the target pod(s) takes too long. It would be interesting to measure this time or at least print a log after the disruptor is created to see how long it takes.

ka3de commented

To add a little bit of context from the k6 browser side:

In the video that you posted @mdcruz , the execution is a little bit different than what I'm getting in my local executions, in your case seems like the browser actually finishes its execution and is the disruptor the one that is stuck. Also the browser errors are a little bit misleading (we've had some issues with that).
In my case I'm running the test with k6 browser v0.10.0 and also with the current main HEAD (212dabc) adapting the test syntax, which you can build such as:

xk6 build --with github.com/grafana/xk6-browser@v0.10.0 --with github.com/grafana/xk6-disruptor

And

xk6 build --with github.com/grafana/xk6-browser@212dabc --with github.com/grafana/xk6-disruptor

In these cases I see the opposite behavior, where the disruptor finishes, but it takes longer for the browser to finish (~40s total test run), which makes sense as I will explain later.

This is an example of output when running the test:

          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: advanced/01.browser-with-disruptor-new.js
     output: -

  scenarios: (100.00%) 2 scenarios, 2 max VUs, 10m30s max duration (incl. graceful stop):
           * disrupt: 1 iterations shared among 1 VUs (maxDuration: 10m0s, exec: disrupt, gracefulStop: 30s)
           * browser: 1 looping VUs for 10s (exec: browser, startTime: 10s, gracefulStop: 30s)

ERRO[0013] Failed to load resource: the server responded with a status of 500 (Internal Server Error)  browser_source=network line_number=0 source=browser stacktrace="<nil>" url="http://localhost:3333/_app/immutable/chunks/_layout.da46b06b.js"
ERRO[0015] {"message":"Failed to fetch dynamically imported module: http:…ocalhost:3333/_app/immutable/chunks/0.42b5aad3.js","stack":"TypeError: Failed to fetch dynamically imported mo…ocalhost:3333/_app/immutable/chunks/0.42b5aad3.js"}  source=browser-console-api
ERRO[0015] Failed to load resource: the server responded with a status of 500 (Internal Server Error)  browser_source=network line_number=0 source=browser stacktrace="<nil>" url="http://localhost:3333/_app/immutable/chunks/1.a917d3d6.js"
ERRO[0015] Failed to load resource: the server responded with a status of 500 (Internal Server Error)  browser_source=network line_number=0 source=browser stacktrace="<nil>" url="http://localhost:3333/_app/immutable/entry/error.svelte.ecbfdf8a.js"
ERRO[0043] Uncaught (in promise) GoError: getting text content of "h1": timed out after 30s
running at github.com/grafana/xk6-browser/api.Locator.TextContent-fm (native)
disrupt at LoadAndCheck (file:///home/ka3de/devel/go/src/github.com/grafana/quickpizza/k6/foundations/lib/frontend/basic-new.js:13:16(34))
browser at browser (file:///home/ka3de/devel/go/src/github.com/grafana/quickpizza/k6/advanced/01.browser-with-disruptor-new.js:55:21(8))  executor=constant-vus scenario=browser

     browser_data_received.......: 90 kB  2.1 kB/s
     browser_data_sent...........: 6.4 kB 150 B/s
     browser_http_req_duration...: avg=1.37s  min=1s     med=1.03s  max=2.02s  p(90)=2.02s  p(95)=2.02s 
     browser_http_req_failed.....: 0.00%  ✓ 0       ✗ 14 
     browser_web_vital_ttfb......: avg=1.05s  min=1.05s  med=1.05s  max=1.05s  p(90)=1.05s  p(95)=1.05s 
     data_received...............: 0 B    0 B/s
     data_sent...................: 0 B    0 B/s
     iteration_duration..........: avg=26.26s min=20.29s med=26.26s max=32.23s p(90)=31.03s p(95)=31.63s
     iterations..................: 2      0.04709/s
     vus.........................: 1      min=1     max=2
     vus_max.....................: 2      min=2     max=2


running (00m42.5s), 0/2 VUs, 2 complete and 0 interrupted iterations
disrupt ✓ [======================================] 1 VUs  00m20.3s/10m0s  1/1 shared iters
browser ✓ [======================================] 1 VUs  10s

The key point to understand the extra time that it takes for the browser module to finish is the line:

ERRO[0043] Uncaught (in promise) GoError: getting text content of "h1": timed out after 30s

This is a consequence of this locator used inside a check in the basic.js script:

check(page, {
      'header': page.locator('h1').textContent() == 'Looking to break out of your pizza routine?',
    });

Because the disruptor is making the website return random 500 HTTP status code in responses, some of the page.goto actions fail with the log error messages that we see in the output, but still this locator action is performed after the [attempted] navigation. The Locator will automatically wait looking for the specified selector up until the default timeout (30s), but because in this case the page content was not returned from the server, it fails after the TO, which is what makes the whole test run for longer time.

I can confirm that I can reproduce the behavior described by @ka3de: the disruptor scenario ends in 20s and the browser takes ~40s to finish, which is consistent with the 10s + 30s timeout hypothesis.

mdcruz commented

Thanks, both! In this case, I don't think there is anything we can do since the timings are expected. I'm also getting similar results when running this test with k6 v0.45.0.

@mdcruz As the results could not be reproduced. We are closing this issue. If this situation ever happens again, please feel free to open another issue and reference this one.