replicatedhq/troubleshoot

Add a timeout to the run host collector

banjoh opened this issue · 2 comments

banjoh commented

Describe the rationale for the suggested feature.

The run host collector is used to run arbitrary commands or programs, and the collector has little control over how long those processes take. This can cause the collector to run for too long and, in some cases, never stop.

Collecting a support bundle using the spec below will never stop, because ping keeps running until it is interrupted:

apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: run
spec:
  hostCollectors:
    - run:
        name: "ping-google"
        command: "ping"
        args:
          - "google.com"

Describe the feature

Add a timeout field that can be used to limit how long the collector should run. The timeout should be

Here is an example spec

    - run:
        name: "ping-google"
        timeout: 
        command: "ping"
        args:
          - "google.com"

Describe alternatives you've considered

Leave things as they are

Additional context

  • The run pod collector already has a timeout field which behaves as described above. Its approach can be "borrowed"
  • When a timeout is set, the command would eventually end up being started through exec.CommandContext

banjoh commented

Reopened waiting for a docs PR

Thanks Evan! I have created the doc PR here replicatedhq/troubleshoot.sh#541