sysdiglabs/kubectl-capture

Capture files not saved with ebpf and wrong format without it

WouterLoeve opened this issue · 13 comments

When I run the following command:
sudo kubectl capture test --ebpf -M 10 --snaplen 256

It says the capture has been saved to the working directory, but this is not the case. I can't find the file anywhere, even when I copy and paste the full path it lists.

Another thing that I noticed is that when I run the same command without --ebpf the capture gzip file seems to be corrupted.

gzip: capture-test-1557830100.scap.gz: unexpected end of file

When I try to use archive manager to extract the files it says that an error occurred.
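For anyone else hitting this, a quick way to tell an absent capture from a truncated one is to test the gzip stream before trying to open it. This is only a minimal sketch: `check_capture` is a hypothetical helper name, not part of kubectl-capture, and it relies only on the standard `gzip -t` integrity test.

```shell
# Hypothetical helper (not part of kubectl-capture): classify a capture file.
# Returns 0 if the gzip stream is complete, 1 if the file is missing or empty,
# 2 if gzip reports it as truncated or corrupt.
check_capture() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file"
    return 1
  fi
  # gzip -t only tests integrity; it writes no decompressed output.
  if gzip -t "$file" 2>/dev/null; then
    echo "ok: $file"
  else
    echo "truncated or corrupt: $file"
    return 2
  fi
}
```

For example, `check_capture capture-test-1557830100.scap.gz` should report the file above as truncated or corrupt if gzip sees an unexpected end of file.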

Okay, I'm going to need more instructions to reproduce the error.

Are you running the plugin on GKE, EKS or AKS? Which Kubernetes version are you running?

Thanks!

Same issue here: the file is not saved.

@serratala Okay, I will need more details to reproduce the error.

Are you running the plugin on GKE, EKS or AKS? Which Kubernetes version are you running?

Thanks!

Same here. Running in GKE

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.2", GitCommit:"66049e3b21efe110454d67df4fa62b08ea79a19b", GitTreeState:"clean", BuildDate:"2019-05-16T18:55:03Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.6-gke.5", GitCommit:"46b3ddd492a78deae773f24b47c5c564a9b928b1", GitTreeState:"clean", BuildDate:"2019-05-18T01:19:43Z", GoVersion:"go1.11.5b4", Compiler:"gc", Platform:"linux/amd64"}

The plugin reports that files are written to disk, but nothing exists at that path.
The command I ran is similar to the example from the README:
$ kubectl capture internal-tools-ui-55b7567b8-pp9rm -M 30 --ebpf
It looks like the capture pod was unable to get a BPF probe:

make[2]: *** [/usr/src/sysdig-0.26.1/bpf/Makefile:33: /usr/src/sysdig-0.26.1/bpf/probe.o] Error 1
make[1]: *** [Makefile:1540: _module_/usr/src/sysdig-0.26.1/bpf] Error 2
make: *** [Makefile:18: all] Error 2
mv: cannot stat '/usr/src/sysdig-0.26.1/bpf/probe.o': No such file or directory
* Trying to download precompiled BPF probe from https://s3.amazonaws.com/download.draios.com/stable/sysdig-probe-binaries/sysdig-probe-bpf-0.26.1-x86_64-4.14.119%2B-b2eece47169260af476d82b56b239305.o
curl: (22) The requested URL returned error: 404 Not Found
* Failure to find a BPF probe
* Capturing system calls
* Mounting debugfs
Found kernel config at /proc/config.gz
* COS detected (build 11647.182.0), downloading and setting up kernel headers
* Downloading https://storage.googleapis.com/cos-tools/11647.182.0/kernel-src.tar.gz
* Extracting kernel sources
* Configuring Kernel
scripts/sign-file.c:25:30: fatal error: openssl/opensslv.h: No such file or directory
compilation terminated.
make[1]: *** [scripts/Makefile.host:102: scripts/sign-file] Error 1
make: *** [Makefile:572: scripts] Error 2
* Trying to compile BPF probe sysdig-probe-bpf (sysdig-probe-bpf-0.26.1-x86_64-4.14.119+-b2eece47169260af476d82b56b239305.o)
In file included from /usr/src/sysdig-0.26.1/bpf/probe.c:23:
/usr/src/sysdig-0.26.1/bpf/fillers.h:2017:26: error: no member named 'loginuid' in 'struct task_struct'
                loginuid = _READ(task->loginuid);
                                 ~~~~  ^
/usr/src/sysdig-0.26.1/bpf/plumbing_helpers.h:18:28: note: expanded from macro '_READ'
#define _READ(P) ({ typeof(P) _val;                             \
                           ^
In file included from /usr/src/sysdig-0.26.1/bpf/probe.c:23:
/usr/src/sysdig-0.26.1/bpf/fillers.h:2017:26: error: no member named 'loginuid' in 'struct task_struct'
                loginuid = _READ(task->loginuid);
                                 ~~~~  ^
/usr/src/sysdig-0.26.1/bpf/plumbing_helpers.h:20:44: note: expanded from macro '_READ'
                    bpf_probe_read(&_val, sizeof(_val), &P);    \
                                                         ^
In file included from /usr/src/sysdig-0.26.1/bpf/probe.c:23:
/usr/src/sysdig-0.26.1/bpf/fillers.h:2017:12: error: assigning to 'kuid_t' from incompatible type 'void'
                loginuid = _READ(task->loginuid);
                         ^ ~~~~~~~~~~~~~~~~~~~~~
3 errors generated.
make[2]: *** [/usr/src/sysdig-0.26.1/bpf/Makefile:33: /usr/src/sysdig-0.26.1/bpf/probe.o] Error 1
make[1]: *** [Makefile:1540: _module_/usr/src/sysdig-0.26.1/bpf] Error 2
make: *** [Makefile:18: all] Error 2
mv: cannot stat '/usr/src/sysdig-0.26.1/bpf/probe.o': No such file or directory
* Trying to download precompiled BPF probe from https://s3.amazonaws.com/download.draios.com/stable/sysdig-probe-binaries/sysdig-probe-bpf-0.26.1-x86_64-4.14.119%2B-b2eece47169260af476d82b56b239305.o
curl: (22) The requested URL returned error: 404 Not Found
* Failure to find a BPF probe
Unable to load the BPF probe
can't open BPF probe '/root/.sysdig/sysdig-probe-bpf.o': No such file or directory
----------------------
Event           #Calls
----------------------
rpc error: code = Unknown desc = Error: No such container: 68234b228e680641f25eaebd1d22fa628252b9ca16e510639714d7db1f21135e

When I run it without --ebpf, the container is much less verbose:

* Setting up /usr/src links from host
* Unloading sysdig-probe, if present
* Running dkms install for sysdig
Error! echo
Your kernel headers for kernel 4.14.119+ cannot be found at
/lib/modules/4.14.119+/build or /lib/modules/4.14.119+/source.
* Running dkms build failed, couldn't find /var/lib/dkms/sysdig/0.26.1/build/make.log
* Trying to load a system sysdig-probe, if present
* Trying to find precompiled sysdig-probe for 4.14.119+
Found kernel config at /proc/config.gz
* Trying to download precompiled module from https://s3.amazonaws.com/download.draios.com/stable/sysdig-probe-binaries/sysdig-probe-0.26.1-x86_64-4.14.119%2B-b2eece47169260af476d82b56b239305.ko
curl: (22) The requested URL returned error: 404 Not Found
Download failed, consider compiling your own sysdig-probe and loading it or getting in touch with the sysdig community
* Capturing system calls
Unable to load the driver
error opening device /host/dev/sysdig0. Make sure you have root credentials and that the sysdig-probe module is loaded.
----------------------
Event           #Calls
----------------------
rpc error: code = Unknown desc = Error: No such container: cc0c3f2872ea1ca6b8f8085b5477d036f9a6e5f00fcecd5729ab02cfbb7450b0

Here's the release information from the node:

$ cat /etc/*elease
CHROMEOS_AUSERVER=https://tools.google.com/service/update2
CHROMEOS_BOARD_APPID={76E245CF-C0D0-444D-BA50-36739C18EB00}
CHROMEOS_CANARY_APPID={90F229CE-83E2-4FAF-8479-E368A34938B1}
CHROMEOS_DEVSERVER=
CHROMEOS_RELEASE_APPID={76E245CF-C0D0-444D-BA50-36739C18EB00}
CHROMEOS_RELEASE_BOARD=lakitu-signed-mp-v3keys
CHROMEOS_RELEASE_BRANCH_NUMBER=182
CHROMEOS_RELEASE_BUILDER_PATH=lakitu-release/R73-11647.182.0
CHROMEOS_RELEASE_BUILD_NUMBER=11647
CHROMEOS_RELEASE_BUILD_TYPE=Official Build
CHROMEOS_RELEASE_CHROME_MILESTONE=73
CHROMEOS_RELEASE_DESCRIPTION=11647.182.0 (Official Build) stable-channel lakitu 
CHROMEOS_RELEASE_KEYSET=mp-v3
CHROMEOS_RELEASE_NAME=Chrome OS
CHROMEOS_RELEASE_PATCH_NUMBER=0
CHROMEOS_RELEASE_TRACK=stable-channel
CHROMEOS_RELEASE_VERSION=11647.182.0
DEVICETYPE=OTHER
GOOGLE_RELEASE=11647.182.0
HWID_OVERRIDE=LAKITU DEFAULT
BUILD_ID=11647.182.0
NAME="Container-Optimized OS"
KERNEL_COMMIT_ID=9e5137a84a421d65264c86f7ea0f29ceaeb04771
GOOGLE_CRASH_ID=Lakitu
VERSION_ID=73
BUG_REPORT_URL="https://cloud.google.com/container-optimized-os/docs/resources/support-policy#contact_us"
PRETTY_NAME="Container-Optimized OS from Google"
VERSION=73
GOOGLE_METRICS_PRODUCT_ID=26
HOME_URL="https://cloud.google.com/container-optimized-os/docs"
ID=cos

From what I can tell, there's no /lib/modules/$(uname -r)/build on the host, which is why sidecarring execsnoop (my first attempt at getting some data on these pods) was failing.
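The check described above can be sketched as a small shell function. It looks at the two locations named in the dkms error message (`build` and `source` under `/lib/modules/<release>`). The function name `has_kernel_headers` and the optional root-prefix argument are my own assumptions for illustration, for example to point it at a host filesystem mounted under `/host`.

```shell
# Hypothetical helper: report whether kernel headers are present.
# Usage: has_kernel_headers [root-prefix] [kernel-release]
# root-prefix defaults to empty (the live filesystem); kernel-release
# defaults to the running kernel as reported by `uname -r`.
has_kernel_headers() {
  local root="${1:-}"
  local release="${2:-$(uname -r)}"
  local dir
  for dir in "$root/lib/modules/$release/build" "$root/lib/modules/$release/source"; do
    # -e follows symlinks, so a dangling build symlink counts as missing.
    if [ -e "$dir" ]; then
      echo "found: $dir"
      return 0
    fi
  done
  echo "no kernel headers for $release under ${root:-/}"
  return 1
}
```

Run on a node with no arguments, it checks the running kernel; a non-zero exit status suggests that any tool needing to compile against kernel headers will fail in the same way.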

I noticed that the Google Container-Optimized OS is not listed as a supported OS for Sysdig Agent. (https://sysdigdocs.atlassian.net/wiki/spaces/Platform/pages/192151570/Host+Requirements+for+Agent+Installation)
After switching my cluster nodes to Ubuntu, kubectl capture is writing files to disk properly. It would be nice if the plugin were able to report that the sysdig agent failed to install, but my issue seemed to be with sysdig itself, not the kubectl-capture project.
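Until the plugin reports this itself, the failure can be spotted by scanning the capture pod's log for the messages that appear in the logs above. The following is only a sketch of that idea, not how the plugin works: `diagnose_capture_log` is a hypothetical name, the matched strings are copied from the logs in this thread, and the "hint" lines are my interpretation of those messages.

```shell
# Hypothetical helper: read a capture pod log on stdin and report whether
# sysdig failed to load a driver. Exit status 1 means no driver was loaded.
diagnose_capture_log() {
  local log status=0
  log="$(cat)"
  case "$log" in *"Unable to load the BPF probe"*)
    echo "eBPF probe was not loaded"; status=1 ;;
  esac
  case "$log" in *"Unable to load the driver"*)
    echo "kernel module was not loaded"; status=1 ;;
  esac
  # The remaining checks only add context; they do not change the status.
  case "$log" in *"404 Not Found"*)
    echo "hint: a precompiled probe download returned 404" ;;
  esac
  case "$log" in *"Cannot allocate memory"*|*"Killed (program cc1)"*)
    echo "hint: possible memory exhaustion during the probe build" ;;
  esac
  if [ "$status" -eq 0 ]; then
    echo "no driver failure found"
  fi
  return "$status"
}
```

Assuming the capture pod still exists, something like `kubectl logs <capture-pod-name> | diagnose_capture_log` would apply it, where `<capture-pod-name>` is a placeholder for whatever name the plugin gave the pod.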

@serratala Okay, I will need more details to reproduce the error.

Are you running the plugin on GKE, EKS or AKS? Which Kubernetes version are you running?

Thanks!

I'm using KOPS in AWS
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-07T09:55:27Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.6", GitCommit:"b1d75deca493a24a2f87eb1efde1a569e52fc8d9", GitTreeState:"clean", BuildDate:"2018-12-16T04:30:10Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

Is there any way to run it in verbose mode or to debug it? I remember seeing in some post that a service account is needed.

checking logs on pod:

Okay, I got it.

We recently found some errors with sysdig and COS, so I'm going to dig a bit deeper into this. Thanks for the info.

The AWS and Kops case looks quite similar. We should actually have the precompiled binary, but I have to double-check because we are receiving a 404.

Thanks for your patience!
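To double-check a 404 by hand, it helps to rebuild the URL the probe loader requested. The sketch below reproduces the URL shape seen in the logs above: sysdig version, architecture, kernel release with `+` percent-encoded as `%2B`, and a trailing hex string, which is treated here as an opaque input copied from the log. `probe_url` is a hypothetical helper, and the pattern is inferred from those log lines rather than from the loader's source.

```shell
# Hypothetical helper: rebuild the precompiled kernel-module URL seen in the logs.
# Usage: probe_url <sysdig-version> <arch> <kernel-release> <hash-from-log>
probe_url() {
  local version="$1" arch="$2" release="$3" hash="$4"
  local encoded
  # Percent-encode '+' only, which is the one special character in the
  # kernel release strings that appear in this thread.
  encoded="$(printf '%s' "$release" | sed 's/+/%2B/g')"
  printf 'https://s3.amazonaws.com/download.draios.com/stable/sysdig-probe-binaries/sysdig-probe-%s-%s-%s-%s.ko\n' \
    "$version" "$arch" "$encoded" "$hash"
}
```

The result can then be probed with `curl --fail --silent --head "$(probe_url 0.26.4 x86_64 4.9.0-8-amd64 e7c3596ef9cbb651ed4f0506ff12ed45)"`; a non-zero exit status corresponds to the 404 reported in the logs.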

I'm running kubectl-capture on AWS EKS worker nodes and hitting a segmentation fault when it starts to capture the system calls.

[Two screenshots attached: 2019-08-30 at 3:00:45 PM and 3:00:57 PM]

I will look into the segmentation fault later and provide more information as I uncover it.

Unfortunately, it seems the situation has not changed since June 20 when using AWS and Kops on Debian stretch:

* Setting up /usr/src links from host
* Unloading sysdig-probe, if present
* Running dkms install for sysdig
Error! echo
Your kernel headers for kernel 4.9.0-8-amd64 cannot be found at
/lib/modules/4.9.0-8-amd64/build or /lib/modules/4.9.0-8-amd64/source.
* Running dkms build failed, couldn't find /var/lib/dkms/sysdig/0.26.4/build/make.log
* Trying to load a system sysdig-probe, if present
* Trying to find precompiled sysdig-probe for 4.9.0-8-amd64
Found kernel config at /host/boot/config-4.9.0-8-amd64
* Trying to download precompiled module from https://s3.amazonaws.com/download.draios.com/stable/sysdig-probe-binaries/sysdig-probe-0.26.4-x86_64-4.9.0-8-amd64-e7c3596ef9cbb651ed4f0506ff12ed45.ko
curl: (22) The requested URL returned error: 404 Not Found
Download failed, consider compiling your own sysdig-probe and loading it or getting in touch with the sysdig community
* Capturing system calls
Unable to load the driver
error opening device /host/dev/sysdig0. Make sure you have root credentials and that the sysdig-probe module is loaded.

I'm guessing the only way to actually make this work at the moment is to have sysdig pre-installed on the K8s nodes, as described here?
https://sysdigdocs.atlassian.net/wiki/spaces/Platform/pages/3571791/Agent+Install+Manual+Linux+Installation

I found a blog post about this useful tool and, like a few people here, am struggling to get it to work.

Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-21T15:34:43Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:31:33Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}

Note: K8s hosted on site running Ubuntu 16.04.4 LTS (GNU/Linux 4.4.0-154-generic x86_64)

If I run with --ebpf, I get a message saying that eBPF is not supported. Fair enough: the Linux kernel is 4.4, which is not supported.
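A node's kernel can be checked up front before trying --ebpf. The sketch below compares the major.minor part of a kernel release string against a minimum. The default of 4.14 is an assumption on my part about the minimum kernel for sysdig's eBPF support, so check the sysdig documentation for your version and pass a different minimum if needed. `kernel_at_least` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if a kernel release is at least <major.minor>.
# Usage: kernel_at_least <kernel-release> [min-major.minor]
# The default minimum of 4.14 is an assumption, not taken from this thread.
kernel_at_least() {
  local release="$1" min="${2:-4.14}"
  local major minor min_major min_minor
  major="${release%%.*}"        # "4.4.0-154-generic" -> "4"
  minor="${release#*.}"         # -> "4.0-154-generic"
  minor="${minor%%[!0-9]*}"     # -> "4"
  min_major="${min%%.*}"
  min_minor="${min#*.}"
  if [ "$major" -gt "$min_major" ]; then
    return 0
  fi
  if [ "$major" -eq "$min_major" ] && [ "$minor" -ge "$min_minor" ]; then
    return 0
  fi
  return 1
}
```

On a node, `kernel_at_least "$(uname -r)" || echo "kernel too old for --ebpf"` gives a quick answer; with the default minimum, the 4.4 kernel here fails the check.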

Without --ebpf, the capture pod log is as follows:

* Setting up /usr/src links from host
* Unloading sysdig-probe, if present
* Running dkms install for sysdig

Kernel preparation unnecessary for this kernel.  Skipping...

Building module:
cleaning build area.....
/usr/sbin/dkms: fork: Cannot allocate memory
make -j16 KERNELRELEASE=4.4.0-124-generic -C /lib/modules/4.4.0-124-generic/build M=/var/lib/dkms/sysdig/0.26.4/build.........(bad exit status: 2)
Error! Bad return status for module build on kernel: 4.4.0-124-generic (x86_64)
Consult /var/lib/dkms/sysdig/0.26.4/build/make.log for more information.
* Running dkms build failed, dumping /var/lib/dkms/sysdig/0.26.4/build/make.log
DKMS make.log for sysdig-0.26.4 for kernel 4.4.0-124-generic (x86_64)
Fri Sep 20 06:58:38 UTC 2019
make: Entering directory '/host/usr/src/linux-headers-4.4.0-124-generic'
  LD      /var/lib/dkms/sysdig/0.26.4/build/built-in.o
  CC [M]  /var/lib/dkms/sysdig/0.26.4/build/main.o
  CC [M]  /var/lib/dkms/sysdig/0.26.4/build/dynamic_params_table.o
  CC [M]  /var/lib/dkms/sysdig/0.26.4/build/fillers_table.o
  CC [M]  /var/lib/dkms/sysdig/0.26.4/build/flags_table.o
  CC [M]  /var/lib/dkms/sysdig/0.26.4/build/ppm_events.o
  CC [M]  /var/lib/dkms/sysdig/0.26.4/build/ppm_fillers.o
  CC [M]  /var/lib/dkms/sysdig/0.26.4/build/event_table.o
  CC [M]  /var/lib/dkms/sysdig/0.26.4/build/syscall_table.o
  CC [M]  /var/lib/dkms/sysdig/0.26.4/build/ppm_cputime.o
gcc: internal compiler error: Killed (program cc1)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-5/README.Bugs> for instructions.
make[1]: *** [scripts/Makefile.build:278: /var/lib/dkms/sysdig/0.26.4/build/main.o] Error 4
make[1]: *** Waiting for unfinished jobs....
gcc: internal compiler error: Killed (program cc1)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-5/README.Bugs> for instructions.
make[1]: *** [scripts/Makefile.build:278: /var/lib/dkms/sysdig/0.26.4/build/ppm_events.o] Error 4
gcc: internal compiler error: Killed (program cc1)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-5/README.Bugs> for instructions.
make[1]: *** [scripts/Makefile.build:277: /var/lib/dkms/sysdig/0.26.4/build/ppm_fillers.o] Error 4
make: *** [Makefile:1426: _module_/var/lib/dkms/sysdig/0.26.4/build] Error 2
make: Leaving directory '/host/usr/src/linux-headers-4.4.0-124-generic'
* Trying to load a system sysdig-probe, if present
* Trying to find precompiled sysdig-probe for 4.4.0-124-generic
Found kernel config at /host/boot/config-4.4.0-124-generic
* Trying to download precompiled module from https://s3.amazonaws.com/download.draios.com/stable/sysdig-probe-binaries/sysdig-probe-0.26.4-x86_64-4.4.0-124-generic-137bb299f1d3aefd14552973e8ed79ab.ko
curl: (22) The requested URL returned error: 404 Not Found
Download failed, consider compiling your own sysdig-probe and loading it or getting in touch with the sysdig community
* Capturing system calls
Unable to load the driver
error opening device /host/dev/sysdig0. Make sure you have root credentials and that the sysdig-probe module is loaded.
----------------------
Event           #Calls
----------------------

Like @mvijftigschild, I suspect we'll need to install a sysdig probe on the nodes before we can get this to work?

Without --ebpf I'm still hitting the 404. With --ebpf I get the corrupt file, and I noticed there's a core dump that seems to be related to draios/sysdig#1475.

Yes, under the hood the plugin uses draios/sysdig, so many of these bugs are better reported to that repo.