cri-o/cri-o

k8s e2e tests [149/149]

runcom opened this issue · 34 comments

Not that bad for a very first run of k8s e2e tests: 117/151 😸

Full logs available at:

Test it yourself with:

# kube commit 2899f47bc

$ sudo setenforce 0

# either this way or start crio.service
$ cd $GOPATH/src/github.com/kubernetes-incubator/cri-o && \
  sudo ./crio --cgroup-manager=systemd --log debug.log --debug --runtime \
  $GOPATH/src/github.com/opencontainers/runc/runc --conmon $PWD/conmon/conmon \
  --seccomp-profile=$PWD/seccomp.json

$ sudo PATH=$GOPATH/src/k8s.io/kubernetes/third_party/etcd:${PATH} \
  PATH=$PATH GOPATH=$GOPATH \
  ALLOW_PRIVILEGED=1 \
  CONTAINER_RUNTIME=remote CONTAINER_RUNTIME_ENDPOINT='/var/run/crio.sock \
  --runtime-request-timeout=5m' ALLOW_SECURITY_CONTEXT="," \
  DNS_SERVER_IP="192.168.1.5" API_HOST="192.168.1.5" \
  API_HOST_IP="192.168.1.5" KUBE_ENABLE_CLUSTER_DNS=true ./hack/local-up-cluster.sh

# on Fedora
$ sudo systemctl stop firewalld
$ sudo iptables -F

$ KUBERNETES_PROVIDER=local KUBECONFIG=/var/run/kubernetes/admin.kubeconfig \
  go run hack/e2e.go -v --test -test_args="-host=https://localhost:6443 --ginkgo.focus=\[Conformance\]" \
  | tee e2e.log

# enjoy
Summarizing 2 Failures:

[Fail] [k8s.io] Projected [It] should project all components that make up the projection API [Conformance] [Volume] [Projection]
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/util.go:2213

[Fail] [k8s.io] DNS [It] should provide DNS for the cluster [Conformance]
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/dns.go:223

Ran 151 of 603 Specs in 5721.093 seconds
FAIL! -- 149 Passed | 2 Failed | 0 Pending | 452 Skipped --- FAIL: TestE2E (5721.12s)
FAIL

Ginkgo ran 1 suite in 1h35m21.34754297s
Test Suite Failed
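
To chase a single failing spec instead of re-running the whole conformance set, the same invocation can be narrowed with a more specific --ginkgo.focus regex; the focus string below is just an example (\s stands in for spaces so the regex survives the quoting):

$ KUBERNETES_PROVIDER=local KUBECONFIG=/var/run/kubernetes/admin.kubeconfig \
  go run hack/e2e.go -v --test \
  -test_args="-host=https://localhost:6443 --ginkgo.focus=should\sprovide\sDNS\sfor\sthe\scluster" \
  | tee e2e-dns.log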

Nice - next target is getting these to 0 failures :)

Alright, trimming down the list above a lot :) Many of the failures are just because local-up-cluster.sh runs with SecurityContextDeny. I'll re-run the tests and post an updated list :)

(also updated first comment for a way to run e2e for anyone to try out)

FYI I figured the following test:

[Fail] [k8s.io] EmptyDir volumes [It] volume on default medium should have the correct mode [Conformance] [Volume]
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/pods.go:76

will be fixed by using ALLOW_PRIVILEGED=1 with local-up-cluster.sh or, if you run k8s directly, by adding --allow-privileged to the kubelet.
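
As a rough sketch (the kubelet flags other than --allow-privileged are placeholders for whatever you already pass), the two variants look like this:

# via local-up-cluster.sh
$ ALLOW_PRIVILEGED=1 ./hack/local-up-cluster.sh

# or when starting the kubelet by hand (remaining flags elided)
$ sudo kubelet --allow-privileged=true \
    --container-runtime=remote \
    --container-runtime-endpoint=/var/run/crio.sock ...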

Not sure that fixes other tests here, I'll re-run them once I finish the run I started in my previous comment :)

First comment updated with the new result, 137/151; going to re-run the suite as per my previous comment.

diff --git a/server/container_create.go b/server/container_create.go
index c15985e..5cbab69 100644
--- a/server/container_create.go
+++ b/server/container_create.go
@@ -589,6 +589,14 @@ func (s *Server) createSandboxContainer(ctx context.Context, containerID string,
 
 	containerImageConfig := containerInfo.Config
 
+	// TODO: volume handling in CRI-O
+	//       right now, we do just mount tmpfs in order to have images like
+	//       gcr.io/k8s-testimages/redis:e2e to work with CRI-O
+	for dest := range containerImageConfig.Config.Volumes {
+		destOptions := []string{"mode=1777", "size=" + strconv.Itoa(64*1024*1024), label.FormatMountLabel("", sb.mountLabel)}
+		specgen.AddTmpfsMount(dest, destOptions)
+	}
+
 	processArgs, err := buildOCIProcessArgs(containerConfig, containerImageConfig)
 	if err != nil {
 		return nil, err

The patch above makes many other tests pass for images that define a Config.Volumes map. It can be used as a workaround for now, until we have proper volume handling in CRI-O. I'll make a PR for it shortly, because there's no reason to keep those images broken. The tmpfs stuff is a hack, yes, but still useful for now.

I think we can add this tmpfs volume as a temporary fix. The downside is more RAM usage for containers with VOLUMEs, so we would want to move to disk-backed, CRI-O managed volumes.
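
With the workaround applied, each image-declared VOLUME path shows up as a tmpfs mount inside the container, which is where the extra RAM usage comes from. A quick way to check against a running pod (the pod name and /data path below are placeholders, assuming an image that declares VOLUME /data):

$ kubectl exec redis-master-xxxxx -- cat /proc/mounts | grep /data
# expect a tmpfs entry with mode=1777 and the 64M size set by the patch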

I'll test with that patch and see how it goes. Will report status here

For the record, I'm running e2e on Fedora 25, and all the currently failing network tests seem to be resolved by just running sudo iptables -F before running the tests again - no clue why, but it's apparently something that's documented.

@runcom iptables -F removes all the rules, so that's not surprising :) We do want to fix it the right way by adding just the right rules. I wouldn't advise running iptables -F outside of a test VM.
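
For the record, a more targeted alternative than flushing everything might be to explicitly trust the CNI bridge; the interface name cni0 below is an assumption, so check what your CNI configuration actually creates:

# firewalld: put the CNI bridge into the trusted zone
$ sudo firewall-cmd --zone=trusted --add-interface=cni0

# or, with raw iptables, accept forwarded traffic on the bridge
$ sudo iptables -A FORWARD -i cni0 -j ACCEPT
$ sudo iptables -A FORWARD -o cni0 -j ACCEPT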

139/151, updated first comment and logs available at https://runcom.red/e2e-1.log

One of them is the projected failure, which is really a bug in kube, so that's one less to worry about ;)

The 1st and 2nd tests used to pass with node-e2e though (and if you look at the first run of e2e tests I did, https://runcom.red/e2e.log, you can see them passing); we need to understand why they're failing now.

[Fail] [k8s.io] PreStop [It] should call prestop when killing a pod [Conformance]
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/pre_stop.go:174

Fixed by #537

Port forwarding tests panic, fix is here #542

Though it seems our nsenter/socat implementation isn't working correctly (even though it's copy/pasted from the dockershim one). I verified this by running the same tests with Docker: they pass there, but fail with CRI-O for some weird network issue, I guess.

Found the root cause: it's actually a bug in CRI-O itself. The port forwarding tests now pass with #542 plus a fix coming in a moment 😎

All port forwarding tests fixed by #542 and #543 - @mrunalp PTAL at those :)

I'll re-run the whole e2e once those 2 PRs are merged.

Note that these 4 now pass as well:

[Fail] [k8s.io] Kubectl client [k8s.io] Kubectl run rc [It] should create an rc from an image [Conformance]
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/kubectl.go:1165

[Fail] [k8s.io] Kubectl client [k8s.io] Kubectl rolling-update [It] should support rolling-update to same image [Conformance]
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/kubectl.go:177

[Fail] [k8s.io] Probing container [It] should *not* be restarted with a /healthz http liveness probe [Conformance]
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/common/container_probe.go:404

[Fail] [k8s.io] KubeletManagedEtcHosts [It] should test kubelet managed /etc/hosts file [Conformance]
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/exec_util.go:107

We're now at 144/151 🎉. I've updated the result in the first comment, so look there for the failing tests. The remaining failures are either network issues (probably because of my laptop) or tests that rely on attach (but @mrunalp is working on it 👍).

@sameo @mrunalp this test works fine only after disabling firewalld and flushing iptables (and also applying #544 to master), otherwise it always fails (maybe more net tests are also blocked by firewalld):

systemctl stop firewalld
iptables -F
[Fail] [k8s.io] PreStop [It] should call prestop when killing a pod [Conformance]
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/pre_stop.go:174

So 3 of the 7 failures are flakes or misconfigurations; we only need to tackle the remaining 4, which are all related to attach :)

Turns out this test:

[Fail] [k8s.io] Kubectl client [k8s.io] Guestbook application [It] should create and stop a working application [Conformance]
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/kubectl.go:1718

is failing just because we run the tests without a DNS service when doing local-up-cluster; one way to fix it when running the test is to follow https://github.com/linzichang/kubernetes/blob/master/examples/guestbook/README.md#finding-a-service

(in the CI, we'll likely switch to env DNS as pointed out in that readme)
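
For reference, the env approach works because the kubelet injects per-service environment variables into every pod, so the guestbook frontend can find redis-master without DNS. A quick way to see them (the frontend pod name below is a placeholder):

$ kubectl exec frontend-xxxxx -- env | grep REDIS_MASTER_SERVICE
# expect REDIS_MASTER_SERVICE_HOST and REDIS_MASTER_SERVICE_PORT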

Yeah, we could switch to env for that test 👍

That test is basically failing because hack/local-up-cluster.sh is Docker-specific, I guess: https://github.com/kubernetes/kubernetes/blob/master/hack/local-up-cluster.sh#L52-L58

So the only test failing now is the attach one :) I just confirmed the following test works fine (it was a kubelet misconfiguration with respect to kube-dns). We need to enable kube-dns in local-up-cluster to test it, like this:

$ sudo PATH=$GOPATH/src/k8s.io/kubernetes/third_party/etcd:${PATH} \
  PATH=$PATH GOPATH=$GOPATH \
  ALLOW_PRIVILEGED=1 \
  CONTAINER_RUNTIME=remote CONTAINER_RUNTIME_ENDPOINT='/var/run/crio.sock \
  --runtime-request-timeout=5m' ALLOW_SECURITY_CONTEXT="," \
  DNS_SERVER_IP="192.168.1.5" API_HOST="192.168.1.5" \
  API_HOST_IP="192.168.1.5" KUBE_ENABLE_CLUSTER_DNS=true ./hack/local-up-cluster.sh
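
A quick sanity check that the DNS addon actually came up before re-running the test (the kubeconfig path is the one local-up-cluster.sh writes; the exact pod name and namespace may differ):

$ kubectl --kubeconfig=/var/run/kubernetes/admin.kubeconfig \
  get pods --all-namespaces | grep dns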

After setting KUBE_ENABLE_CLUSTER_DNS=true and setting API_HOST, API_HOST_IP and DNS_SERVER_IP to the vm ip address, the test passes:

โ€ข [SLOW TEST:116.981 seconds]
[k8s.io] Kubectl client
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:656
  [k8s.io] Guestbook application
  /home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:656
    should create and stop a working application [Conformance]
    /home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/kubectl.go:375
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSJun  7 10:54:12.965: INFO: Running AfterSuite actions on all node
Jun  7 10:54:12.965: INFO: Running AfterSuite actions on node 1

Ran 1 of 603 Specs in 117.032 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 602 Skipped PASS

Ginkgo ran 1 suite in 1m57.331986003s
Test Suite Passed

So, we are at 148/151 :)

Updated first comment :)

attach test now passes! 🎉 🎉 🎉

@rajatchopra @dcbw @sameo could you help figure out the network flakiness? Probably some misconfiguration:

[Fail] [k8s.io] DNS [It] should provide DNS for the cluster [Conformance]

The above never passes, for some reason.

The test below passes only if we stop firewalld and flush iptables before running tests:

# systemctl stop firewalld
# iptables -F

[Fail] [k8s.io] PreStop [It] should call prestop when killing a pod [Conformance]
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/pre_stop.go:174

@mrunalp Can we close this issue? I think all tests are passing now.

let's leave this open till we figure the DNS stuff out

FYI I got all tests running in the CI :) so I've updated the title and I'm just waiting for #631 to be merged :)

wking commented

FYI I got all tests running in the CI :) so I've updated the title and I'm just waiting for #631 to be merged :)

#631 was merged last September. I'm not sure if we want to leave it open as some sort of tracker issue? Personally, I prefer issue labels for that sort of thing.

I am going to close this issue, since it seems to be fixed.