cri-o/cri-o

k8s e2e tests [149/149]

runcom opened this issue · 34 comments

Not that bad for a very first run of k8s e2e tests: 117/151 😸

Full logs available at:

Test it yourself with:

# kube commit 2899f47bc

$ sudo setenforce 0

# either this way or start crio.service
$ cd $GOPATH/src/github.com/kubernetes-incubator/cri-o && \
  sudo ./crio --cgroup-manager=systemd --log debug.log --debug --runtime \
  $GOPATH/src/github.com/opencontainers/runc/runc --conmon $PWD/conmon/conmon \
  --seccomp-profile=$PWD/seccomp.json

$ sudo PATH=$GOPATH/src/k8s.io/kubernetes/third_party/etcd:${PATH} \
  PATH=$PATH GOPATH=$GOPATH \
  ALLOW_PRIVILEGED=1 \
  CONTAINER_RUNTIME=remote CONTAINER_RUNTIME_ENDPOINT='/var/run/crio.sock \
  --runtime-request-timeout=5m' ALLOW_SECURITY_CONTEXT="," \
  DNS_SERVER_IP="192.168.1.5" API_HOST="192.168.1.5" \
  API_HOST_IP="192.168.1.5" KUBE_ENABLE_CLUSTER_DNS=true ./hack/local-up-cluster.sh

# on Fedora
$ sudo systemctl stop firewalld
$ sudo iptables -F

$ KUBERNETES_PROVIDER=local KUBECONFIG=/var/run/kubernetes/admin.kubeconfig \
  go run hack/e2e.go -v --test -test_args="-host=https://localhost:6443 --ginkgo.focus=\[Conformance\]" \
  | tee e2e.log

# enjoy
Summarizing 2 Failures:

[Fail] [k8s.io] Projected [It] should project all components that make up the projection API [Conformance] [Volume] [Projection]
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/util.go:2213

[Fail] [k8s.io] DNS [It] should provide DNS for the cluster [Conformance]
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/dns.go:223

Ran 151 of 603 Specs in 5721.093 seconds
FAIL! -- 149 Passed | 2 Failed | 0 Pending | 452 Skipped --- FAIL: TestE2E (5721.12s)
FAIL

Ginkgo ran 1 suite in 1h35m21.34754297s
Test Suite Failed
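
To chase a single failing spec instead of re-running the whole conformance set, the same invocation can be narrowed with a more specific --ginkgo.focus regex; the focus string below is just an example (\s stands in for spaces so the regex survives the quoting):

$ KUBERNETES_PROVIDER=local KUBECONFIG=/var/run/kubernetes/admin.kubeconfig \
  go run hack/e2e.go -v --test \
  -test_args="-host=https://localhost:6443 --ginkgo.focus=should\sprovide\sDNS\sfor\sthe\scluster" \
  | tee e2e-dns.log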

Nice - next target is getting these to 0 failures :)

Alright, trimming down the list above a lot :) Many of the failures are just because local-up-cluster.sh runs with SecurityContextDeny. I'll re-run the tests and post an updated list :)

(also updated first comment for a way to run e2e for anyone to try out)

FYI I figured the following test:

[Fail] [k8s.io] EmptyDir volumes [It] volume on default medium should have the correct mode [Conformance] [Volume]
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/pods.go:76

will be fixed by using ALLOW_PRIVILEGED=1 with local-up-cluster.sh or, if you run k8s directly, by adding --allow-privileged to the kubelet.
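
As a rough sketch (the kubelet flags other than --allow-privileged are placeholders for whatever you already pass), the two variants look like this:

# via local-up-cluster.sh
$ ALLOW_PRIVILEGED=1 ./hack/local-up-cluster.sh

# or when starting the kubelet by hand (remaining flags elided)
$ sudo kubelet --allow-privileged=true \
    --container-runtime=remote \
    --container-runtime-endpoint=/var/run/crio.sock ...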

Not sure that fixes other tests here, I'll re-run them once I finish the run I started in my previous comment :)

First comment updated with the new result, 137/151; going to re-run the suite as per my previous comment.

diff --git a/server/container_create.go b/server/container_create.go
index c15985e..5cbab69 100644
--- a/server/container_create.go
+++ b/server/container_create.go
@@ -589,6 +589,14 @@ func (s *Server) createSandboxContainer(ctx context.Context, containerID string,
 
 	containerImageConfig := containerInfo.Config
 
+	// TODO: volume handling in CRI-O
+	//       right now, we do just mount tmpfs in order to have images like
+	//       gcr.io/k8s-testimages/redis:e2e to work with CRI-O
+	for dest := range containerImageConfig.Config.Volumes {
+		destOptions := []string{"mode=1777", "size=" + strconv.Itoa(64*1024*1024), label.FormatMountLabel("", sb.mountLabel)}
+		specgen.AddTmpfsMount(dest, destOptions)
+	}
+
 	processArgs, err := buildOCIProcessArgs(containerConfig, containerImageConfig)
 	if err != nil {
 		return nil, err

The patch above makes many other tests pass for images that define a Config.Volumes map. It can be used as a workaround for now, until we have proper volume handling in CRI-O. I'll make a PR for it shortly, because there's no reason to keep those images broken. The tmpfs stuff is a hack, yes, but still useful for now.

I think we can add this tmpfs volume as a temporary fix. The downside is more RAM usage for containers with VOLUMEs, so we would want to move to disk-backed, CRI-O managed volumes.
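
With the workaround applied, each image-declared VOLUME path shows up as a tmpfs mount inside the container, which is where the extra RAM usage comes from. A quick way to check against a running pod (the pod name and /data path below are placeholders, assuming an image that declares VOLUME /data):

$ kubectl exec redis-master-xxxxx -- cat /proc/mounts | grep /data
# expect a tmpfs entry with mode=1777 and the 64M size set by the patch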

I'll test with that patch and see how it goes. Will report status here

For the record, I'm running e2e on Fedora 25, and all the currently failing network tests seem to be resolved by just running sudo iptables -F before running the tests again - no clue why, but it's apparently something that's documented.

@runcom iptables -F removes all the rules, so that's not surprising :) We do want to fix it the right way by adding just the right rules. I wouldn't advise running iptables -F outside of a test VM.
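
For the record, a more targeted alternative than flushing everything might be to explicitly trust the CNI bridge; the interface name cni0 below is an assumption, so check what your CNI configuration actually creates:

# firewalld: put the CNI bridge into the trusted zone
$ sudo firewall-cmd --zone=trusted --add-interface=cni0

# or, with raw iptables, accept forwarded traffic on the bridge
$ sudo iptables -A FORWARD -i cni0 -j ACCEPT
$ sudo iptables -A FORWARD -o cni0 -j ACCEPT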

139/151, updated first comment and logs available at https://runcom.red/e2e-1.log

One of them is the projected failure, which is really a bug in kube, so that's one less to worry about ;)

The 1st and 2nd tests used to pass with node-e2e though (and if you look at the first run of e2e tests I did, https://runcom.red/e2e.log, you can see them passing); we need to understand why they're failing now.

[Fail] [k8s.io] PreStop [It] should call prestop when killing a pod [Conformance]
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/pre_stop.go:174

Fixed by #537

Port forwarding tests panic, fix is here #542

Though it seems our nsenter/socat implementation isn't working correctly (even though it's copy/pasted from the dockershim one). I verified this by running the same tests with Docker: they pass there, but fail with CRI-O for some weird network issue, I guess.

Found the root cause: it's actually a bug in CRI-O itself. The port forwarding tests now pass with #542 plus a fix coming in a moment 😎

All port forwarding tests fixed by #542 and #543 - @mrunalp PTAL at those :)

I'll re-run the whole e2e once those 2 PRs are merged.

Note that these 4 now pass as well:

[Fail] [k8s.io] Kubectl client [k8s.io] Kubectl run rc [It] should create an rc from an image [Conformance]
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/kubectl.go:1165

[Fail] [k8s.io] Kubectl client [k8s.io] Kubectl rolling-update [It] should support rolling-update to same image [Conformance]
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/kubectl.go:177

[Fail] [k8s.io] Probing container [It] should *not* be restarted with a /healthz http liveness probe [Conformance]
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/common/container_probe.go:404

[Fail] [k8s.io] KubeletManagedEtcHosts [It] should test kubelet managed /etc/hosts file [Conformance]
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/exec_util.go:107

We're now at 144/151 🎉. I've updated the result in the first comment, so look there for the failing tests. The remaining failures are either network issues (probably because of my laptop) or tests that rely on attach (but @mrunalp is working on it 👍).

@sameo @mrunalp this test works fine only after disabling firewalld and flushing iptables (and also applying #544 to master), otherwise it always fails (maybe more net tests are also blocked by firewalld):

systemctl stop firewalld
iptables -F
[Fail] [k8s.io] PreStop [It] should call prestop when killing a pod [Conformance]
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/pre_stop.go:174

So 3 of the 7 failures are flakes or misconfigurations; we only need to tackle the remaining 4, which are all related to attach :)

Turns out this test:

[Fail] [k8s.io] Kubectl client [k8s.io] Guestbook application [It] should create and stop a working application [Conformance]
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/kubectl.go:1718

is failing just because we run the tests without a DNS service when doing local-up-cluster; one way to fix it when running the test is to follow https://github.com/linzichang/kubernetes/blob/master/examples/guestbook/README.md#finding-a-service

(in the CI, we'll likely switch to env DNS as pointed out in that readme)
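
For reference, the env approach works because the kubelet injects per-service environment variables into every pod, so the guestbook frontend can find redis-master without DNS. A quick way to see them (the frontend pod name below is a placeholder):

$ kubectl exec frontend-xxxxx -- env | grep REDIS_MASTER_SERVICE
# expect REDIS_MASTER_SERVICE_HOST and REDIS_MASTER_SERVICE_PORT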

Yeah, we could switch to env for that test 👍

That test is basically failing because hack/local-up-cluster.sh is Docker-specific, I guess: https://github.com/kubernetes/kubernetes/blob/master/hack/local-up-cluster.sh#L52-L58

So the only test failing now is the attach one :) I just confirmed the following test works fine (it was a kubelet misconfiguration with respect to kube-dns). We need to enable kube-dns in local-up-cluster to test it, like this:

$ sudo PATH=$GOPATH/src/k8s.io/kubernetes/third_party/etcd:${PATH} \
  PATH=$PATH GOPATH=$GOPATH \
  ALLOW_PRIVILEGED=1 \
  CONTAINER_RUNTIME=remote CONTAINER_RUNTIME_ENDPOINT='/var/run/crio.sock \
  --runtime-request-timeout=5m' ALLOW_SECURITY_CONTEXT="," \
  DNS_SERVER_IP="192.168.1.5" API_HOST="192.168.1.5" \
  API_HOST_IP="192.168.1.5" KUBE_ENABLE_CLUSTER_DNS=true ./hack/local-up-cluster.sh
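
A quick sanity check that the DNS addon actually came up before re-running the test (the kubeconfig path is the one local-up-cluster.sh writes; the exact pod name and namespace may differ):

$ kubectl --kubeconfig=/var/run/kubernetes/admin.kubeconfig \
  get pods --all-namespaces | grep dns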

After setting KUBE_ENABLE_CLUSTER_DNS=true and setting API_HOST, API_HOST_IP and DNS_SERVER_IP to the vm ip address, the test passes:

โ€ข [SLOW TEST:116.981 seconds]
[k8s.io] Kubectl client
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:656
  [k8s.io] Guestbook application
  /home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:656
    should create and stop a working application [Conformance]
    /home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/kubectl.go:375
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSJun  7 10:54:12.965: INFO: Running AfterSuite actions on all node
Jun  7 10:54:12.965: INFO: Running AfterSuite actions on node 1

Ran 1 of 603 Specs in 117.032 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 602 Skipped PASS

Ginkgo ran 1 suite in 1m57.331986003s
Test Suite Passed

So, we are at 148/151 :)

Updated first comment :)

attach test now passes! 🎉 🎉 🎉

@rajatchopra @dcbw @sameo could you help figure out the network flakiness? Probably some misconfiguration:

[Fail] [k8s.io] DNS [It] should provide DNS for the cluster [Conformance]

The above never passes, for some reason.

The test below passes only if we stop firewalld and flush iptables before running tests:

# systemctl stop firewalld
# iptables -F

[Fail] [k8s.io] PreStop [It] should call prestop when killing a pod [Conformance]
/home/amurdaca/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/pre_stop.go:174

@mrunalp Can we close this issue? I think all tests are passing now.

let's leave this open till we figure the DNS stuff out

FYI I got all tests running in the CI :) so I've updated the title and I'm just waiting for #631 to be merged :)

wking commented

FYI I got all tests running in the CI :) so I've updated the title and I'm just waiting for #631 to be merged :)

#631 was merged last September. I'm not sure if we want to leave it open as some sort of tracker issue? Personally, I prefer issue labels for that sort of thing.

I am going to close this issue, since it seems to be fixed.