Log to standard output result in garbage character in add function
ozaditya opened this issue · 6 comments
What happened:
After setting environment variable AWS_VPC_K8S_PLUGIN_LOG_FILE to stdout and deploying CNI version 1.7.3.
I have observed that CFN fails to add ip to pods.
Logs from CNI:
Send AddNetworkReply: IPv4Addr 10.1.161.151, DeviceNumber: 0, err: \u003cnil\u003e"}
Received DelNetwork for Sandbox 4da84e08dd17a4e2cceacbf31d04801575cfab2e19fae462a17c95921b422ee5"}
DelNetworkRequest: ClientVersion:\"v1.7.3\" K8S_POD_NAME:\"datadog-agent-wt4hd\" K8S_POD_NAMESPACE:\"datadog\" K8S_POD_INFRA_CONTAINER_ID:\"4da84e08dd17a4e2cceacbf31d04801575cfab2e19fae462a17c95921b422ee5\" Reason:\"PodDeleted\" ContainerID:\"4da84e08dd17a4e2cceacbf31d04801575cfab2e19fae462a17c95921b422ee5\"
Logs from kubelet:
Error adding datadog_datadog-agent-wt4hd/4da84e08dd17a4e2cceacbf31d04801575cfab2e19fae462a17c95921b422ee5 to network aws-cni/aws-cni: invalid character '{' after top-level value
RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to set up sandbox container "4da84e08dd17a4e2cceacbf31d04801575cfab2e19fae462a17c95921b422ee5" network for pod "datadog-agent-wt4hd": NetworkPlugin cni failed to set up pod "datadog-agent-wt4hd_datadog" network: invalid character '{' after top-level value
What you expected to happen:
Since its documented in Read Me that CNI log can be sent on stdout. I expect CNI to function properly.
snippet:
AWS_VPC_K8S_CNI_LOG_FILE
Type: String
Default: /host/var/log/aws-routed-eni/ipamd.log
Valid Values: stdout or a file path
Specifies where to write the logging output of ipamd. Either to stdout or to override the default file (i.e., /var/log/aws-routed-eni/ipamd.log).
How to reproduce it (as minimally and precisely as possible):
Set above environment variable to stdout or simply change cni config file present at path “/etc/cni/net.d” with name “10-aws.conflist” to below.
{
"cniVersion": "0.3.1",
"name": "aws-cni",
"plugins": [
{
"name": "aws-cni",
"type": "aws-cni",
"vethPrefix": "eni",
"mtu": "9001",
"pluginLogFile": "stdout",
"pluginLogLevel": "Debug"
},
{
"type": "portmap",
"capabilities": {"portMappings": true},
"snat": true
}
]
}
Anything else we need to know?:
I believe error was generated by add function.
amazon-vpc-cni-k8s/pkg/ipamd/rpc_handler.go
Line 159 in e2e1900
I believe it trace back to project https://github.com/containernetworking/cni/
Environment:
- Kubernetes version (use
kubectl version
): Currrently tested on version 1.15 - CNI Version: 1.7.x same is true for 1.6.x
- OS (e.g:
cat /etc/os-release
): Amazon linux 2 - Kernel (e.g.
uname -a
): Linux ip-172-31-45-225.ec2.internal 4.14.171-136.231.amzn2.x86_64 #1 SMP Thu Feb 27 20:22:48 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
We did a initial code walk through, pending root cause. Currently we are seeing the below log in kubelet -
E1005 18:22:11.154374 11718 cni.go:364] Error adding default_my-nginx-86b7cfc89-
kb8d5/cc40109840c171aced733ee1a15b20a403106c301ceca55d4a28792eebd3e7b6 to network aws-cni/aws-cni: netplugin
failed but error parsing its diagnostic message "{\"level\":\"info\",\"ts\":\"2020-10-05T18:22:11.145Z\",\"caller\":\"routed-eni-cni-plugin/cni.go:111\",\"msg\":\"Received CNI add request:
ContainerID(cc40109840c171aced733ee1a15b20a403106c301ceca55d4a28792eebd3e7b6) Netns(/proc/12295/ns/net)
IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=my-nginx-86b7cfc89-
kb8d5;K8S_POD_INFRA_CONTAINER_ID=cc40109840c171aced733ee1a15b20a403106c301ceca55d4a28792eebd3e7b6)
Path(/opt/cni/bin) argsStdinData({\\\"cniVersion\\\":\\\"0.3.1\\\",\\\"mtu\\\":\\\"9001\\\",\\\"name\\\":\\\"aws-
cni\\\",\\\"pluginLogFile\\\":\\\"stdout\\\",\\\"pluginLogLevel\\\":\\\"trace\\\",\\\"type\\\":\\\"aws-
cni\\\",\\\"vethPrefix\\\":\\\"eni\\\"})\"}\n{\"level\":\"debug\",\"ts\":\"2020-10-05T18:22:11.145Z\",\"caller\":\"routed-eni-cni-
plugin/cni.go:111\",\"msg\":\"MTU value set is 9001:\"}\n{\"level\":\"info\",\"ts\":\"2020-10-
05T18:22:11.148Z\",\"caller\":\"routed-eni-cni-plugin/cni.go:111\",\"msg\":\"Received add network response for container
cc40109840c171aced733ee1a15b20a403106c301ceca55d4a28792eebd3e7b6 interface eth0: Success:true
DeviceNumber:-1 VPCcidrs:\\\"192.168.0.0/16\\\" \"}\n{\"level\":\"debug\",\"ts\":\"2020-10-
05T18:22:11.148Z\",\"caller\":\"routed-eni-cni-plugin/cni.go:188\",\"msg\":\"SetupNS: hostVethName=enid9fae0540d6,
contVethName=eth0, netnsPath=/proc/12295/ns/net, deviceNumber=-1, mtu=9001\"}\n{\"level\":\"error\",\"ts\":\"2020-10-
05T18:22:11.149Z\",\"caller\":\"driver/driver.go:185\",\"msg\":\"Failed to setup veth network setup NS network: failed to add IP
addr to \\\"eth0\\\": numerical result out of range\"}\n{\"level\":\"error\",\"ts\":\"2020-10-
05T18:22:11.149Z\",\"caller\":\"routed-eni-cni-plugin/cni.go:111\",\"msg\":\"Failed SetupPodNetwork for container
cc40109840c171aced733ee1a15b20a403106c301ceca55d4a28792eebd3e7b6: setupNS network: failed to setup veth pair.:
setupVeth network: failed to setup veth network: setup NS network: failed to add IP addr to \\\"eth0\\\": numerical result out of range\"}\n{\"level\":\"error\",\"ts\":\"2020-10-05T18:22:11.149Z\",\"caller\":\"routed-eni-cni-
plugin/cni.go:111\",\"msg\":\"Error received from DelNetwork grpc call for container
cc40109840c171aced733ee1a15b20a403106c301ceca55d4a28792eebd3e7b6: rpc error: code = Unknown desc =
datastore: unknown pod\"}\n": invalid character '{' after top-level value
Error is at addToNetwork
[https://github.com/kubernetes/kubernetes/blob/v1.15.11/pkg/kubelet/dockershim/network/cni/cni.go#L331]
Call trace-
addToNetwork
-> AddNetworkList
-> addNetwork
[1] -> ExecPluginWithResult
[2]
[1] https://github.com/containernetworking/cni/blob/2d4bc3d57aae747b6d68df7c3631bc9c120dfcde/libcni/api.go#L429
[2] https://github.com/containernetworking/cni/blob/2d4bc3d57aae747b6d68df7c3631bc9c120dfcde/libcni/api.go#L421
Need to add more tracing and check around this version.NewResult(confVersion, stdoutBytes)
-
https://github.com/containernetworking/cni/blob/f145359defec1db69929c67dead380fdf8dfce1b/pkg/invoke/exec.go#L93
and exec.ExecPlugin
- https://github.com/containernetworking/cni/blob/f145359defec1db69929c67dead380fdf8dfce1b/pkg/invoke/exec.go#L81
ExecPlugin
on c.Run()
success -[https://github.com/containernetworking/cni/blob/77436456f2ab8443ba0450d61b1499bd50e8a704/pkg/invoke/raw_exec.go#L34] -> return stdout.Bytes()
Also based on the spec - https://github.com/containernetworking/cni/blob/master/SPEC.md#result recommendation is to use "stderr can be used for unstructured output such as logs". Verified "stderr" instead of "stdout" and issue is not seen.
For completeness, Call trace from CNI -
add
[1]-> PrintResult
[2] -> prettyPrint
[3] -> _, err = os.Stdout.Write(data)
[o/p is written to STDOUT]
[1] https://github.com/containernetworking/cni/blob/f145359defec1db69929c67dead380fdf8dfce1b/pkg/invoke/exec.go#L81
[2] https://github.com/containernetworking/cni/blob/master/pkg/types/types.go#L105
[3] https://github.com/containernetworking/cni/blob/master/pkg/types/types.go#L200
I did some further testing last night to see if it is possible set stdout
value for AWS_VPC_K8S_PLUGIN_LOG_FILE
env var and found that it is not possible to do it today.
As @jayanthvn mentioned above, we use skel.PluginMainWithError
method which will redirect the result to stdout
in json format. Apart from result which is being returned back by cmdAdd
func, when PLUGIN log is set to stdout, we write some addition log information about CNI request and routes information which uses zapcore.Lock(os.Stdout)
to write to os.Stdout which is messing up the json structure. This is causing kubelet to consider this as failure of pod creation and retry even though this is not an actual error.
That said, I am trying to understand the use-case here. CNI plugin is a binary on the host which is invoked by kubelet whenever we want to add/del a pod. If we are writing PLUGIN logs to stdout
, we will not be able to retrieve this logs if there are any failure as the process will write these logs to /proc/self/fd/1
which cannot be read.
IMO, we should make a change to documentation stating that we do not support stdout
logging for AWS_VPC_K8S_PLUGIN_LOG_FILE
env var.
WDYT, @jayanthvn @SaranBalaji90
This bug hit me and has been confusing me all week. I didn't believe in the Quantum Observer Effect until today!
I do think that having the debug logs to to stdout would be a massively useful feature though if it could be made to work.
@wtistio Thanks, will explore the options to see if it is doable for CNI plugin.