Warning "Detected conflicting tunnel peer for prefix" and "Detected conflicting encryption key index for prefix"
Hi there!
We deployed Cilium and Netreap to our DEV cluster following the README.md guide.
After a month of running-in and testing in the DEV cluster, we decided to roll it out to the PROD cluster.
Testing was successful on the DEV cluster, but one thing stopped us from finally switching to Cilium.
On the PROD cluster we have more than 300 hosts.
Sometimes we get an alarming warning in the Cilium log:
level=warning msg="Detected conflicting tunnel peer for prefix. This may cause connectivity issues for this address." cidr=172.16.42.41/32 conflictingResource=node//host2 conflictingTunnelPeer=ip-addr resource=node//host2 subsys=ipcache
level=warning msg="Detected conflicting encryption key index for prefix. This may cause connectivity issues for this address." cidr=172.16.42.41/32 conflictingKey=255 conflictingResource=node//host2 key=255 resource=node//host1 subsys=ipcache
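To see which nodes are involved in these warnings across many log lines, the key=value fields can be parsed and grouped by prefix. The following is a minimal sketch, not part of Cilium or Netreap: the helper names `parse_logfmt` and `conflicts_by_cidr` are hypothetical, and it assumes every line follows the same key=value layout as the two warnings above.

```python
import shlex
from collections import defaultdict

# The two warnings quoted above, used as sample input.
SAMPLE = [
    'level=warning msg="Detected conflicting tunnel peer for prefix. '
    'This may cause connectivity issues for this address." '
    'cidr=172.16.42.41/32 conflictingResource=node//host2 '
    'conflictingTunnelPeer=ip-addr resource=node//host2 subsys=ipcache',
    'level=warning msg="Detected conflicting encryption key index for prefix. '
    'This may cause connectivity issues for this address." '
    'cidr=172.16.42.41/32 conflictingKey=255 conflictingResource=node//host2 '
    'key=255 resource=node//host1 subsys=ipcache',
]


def parse_logfmt(line):
    """Split a key=value log line into a dict; quoted values keep their spaces."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


def conflicts_by_cidr(lines):
    """Map each conflicting prefix to the set of resources named in its warnings."""
    grouped = defaultdict(set)
    for line in lines:
        fields = parse_logfmt(line)
        is_conflict = (
            fields.get("subsys") == "ipcache"
            and fields.get("msg", "").startswith("Detected conflicting")
        )
        if not is_conflict or "cidr" not in fields:
            continue
        for key in ("resource", "conflictingResource"):
            if key in fields:
                grouped[fields["cidr"]].add(fields[key])
    return dict(grouped)


if __name__ == "__main__":
    for cidr, resources in conflicts_by_cidr(SAMPLE).items():
        print(cidr, sorted(resources))
```

Feeding it the full agent log (for example, one line per list element read from a file) shows at a glance whether the same prefixes keep being claimed by more than one node.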
Our configuration is below:
Systemd Unit File
[Unit]
Description=Cilium Agent
After=docker.service
Requires=docker.service
After=consul.service
Wants=consul.service
Before=nomad.service
[Service]
Restart=always
ExecStartPre=-/usr/bin/docker exec %n stop
ExecStartPre=-/usr/bin/docker rm %n
ExecStart=/usr/bin/docker run --rm --name %n \
-v /var/run/cilium:/var/run/cilium \
-v /sys/fs/bpf:/sys/fs/bpf \
--env CONSUL_HTTP_TOKEN=<secret_token> \
--net=host \
--cap-add NET_ADMIN \
--cap-add NET_RAW \
--cap-add IPC_LOCK \
--cap-add SYS_MODULE \
--cap-add SYS_ADMIN \
--cap-add SYS_RESOURCE \
--privileged \
cilium/cilium:v1.14.5 \
cilium-agent --kvstore consul --kvstore-opt consul.address=127.0.0.1:8500 \
--kvstore-periodic-sync=5m \
--enable-ipv6=false \
--tunnel-protocol=geneve \
--enable-wireguard \
--encrypt-node \
--enable-bpf-masquerade=true \
--kube-proxy-replacement=true \
--enable-l7-proxy=false \
--prometheus-serve-addr=127.0.0.1:9962 \
--ipv4-range 172.16.0.0/16 \
[Install]
WantedBy=multi-user.target
/etc/docker/daemon.json
{
"default-address-pools": [
{
"base": "172.17.0.0/16",
"size": 24
}
]
}
/opt/cni/config/cilium.conflist
{
"name": "cilium",
"cniVersion": "1.0.0",
"plugins": [
{
"type": "cilium-cni",
"enable-debug": false
},
{
"type": "portmap",
"capabilities": {"portMappings": true}
}
]
}
/opt/cni/bin
bandwidth bridge cilium-cni dhcp dummy firewall host-device host-local ipvlan loopback macvlan portmap ptp sbr static tap tuning vlan vrf
Netreap system job
job "netreap" {
datacenters = ["dc1"]
priority = "100"
type = "system"
meta {
RENDER_STAMP = "2024-05-01_02:48:41PM"
}
constraint {
attribute = "${attr.plugins.cni.version.cilium-cni}"
operator = ">="
value = "1.15.3"
}
constraint {
attribute = "${attr.plugins.cni.version.cilium-cni}"
operator = "is_set"
}
group "netreap" {
vault {
policies = ["nomad-services"]
}
restart {
interval = "10m"
attempts = 5
delay = "15s"
mode = "delay"
}
service {
name = "netreap"
tags = [ "netreap" ]
}
task "netreap" {
driver = "docker"
template {
destination = "secrets/file.env"
env = true
change_mode = "restart"
data = <<EOT
NETREAP_CILIUM_CIDR="172.16.0.0/16"
NOMAD_ADDR="https://127.0.0.1:4646"
NETREAP_DEBUG="true"
NOMAD_CLIENT_KEY="/etc/nomad/ssl/client-key.pem"
NOMAD_CLIENT_CERT="/etc/nomad/ssl/client.pem"
NOMAD_CAPATH="/etc/nomad/ssl/nomad-ca.pem"
{{- with secret "kv/netreap/prod" }}
CONSUL_HTTP_TOKEN="{{ .Data.data.CONSUL_HTTP_TOKEN }}"
{{- end }}
{{- with secret "kv/netreap/prod" }}
NOMAD_TOKEN="{{ .Data.data.NOMAD_TOKEN }}"
{{- end }}
EOT
}
config {
image = "registry"
network_mode = "host"
auth {
username = "[MASKED]"
password = "[MASKED]"
}
volumes = [
"/etc/nomad/ssl:/etc/nomad/ssl",
"/var/run/cilium:/var/run/cilium",
]
}
resources {
cpu = 200
memory = 300
}
}
}
}
Netreap Version
v0.2.0
Cilium Version
Client: 1.14.5 85db28be 2023-12-11T14:30:29+01:00 go version go1.20.12 linux/amd64
Daemon: 1.14.5 85db28be 2023-12-11T14:30:29+01:00 go version go1.20.12 linux/amd64
Kernel Version
Linux ax51-host110 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux
Nomad Version
Nomad v1.5.6
BuildDate 2023-05-19T18:26:13Z
Revision 8af70885c02ab921dedbdf6bc406a1e886866f80
Consul Version
Consul v1.14.7
Revision d97acc0a
Build Date 2023-05-16T01:36:41Z
When we run cilium-agent with --ipv4-range 172.16.0.0/16, as specified in the documentation, every host ends up with the same subnet, 172.16.0.0/16:
host1 ext ip addr 172.16.0.0/16 local
host2 ext ip addr 172.16.0.0/16 kvstore
host3 ext ip addr 172.16.0.0/16 kvstore
host4 ext ip addr 172.16.0.0/16 kvstore
host5 ext ip addr 172.16.0.0/16 kvstore
host6 ext ip addr 172.16.0.0/16 kvstore
host7 ext ip addr 172.16.0.0/16 kvstore
host8 ext ip addr 172.16.0.0/16 kvstore
host9 ext ip addr 172.16.0.0/16 kvstore
host10 ext ip addr 172.16.0.0/16 kvstore
I guess this may be the cause of conflicts and as a result we see it in the cilium log.
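The overlap itself is easy to confirm with Python's standard `ipaddress` module. This is only a sketch: the `overlapping_pairs` helper is hypothetical, the host-to-range mappings are copied from the listings in this report (three hosts each, for brevity), and it shows only that the ranges coincide, not that this is what triggers the warnings.

```python
import ipaddress
from itertools import combinations


def overlapping_pairs(node_cidrs):
    """Return every pair of node names whose allocation ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in node_cidrs.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Every host started with --ipv4-range 172.16.0.0/16.
same_range = {
    "host1": "172.16.0.0/16",
    "host2": "172.16.0.0/16",
    "host3": "172.16.0.0/16",
}

# Ranges reported further down for --ipv4-range auto.
auto_range = {
    "host1": "10.231.0.0/16",
    "host2": "10.72.0.0/16",
    "host3": "10.201.0.0/16",
}

if __name__ == "__main__":
    print(overlapping_pairs(same_range))  # every pair of hosts overlaps
    print(overlapping_pairs(auto_range))  # no pair overlaps
```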
And if I understand correctly, Netreap does not handle IPAM the way the operator does in K8s.
Could you please explain how this is supposed to work?
Also, do you have any advice for a production-ready cluster? It would be great to hear your opinion on this.
We also ran cilium-agent with the --ipv4-range auto flag, but the resulting subnet range is not enough for us:
host1 ext ip addr 10.231.0.0/16 local
host2 ext ip addr 10.72.0.0/16 kvstore
host3 ext ip addr 10.201.0.0/16 kvstore
host4 ext ip addr 10.70.0.0/16 kvstore
host5 ext ip addr 10.75.0.0/16 kvstore
host6 ext ip addr 10.104.0.0/16 kvstore
host7 ext ip addr 10.109.0.0/16 kvstore
host8 ext ip addr 10.154.0.0/16 kvstore
host9 ext ip addr 10.208.0.0/16 kvstore
host10 ext ip addr 10.23.0.0/16 kvstore
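For a sense of the address-space arithmetic, here is a minimal sketch that assumes, purely hypothetically, that one wanted to carve a single 172.16.0.0/16 into non-overlapping per-host ranges for the 300 hosts mentioned above. Nothing in this thread confirms that Cilium or Netreap expects such a layout; the `split_for_hosts` helper is illustrative only.

```python
import ipaddress
from itertools import islice


def split_for_hosts(parent, host_count):
    """Split `parent` into equal, non-overlapping subnets, one per host.

    The prefix is extended by the smallest number of bits that yields at
    least `host_count` subnets; only the first `host_count` are returned.
    """
    if host_count < 1:
        raise ValueError("host_count must be at least 1")
    net = ipaddress.ip_network(parent)
    extra_bits = (host_count - 1).bit_length()  # ceil(log2(host_count))
    new_prefix = net.prefixlen + extra_bits
    if new_prefix > net.max_prefixlen:
        raise ValueError("parent network is too small for that many hosts")
    return list(islice(net.subnets(new_prefix=new_prefix), host_count))


if __name__ == "__main__":
    ranges = split_for_hosts("172.16.0.0/16", 300)
    print(ranges[0], ranges[-1], ranges[0].num_addresses)
```

With 300 hosts the /16 has to be extended by 9 bits, so each host would receive a /25 (128 addresses), which may or may not be enough endpoints per node.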
I also noticed that you removed the --cilium-cidr configuration flag in the new version.
What is the reason for this?
--cilium-cidr is no longer needed, as Netreap now validates node membership by querying Nomad directly rather than guessing based on the IP address.
As for the conflicting IPs, I suspect that has more to do with the Cilium configuration, and it seems you're having more luck asking there: cilium/cilium#32188