Only NFQUEUE a nftables set of IPs matching the selectors?
vaskozl opened this issue · 3 comments
There are also some performance improvements that can be applied, such as to restrict the packets that are sent to userspace to the ones that have network policies only. This effectively means that network policies are applied ONLY at the time the connection is initatied by whatever the conntrack kernel understand by NEW connection.
I found this comment quite interesting as I was interested in an nftables
network policies implementation. We could this with by creating a a set:
for _, ndp := range podSelectorEndpoints {
tx.Add(&knftables.Element{
Set: netpoledPodsSetName,
Key: []string{string(ndp.PodIP)},
})
}
Then we can use two queues, one for when the PodIP is the source and one when it's the destination, so we only need to evaluate egress and ingress respectively.
Something like:
tx.Add(&knftables.Rule{
Chain: filterInputChain,
Rule: knftables.Concat(
ipX, "saddr", "@", netpoledPodsSetName, "queue", "num", fmt.Sprintf(c.config.IngressQueueID),
),
Comment: ptr.To("Evaluate ingress network policies"),
})
This partially solve the double processing in #10, assuming the @netpoledPodsSet
set only includes pods currently running on the node. (We'd still process in userspace twice but we'd do only half of the checks).
While this seems good in theory to me, I can't help but wonder whether the added complexity here is worth it, because now we have to maintain an up to date set of on each node.
We could actually maintain a set per network policy (instead of one big one). If we do all that, then it's not much of a strech to also maintain an ingress and egress nftables set per netpol each contain addr/protocol and port(range)?
Something like:
tx.Add(&knftables.Rule{
Chain: ingressEnabledChain,
Rule: knftables.Concat(
ipX, "saddr", ".", "dport", "@", ingressSetName, "ip", "daddr", "@", netpoledPodsSetName, "accept"
),
Comment: ptr.To("Allow defined ingress"),
})
(We'd actually also need a noPort
ingress/egress set and rule, as inet_service must be a port or range)
TLDR: If we bother to maintain one set, should we not maintain 5 and do everything in nftables
(future features aside)?
those were my thoughts when I added the comment, and I kind of like the simplicity of existing solution that just needs one nftables rule ... I think that is a matter of taking numbers at this point
I'm also thinking the podSelector optimisation alone probably won't realy provide a significant saving assuming blanket BANP or otherwise policies covering the majority of workloads, while it might instead introduce extra load/bugs.