Irqbalance/irqbalance

irq's load will be wrong when someone modifies the irq's smp_affinity

Closed this issue · 4 comments

When someone changes an irq's smp_affinity from CPU A to CPU B, irqbalance still accounts that irq's load to CPU A. As a result the computed load is wrong, and the irq's smp_affinity may never be changed back to CPU A (the placement irqbalance calculated).

For example:
Package 0: numa_node 0 cpu mask is 00000001 (load 10000000)
  Cache domain 0: numa_node is 0 cpu mask is 00000001 (load 10000000)
    CPU number 0 numa_node is 0 (load 10000000)
      Interrupt 29 node_num is -1 (ethernet/5000000:50)
      Interrupt 30 node_num is -1 (ethernet/5000000:50)
Package 1: numa_node 0 cpu mask is 00000002 (load 10000000)
  Cache domain 1: numa_node is 0 cpu mask is 00000002 (load 10000000)
    CPU number 1 numa_node is 0 (load 10000000)
      Interrupt 31 node_num is -1 (ethernet/10000000:100)

Then someone changes interrupt 30's smp_affinity to CPU 1:

Package 0: numa_node 0 cpu mask is 00000001 (load 5000000)
  Cache domain 0: numa_node is 0 cpu mask is 00000001 (load 5000000)
    CPU number 0 numa_node is 0 (load 5000000)
      Interrupt 29 node_num is -1 (ethernet/2500000:50)
      Interrupt 30 node_num is -1 (ethernet/2500000:50)
Package 1: numa_node 0 cpu mask is 00000002 (load 15000000)
  Cache domain 1: numa_node is 0 cpu mask is 00000002 (load 15000000)
    CPU number 1 numa_node is 0 (load 15000000)
      Interrupt 31 node_num is -1 (ethernet/15000000:100)

Interrupt 30 now actually fires on CPU 1, but irqbalance still lists it under CPU 0, so its load is counted as part of interrupt 31's load. It is never rebalanced back to CPU 0.
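
To make the arithmetic in the two dumps concrete, here is a rough sketch of the misattribution. It assumes a simplified model in which each CPU's load is split among the interrupts believed to be on that CPU, weighted by interrupt count; `share_load` and the dictionaries are hypothetical names for illustration, not irqbalance code.

```python
def share_load(cpu_load, assignment, irq_counts):
    """Split each CPU's load among the IRQs assigned to it, weighted by count.

    Simplified model (an assumption, not the real irqbalance implementation).
    """
    result = {}
    for cpu, irqs in assignment.items():
        total = sum(irq_counts[irq] for irq in irqs)
        for irq in irqs:
            result[irq] = cpu_load[cpu] * irq_counts[irq] // total if total else 0
    return result


# Values taken from the second dump above.
cpu_load = {0: 5_000_000, 1: 15_000_000}
irq_counts = {29: 50, 30: 50, 31: 100}

# What the balancer believes vs. where the interrupts really fire.
believed = {0: [29, 30], 1: [31]}
actual = {0: [29], 1: [30, 31]}

believed_load = share_load(cpu_load, believed, irq_counts)
actual_load = share_load(cpu_load, actual, irq_counts)

print(believed_load)  # interrupt 31 is charged all of CPU 1's load
print(actual_load)    # interrupt 30's share sits on CPU 1, not CPU 0
```

Under this model the believed placement charges interrupt 31 with 15000000, while the real placement gives 5000000 to interrupt 30 and 10000000 to interrupt 31, which matches the loads from before the manual change.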

I think removing the check on info->moved in the function activate_mapping would help. irqbalance would then rewrite the smp_affinity of every interrupt on each pass, so a manual change would not persist.
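
A toy model of that proposal, as I understand it, might look like the sketch below. It simulates the kernel's affinity table with a dictionary; the `IrqInfo` class, the `always_write` flag, and this Python `activate_mapping` are hypothetical stand-ins for illustration, not the actual C code.

```python
from dataclasses import dataclass


@dataclass
class IrqInfo:
    irq: int
    assigned_mask: int   # CPU mask the balancer chose for this IRQ
    moved: bool = False  # set only when the balancer itself reassigns the IRQ


def activate_mapping(info, kernel_affinity, always_write=False):
    """Push the balancer's mask into the simulated kernel table.

    Returns True if a write happened.
    """
    if not always_write and not info.moved:
        return False  # gated behaviour: IRQs the balancer did not move are skipped
    kernel_affinity[info.irq] = info.assigned_mask
    info.moved = False
    return True


# Someone moved IRQ 30 to CPU 1 (mask 0x2); the balancer still wants CPU 0 (mask 0x1).
kernel_affinity = {30: 0x2}
info = IrqInfo(irq=30, assigned_mask=0x1)

gated = activate_mapping(info, kernel_affinity)
mask_after_gated = kernel_affinity[30]    # manual change survives

forced = activate_mapping(info, kernel_affinity, always_write=True)
mask_after_forced = kernel_affinity[30]   # balancer's choice is restored
```

With the gate in place the manual change is left alone; without it, every pass rewrites the mask, at the cost of an extra write per interrupt.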

Why would you consider this to be a valid use case? If you're running irqbalance and balancing irq A, you shouldn't be manually changing irq A's affinity. Garbage in, garbage out.

Users may not know about irqbalance and may modify interrupt affinity to suit their service requirements, and some drivers may modify interrupt affinity as well.

Lack of awareness or understanding is not an issue we can work around. If users do something dumb, things break; that's not on us to fix.