kubernetes/ingress-nginx

worker_process unpopulated after upgrade to 1.10.2

CatbirdBasil opened this issue · 8 comments

What happened:

I'm developing a custom Nginx C Module and I rely on worker_processes (ngx_core_conf_t) during module init to allocate some data. I noticed that when upgrading from 1.10.1 to 1.10.2 this value is no longer populated at both config reading stage and module init stage.

Development Guide

What you expected to happen:

Expected worker_processes variable to be populated as it is only filled in by Nginx when reading configuration (either flat number or calculated afterwards when using auto keyword)

ngx_core_conf_t *core_main_conf = (ngx_core_conf_t *) ngx_get_conf(cycle->conf_ctx, ngx_core_module);
int num_of_workers = core_main_conf->worker_processes; // This one is -1 but I expected it to be 4

NGINX Ingress controller version (exec into the pod and run /nginx-ingress-controller --version): 1.10.2

Anything else we need to know:

I actually conducted an investigation of this as it didn't make sense to me at all. And after I found our what causes it .. it's still not making any sense.

The issue is caused by the patches you introduced in version 1.10.2 for Nginx of version 1.25.3.
To be more precise this patch https://github.com/kubernetes/ingress-nginx/blob/main/images/nginx/rootfs/patches/11_nginx-1.25.3-privileged_agent_process.patch
To be even more precise, adding those lines:

diff --git a/src/core/ngx_cycle.h b/src/core/ngx_cycle.h
index c51b7ff..3261f90 100644
--- a/src/core/ngx_cycle.h
+++ b/src/core/ngx_cycle.h
@@ -22,6 +22,9 @@
 #define NGX_DEBUG_POINTS_ABORT  2
 
 
+#define HAVE_PRIVILEGED_PROCESS_PATCH   1
+
+
 typedef struct ngx_shm_zone_s  ngx_shm_zone_t;
 
 typedef ngx_int_t (*ngx_shm_zone_init_pt) (ngx_shm_zone_t *zone, void *data);
@@ -81,6 +84,7 @@ struct ngx_cycle_s {
 typedef struct {
     ngx_flag_t                daemon;
     ngx_flag_t                master;
+    ngx_flag_t                privileged_agent;
 
     ngx_msec_t                timer_resolution;

The most strange part - you have to include both those changes to break worker_processes. Including just one is working fine. And I looked at the ngx_cycle.h after the patch and it looks perfectly fine.

*Yep I run all the patches one by one to see what caused and issue and then isolated the problematic patch and it still falied so I started cutting from the patch to see exact faulty lines.

I have no idea why this causes such behavior as of now. And I know you took those patches from Openresty but for some reason in Openresty everything is fine and working when I use their images with Nginx 1.25.3. Tried applying their patches but I lacked some of their stuff so stopped this part of investigation.

This issue prevents me from allocating memory properly and only way I can overcome it is to guestimate using number of cpu's to support strange configs that don't go for 1 core = 1 worker

P.S Sorry that I skipped all the version infra stuff, excluded it as I was able to reproduce even without using Ingress

This issue is currently awaiting triage.

If Ingress contributors determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

These patches are not provided by us but by OpenResty, the NGINX flavor we are using for this project. So there's not a lot we can do about this. You could file an issue at their tracker. I think they have good reasons to change these lines, but are maybe not aware of your use case and the impact of their patch to it.

I assume the OpenResty code you're showing already has all the patches applied, right? Because we are applying them to a plain NGINX.

Also it's worth noting that we need to order these patches as OpenResty does not supply them with a order and sadly order matters when applying them.

Even though I doubt this: Do you think there's a possibility we just put them in the wrong order? IIRC we updated NGINX from v1.21 to v1.25 starting from Ingress NGINX v1.10. Therefore we needed to sync the patches with OpenResty and might have changed their order.

Or is the issue you're having only occurring in v1.10.2 onwards, but not in v1.10.1?

Forget this, just re-read your issue description. Hm... I think we need to diff v1.10.1 to v1.10.2 then.

These patches are not provided by us but by OpenResty, the NGINX flavor we are using for this project. So there's not a lot we can do about this. You could file an issue at their tracker. I think they have good reasons to change these lines, but are maybe not aware of your use case and the impact of their patch to it.

I compared the code of OpenResty-1.25.3.2 and found that when the patch added ngx_flag_t privileged_agent; in /src/core/ngx_cycle.h, it was placed in the wrong structure. Here is the detail, but I haven't verified whether fixing this issue will solve the problem.

11_nginx-1.25.3-privileged_agent_process
image

openresty
image

Did you delete your old comment? Because now I'm confused.

Did you delete your old comment? Because now I'm confused.

I suspect that privileged_agent was not correctly placed in ngx_core_conf_t, and while I am familiar with OpenResty, I am not familiar with this method of applying patches. You can confirm this issue. However, in "12_nginx-1.25.3-privileged_agent_process_connections", I saw that when adding privileged_agent_connections, it was correctly set.

image

Hi, thank you for quick responses.

About those last messages. It's patch being weird but the new field is supposed to go into ngx_core_conf_t if we see how it's used
in that patch and others related to privileged worker (I initially thought vice versa, that it's added to ngx_cycle and that's bad but that would fail the compilation)

About order of patches. I had such an idea, Openresty really doesn't order those, maybe I'll try to find some time today to test that somehow.

But this whole things is soo strange. Seeing that resulting files are fine I don't see how this could cause the issue I'm having but I checked it a dozen of times yesterday to see that I'm not hallucinating or anything

Changing the order didn't help but I just had a revelation and can see what happened now. It's both simple and confusing at the same time.

What happened is difference of at which codebases where running Nginx and my module compiled. At initial env with Ingress Nginx I compiled module on vanilla Nginx and Ingress was running with patches. On my local testing env I compiled module with patches but run the Nginx without them.

And what happened is offset of variable worker_processes inside that struct was different in module and Nginx so I basically was querying for some different spot in memory.

That didn't happen for Openresty because I was doing both on Openresty build and not vanilla Nginx.

So the issue is on my side, but partially. If you're patching something moddable you better add new fields to the end of an struct and not the middle. That way those issues wouldn't occur at all.

That is a complaint for Openresty of course. Thank you guys for your attention. Hope this issue will be helpful to anyone in the spot I was in. I was getting back to this issue for 3 weeks now or smth. You can close it.