SQM does not always detect cake as available qdisc at boot
Opened this issue · 12 comments
I have lately noticed that SQM does not always properly recognize cake at OpenWrt startup. The list of available qdiscs in LuCI contains then only fq_codel, but no trace of cake. Manually running /usr/lib/sqm/update-available-qdiscs
brings cake to the qdisc list.
Note the time difference below. fq_codel detected at boot, cake later on manual run:
root@router1:~# /usr/lib/sqm/update-available-qdiscs
root@router1:~# ls -l /tmp/run/sqm/available_qdiscs/
-rw-r--r-- 1 root root 0 May 26 20:38 cake
-rw-r--r-- 1 root root 0 May 26 19:10 fq_codel
I have noticed this several times after the reboot after the flashing of new OpenWrt master firmware into my ipq806x/R7800. (I am usually using simple/fq_codel, but I have lately used also cake. So it is hard to say if this is really new, or already something old.)
I suspect that there is some race condition or similar in the SQM startup process, so that it misses detecting cake.
I haven't tried to debug it further, but thought to mention it case somebody else runs into that. Could be that a small delay in the detection script would help, or something similar.
The system log shows nothing interesting by default:
root@router1:~# logread | grep sqm
Tue May 26 19:10:40 2020 daemon.notice procd: /etc/rc.d/S50sqm: SQM: Starting SQM script: simple.qos on eth0.2, in: 190000 Kbps, out: 18000 Kbps
Tue May 26 19:10:40 2020 user.notice SQM: Using generic sqm_start_default function.
Tue May 26 19:10:40 2020 daemon.notice procd: /etc/rc.d/S50sqm: SQM: Using generic sqm_start_default function.
Tue May 26 19:10:41 2020 daemon.notice procd: /etc/rc.d/S50sqm: SQM: simple.qos was started on eth0.2 successfully
Tue May 26 19:10:45 2020 user.notice ucitrack: Setting up /etc/config/firewall reload dependency on /etc/config/sqm
Tue May 26 19:12:26 2020 user.notice SQM: Using generic sqm_start_default function.
Tue May 26 19:12:29 2020 user.notice SQM: Using generic sqm_start_default function.
I added verbosity to the config, so maybe something interesting surfaces on the next times.
I tried normal reboot, and both qdiscs got detected ok.
I suspect that the trouble is somehow related to the first boot after flashing, as the discs & kept settings are initialised, so the process timings may be a bit different from a normal boot.
Are you baking sqm-scripts and kmod-sched-cake into the image?
Sure. They are included in the image. I include luci-app-sqm and it pulls the dependencies.
(ipq806x has kernel 5.4, so it is the native in-tree cake module.)
I tried now re-flashing the router, but this time the detection worked ok at boot. The failure does not happen always. (But there is no trace of cake detection in SQM start log, although the marker file is now present in available_qdiscs folder.)
root@router1:/tmp/run/sqm# ls -l /tmp/run/sqm/available_qdiscs/
-rw-r--r-- 1 root root 0 May 26 21:47 cake
-rw-r--r-- 1 root root 0 May 26 21:47 fq_codel
Hmmm.
While writing this, I noticed that SQM has actually started several times during boot, as interface ifups trigger restarts. Apparently only the log of the last start (22:03) get preserved in SQM's own log but the earlier restart (21:47) gets lost.
I copied the earlier starts from the system log, so it contains also unrelated garbage:
bootlog.txt
Ps. I added the verbosity to the uci config file, not /etc/sqm.conf, but that should make no difference, I think.
Ah, right. Are you baking sqm-scripts and kmod-sched-cake into the image?
Interesting enough I do not have them compiled into the image —both are modules— and I am experiencing same behaviour. Difference is that I am on 19.07.3 with a few cherry-picked patches from master.
Sounds reasonable.
That would be well after the reboot in any case, so the possible system initialisation should be complete.
I can confirm I have this bug On my wrt3200acm after a fresh flash.
possibly related...
edit: false alarm... issue was on my end... @ any script error or hang at .qos does not cleanup /tmp/run/sqm/sqm-run.lock/