dell/dell-recovery

Is wake_network necessary in installation stage 2 ?

Closed this issue · 15 comments

Commit 8223cce sleep the network in stage 2 but will wake up at the end.

There is a case that waking up network will cause package be upgraded through network during stage 2.
e.g.
google-chrome-stable installed at early stage 2 and the package generate sources list in /etc/apt/sources.list.d/google-chrome.list.
deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main

The chroot-script 89-upgrade.sh it will execute apt-get update;apt-get upgrade and this script is executed after network waked up. After execute 89-upgrade.sh, the browser package will be upgraded.

Wondering if wake_network is necessary ?

wake_network would be necessary in case a post install script needs to communicate with a factory server. I don't believe this to be the case right now, but it was put in place in case that need arose.

89-upgrade.sh isn't in dell-recovery codebase. I would actually ask why this is behaving as you describe. All upgrades are supposed to be automatically marked during installation process and installed as needed.

We put all packages which need be upgraded in one fish package and execute apt upgrade in 89-upgrade.sh at very late of stage 2.
e.g.
upgrade-20190323_fish1.tar.gz

It is possible to make sure a chroot script is executed before wake_network ?

Hi Mario,
Let's align the expectation of network accessing of stage 2, so that we can base on the expectation to decide the behavior of scripts.

Refer to Commit 8223cce and your comment, does that mean the original expectation in Stage 2 are:

  1. Network accessing is disabled during stage 2 before post install scripts executing.
    • the post install scripts target to scripts under scripts/chroot-scripts/os-post/
    • so, scripts under scripts/chroot-scripts/fish/ is not expected having ability accessing network, right?

If we can make sure the expectation above, then logically even google-chrome-stable injects google-chrome.list, google-chrome will still not be updated by any script under scripts/chroot-scripts/fish/. Then we need to refine dell-recovery to match this expectation.

Or if the network accessibility is expected 'ON' during scripts under scripts/chroot-scripts/fish/, then we need to do something for unexpected source lists for each script under scripts/chroot-scripts/fish/.

I feel we can just ignore what 89-upgrade.sh did if we make sure the environment is expected during scripts under scripts/chroot-scripts/fish/.

We put all packages which need be upgraded in one fish package and execute apt upgrade in 89-upgrade.sh at very late of stage 2.
e.g.
upgrade-20190323_fish1.tar.gz

It is possible to make sure a chroot script is executed before wake_network ?

@cyruslien
I don't understand why this package is structured like it is. If you juts put the thermald package into debs/, it would work entirely without a shell script and we wouldn't be discussing this issue.

@alex-tu-cc
Regarding the expectation with network in stage2, the expectation was to specifically be sleeping during the ubiquity portion of the install and awake during the post install scripts. However other applications modifying /etc/apt/sources.list and running apt update was never something envisioned in post install script environment. If there is a deficiency in the ubiquity marking all upgrades approach that exists today, I would like to understand that.

Let me circle around internally to find out others take on network environment in post install scripts.

packages installation during ubiquity portion happens in 2 cases:

  1. package already be installed in base image and there's new version in /debs.
  2. package has modealias matches some device on target machine.

From Focal OEM image, it covered most of the cases.
But on shipped Bionic OEM image, there're still some packages be installed by scripts/chroot-scripts/fish/ e.g. amdgpu full-stack, Nvidia driver, kernel parameter for oem-fix, TLP customization ... etc.

But even though Focal OEM covered most of the cases, it's still hard to say that we can drop the way installing packages by scripts in scripts/chroot-scripts/fish/ and it's easier to debug package installation by scripts in scripts/chroot-scripts/fish/ (logs are recorded in chroot.log).

So, let's define the environment precisely in this chance of discussion.
How do you think if we add one more scripts folder to cover the scripts that need network. e.g.

  • scripts executed without network : scripts/chroot-scripts/fish/
  • scripts executed with network : scripts/chroot-scripts/fish/with_network

Or special folder for script which is expected to be executed without network:

  • scripts executed with network : scripts/chroot-scripts/fish/
  • scripts executed without network : scripts/chroot-scripts/fish/without_network

Had some internal discussion and there is no factory need for network at any time. We agree that removing wake_network is an appropriate solution.

debug package installation by scripts in scripts/chroot-scripts/fish/ (logs are recorded in chroot.log).

btw aren't logs for ubiquity installed items in /var/log/installer on /target? I guess if there is a failure they don't get put there.

Thanks for confirmation, I'm going to propose a PR to remove wake_network.

response to your 2nd question:
so far the package upgrade/installed by the ubiquity portion seems only print the message to journalctl, and those message seems not there in var/log/installer ... so it's not that easy to check the message if the installation error happened in ubiquity portion.

But packages installed by scripts/chroot-scripts/fish/ will be logged into chroot.log, so that easier debugging.

hmm...
from focal image, it seems NetworkManager Sleep method not actually disable ethernet and ping 8.8.8.8 works.

$sudo gdbus call --system --dest org.freedesktop.NetworkManager --object-path /org/freedesktop/NetworkManager --method org.freedesktop.NetworkManager.Sleep true

journalctl:

 七  15 15:36:11 CANONICALID sudo[5389]:   ubuntu : TTY=pts/2 ; PWD=/home/ubuntu ; USER=root ; COMMAND=/usr/bin/gdbus call --syste
m --dest org.freedesktop.NetworkManager --object-path /org/freedesktop/NetworkManager --method org.freedesktop.NetworkManager.Slee
p true
 七  15 15:36:11 CANONICALID sudo[5389]: pam_unix(sudo:session): session opened for user root by (uid=0)
 七  15 15:36:11 CANONICALID NetworkManager[691]: <info>  [1594798571.6913] manager: sleep: sleep requested (sleeping: no  enabled
: yes)
 七  15 15:36:11 CANONICALID NetworkManager[691]: <info>  [1594798571.6917] device (p2p-dev-wlp0s20f3): state change: disconnected
 -> unmanaged (reason 'sleeping', sys-iface-state: 'managed')
 七  15 15:36:11 CANONICALID NetworkManager[691]: <info>  [1594798571.6930] manager: NetworkManager state is now ASLEEP
 七  15 15:36:11 CANONICALID NetworkManager[691]: <info>  [1594798571.6937] audit: op="sleep-control" arg="on" pid=5393 uid=0 resu
lt="success"

so far the package upgrade/installed by the ubiquity portion seems only print the message to journalctl, and those message seems not there in var/log/installer ... so it's not that easy to check the message if the installation error happened in ubiquity portion.

Ah this is probably because copying /var/log/syslog to /target/var/log/installer/syslog is one of the last steps. I think a worthwhile PR to ubiquity would be to copy this even in failure scenarios for investigation.

from focal image, it seems NetworkManager Sleep method not actually disable ethernet and ping 8.8.8.8 works.

This definitely worked before. Sounds like a NM regression.

@superm1
looks the network is waked up by https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/master/src/nm-policy.c#L1499 .
And I opened an upstream bug there as well : https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/503
But I afraid the upstream re-defined the Sleep method so not expect the usage like what we are doing.

How do you think we disable network by another method instead?
https://salsa.debian.org/utopia-team/network-manager/-/blob/debian/master/introspection/org.freedesktop.NetworkManager.xml#L179

To disable/enable network sounds more close to what dell-recovery needed.

I think Sleep is pretty well understood, this seems like a bug to me. But monitoring that issue you filed makes sense.

Does that Enable actually take interface down and up. I would think interface stays up from that command.

I tried 'Enable' method, the network actually disable when set 'Enable' false, and network backs to work when set 'Enable' true.

So, I feel we can at least switch to 'Enable' method before the 'Sleep' issue fixed.
If you are also free for that switching, I can do more testing to confirm it works as our expectation then submit a PR for dell-recovery to close this github issue.

If that works then yes, lets switch to that for now. I agree.