How TIM escape being killed in Andorid 10? (based on Leoric)
The workflow:
1. Parse pacakge name of the app to get its UID, say UIDXXX
(ActivityManagerService#forceStopPackage())
2. Collect into procs (ArrayList<ProcessRecord>) all processes whose UID is UIDXXX
via iterating the ProcessList maintained by ActivityManagerService
(ProcessList#killPackageProcessesLocked())
3. For each process P in procs (ProcessList#killPackageProcessesLocked()):
2.1 kill P using syscall kill(<pid_of_P>, SIGKILL) (ProcessRecord#kill()
-> Process#killProcessQuiet())
2.2 kill the process members of the process cgroup P resides in (ProcessRecord#kill()
-> ProcessList#killProcessGroup() -> Process#killProcessGroup()
-> libprocessgroup: KillProcessGroup())
2.2.1 read /proc/<pid_of_p>/cgroup to know which process cgroup p belongs to,
say C (libprocessgroup: KillProcessGroup())
2.2.2 for each process PP in the cgroup C (by reading /acct/uid_UIDXXX/pid_<pid_of_C>/cgroup.procs):
a. if PP is a leader of a process group (PID == PGID), kill its process group using
syscall kill(-<pid_of_PP>, SIGKILL)
b. otherwise, kill PP itself (libprocessgroup: DoKillProcessGroupOnce())
2.2.3 repeat 2.2.2 40 times, and sleep 5ms each time each after
4. clear other components still managered by other managers
Yoric is a multi-proc-arch app, with the following processes:
main
:com.example.yoric
, used bycom.example.yoric.MainActivity
remote
:com.example.yoric:remote
, used bycom.example.yoric.RemoteService
android-remote
:android.remote
, used bycom.example.yoric.AndroidRemoteService
remote-c
:com.example.yoric:remote-c
, forked twice using JNI byremote
, and left as an orphan processandroid-remote-c
:android.remote-c
, forked twice using JNI byandroid-remote
, and left as an orphan process
When Yoric is installed, and started, one can use ps -o USER,PID,PGID,PPID,NAME
to get the following information:
USER PID PGID PPID NAME
root 1 0 0 init
root 1543 1543 1 zygote
u0_a129 23574 1543 1543 com.example.yoric
u0_a129 23603 1543 1543 com.example.yoric:remote
u0_a129 23631 1543 1 com.example.yoric:remote-c
u0_a129 23625 1543 1543 android.remote
u0_a129 23658 1543 1 android.remote-c
From where, one can see that:
- UID (USER): all same, indicating all processes of Yoric share the same UID, which is the UID of Yoric, i.e., 10129 (u0_a129) (see it using
adb shell cat /data/system/packages.xml | grep com.example.yoric
) - PGID: all same and equals to PID of zygote, indicating all processes of Yoric forked directly or indirectly by zygote belongs to the process group led by zygote
Given that Yoric is a general and default-setting (empty activitiy/service implementations) 3rd-party demo app, one concludes that
in Android, almoast all processes forked directly or indirectly by zygote belongs to
the process group led by zygote (unless the app itself sets the process group manually)
Given the abvoe conclusion, one knows in practice, when force-stopping a common 3rd-party app, the workflow 2.2.2-a
is usually redandant, and no process groups are actually killed (cause almost all processes are members of zygote group, not leaders), unless the app itself sets the process groups manually
Thereby intuitively, force-stop kills all processes one by one (not groupfully) sharing the same UID as the app to be killed
As one knows, force-stop kills the processes by firstly collecting all processes that to be killed, then kill them one by one, so if a process is spawn when another process is killed, it is not contained by the collected process list, and thereby escaped killing once.
Then how to know one process is killed, the magic behind is flock()
, i.e., two processes watches each other by locking the other's file (called them watcher and watchee).
In general, the watcher (remote
, remote-c
, andorid-remote
, and android-remote-c
) creates and locks a file (call it indicator), and watches the watchee by trying to lock its indicator, so if watchee is still alive, considering its indicator is already locked, watcher locks failed, othewise watcher knows that watchee died, then restarts it and kills self, making all happen from scratch.
Considering that the indicator creation and lock of one process should be completed atomically (e.g., if watcher locks the indicator of watchee right after the it is created but before locked by watchee, then watcher locks successfully, which makes it incorrectly treat watchee as dead, however, the watchee is still alive), the other process have to wait before lock succeeded. To implement this, watchee creats a flag file after successful lock, and watcher tests the existence of the flag file, if the flag exists, it knows watchee locked successfully, afterwards, it helps watchee to delete the flag (IMPORTANT! IF WATCHER DOES NOT HELP WATCHEE DELETE IT, WHEN WATCHEE IS KILLED, THE FLAG STILL EXISTS, WHICH BREAKS THE ATOMIC CHARACTICS, AND IN THE END LEAD WATCHER TO INCORRECTLY CONSIDERING WATCHEE AS DEAD).
In detail,
remote
andandroid-remote
watches each otherremote-c
andandroid-remote-c
watches each other
1. com.example.yoric (MainActivity) starts, and also starts com.example.yoric:remote (RemoteService)
2. com.example.yoric:remote starts android.remote (AndroidRemostService)
3. com.example.yoric:remote forks twice to com.example.yoric:remote-c
4. android.remote forks twice to android.remote-c
5. each watcher creates and locks self's file
6. aftewards, creates a flag file to tell its watcher that it is ready
7. aftewards, waits until it watchee ready
8. aftewards, watches its watchee
9. when one process is watched dead, the watcher starts it, and kill self