JasonWang's Blog

A Deep Dive into Android Process Freezing

Word count: 6.3k | Reading time: 31 min
2024/12/24

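Google introduced app freezing in Android 11. It suspends tasks that have been idle in the background for a long time by moving the corresponding processes into a dedicated cgroup, which freezes cached background apps. This reduces CPU and memory usage, curbs misbehavior by apps in the background, and lowers power consumption. This article explains in detail how Android process freezing is implemented and what policy governs it, in three parts:

  • Overall framework of Android process freezing: the general architecture and approach
  • How Android process freezing is implemented: the mechanisms Android uses to freeze a process
  • Android process freezing policy: the concrete rules that decide when a process is frozen

Overall Framework of Android Process Freezing

Every Android app has an oom_adj (out-of-memory adjustment) value that marks its priority state. Events such as app creation, foreground/background switches, broadcast delivery, service binding, and process crashes (see the adjustment reasons listed below) trigger a change in oom_adj. A change in oom_adj in turn causes the system to apply specific policies, such as moving the process to a different cgroup, reclaiming app or system memory, or freezing the process, in order to reduce CPU and memory usage.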

// OomAdjuster.java: reasons for an OOM_ADJ update
static final String OOM_ADJ_REASON_METHOD = "updateOomAdj";
static final String OOM_ADJ_REASON_NONE = OOM_ADJ_REASON_METHOD + "_meh";
static final String OOM_ADJ_REASON_ACTIVITY = OOM_ADJ_REASON_METHOD + "_activityChange";
static final String OOM_ADJ_REASON_FINISH_RECEIVER = OOM_ADJ_REASON_METHOD + "_finishReceiver";
static final String OOM_ADJ_REASON_START_RECEIVER = OOM_ADJ_REASON_METHOD + "_startReceiver";
static final String OOM_ADJ_REASON_BIND_SERVICE = OOM_ADJ_REASON_METHOD + "_bindService";
static final String OOM_ADJ_REASON_UNBIND_SERVICE = OOM_ADJ_REASON_METHOD + "_unbindService";
static final String OOM_ADJ_REASON_START_SERVICE = OOM_ADJ_REASON_METHOD + "_startService";
static final String OOM_ADJ_REASON_GET_PROVIDER = OOM_ADJ_REASON_METHOD + "_getProvider";
static final String OOM_ADJ_REASON_REMOVE_PROVIDER = OOM_ADJ_REASON_METHOD + "_removeProvider";
static final String OOM_ADJ_REASON_UI_VISIBILITY = OOM_ADJ_REASON_METHOD + "_uiVisibility";
static final String OOM_ADJ_REASON_ALLOWLIST = OOM_ADJ_REASON_METHOD + "_allowlistChange";
static final String OOM_ADJ_REASON_PROCESS_BEGIN = OOM_ADJ_REASON_METHOD + "_processBegin";
static final String OOM_ADJ_REASON_PROCESS_END = OOM_ADJ_REASON_METHOD + "_processEnd";


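Android freezes processes mainly through the kernel's cgroup freezer subsystem, which corresponds to the right-hand side of the block diagram below. If the process to be frozen exposes binder interfaces, the service process must first be marked as frozen through binder, so that calls from clients fail immediately with an error instead of blocking the client process.

  • ActivityManagerService (AMS): the core system service responsible for creating apps and managing their state; AMS adjusts process priority through the OomAdjuster interface
  • OomAdjuster: computes and adjusts process state and priority, providing the basis for memory reclaim and process freezing decisions
  • CachedAppOptimizer: provides memory reclaim and process freezing, optimizing apps that have stayed in the background for a long time
  • Process: manages app processes and provides interfaces for process creation, priority adjustment, process groups, and so on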

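Freezing a process actually consists of two steps:

  • First, freezeBinder sends a command to the binder driver to try to freeze the server process; the binder driver marks the service with the given pid as frozen, and subsequent requests to it return an error immediately
  • Once the binder service is frozen, the process is frozen through the cgroup freezer subsystem; after freezing completes, the process state becomes S and its execution path is blocked in do_freezer_trap

Figure: Android process freezing flow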
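
The ordering of the two steps can be sketched as follows. This is a minimal illustration rather than the framework's code: FreezeHooks and FreezeProcess are hypothetical names, the two callbacks stand in for the binder command and the cgroup write, and rolling the binder state back when the cgroup step fails is an assumption made here for the sketch.

```cpp
#include <functional>

// Hypothetical hooks standing in for the binder command and the cgroup write.
struct FreezeHooks {
    // Returns 0 on success or a negative errno value on failure.
    std::function<int(int pid, bool enable)> freeze_binder;
    // Returns true if the cgroup freezer state was changed.
    std::function<bool(int pid, bool frozen)> set_cgroup_frozen;
};

// Step 1: mark the binder side frozen. Step 2: freeze through the cgroup.
// Assumption for this sketch: if step 2 fails, the binder flag is cleared
// again so that callers are not rejected while the process keeps running.
bool FreezeProcess(const FreezeHooks& hooks, int pid) {
    if (hooks.freeze_binder(pid, true) != 0) {
        return false;  // binder could not be frozen; leave the process running
    }
    if (!hooks.set_cgroup_frozen(pid, true)) {
        hooks.freeze_binder(pid, false);
        return false;
    }
    return true;
}
```

Passing the two operations in as callbacks keeps the ordering logic testable without a binder driver or a cgroup mount.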

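How Android Process Freezing Is Implemented

Mounting the Freezer cgroup

At its core, Android freezing relies on the cgroup freezer subsystem to freeze and thaw tasks. cgroups were originally introduced by Google engineers and are a very effective kernel mechanism for controlling resources such as CPU, memory, and I/O. During Android initialization, init parses the system's cgroups.json file and mounts the commonly used cgroup hierarchies:

  • The freezer cgroup is mounted at /sys/fs/cgroup
  • Two hierarchies relate to the CPU: /dev/cpuctl, which mainly controls CPU scheduling, and /dev/cpuset, which mainly controls CPU affinity and big/little core binding
  • The memory hierarchy is /dev/memcg, which mainly controls memory allocation
  • The I/O hierarchy is /dev/blkio, which mainly controls I/O scheduling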

int SecondStageMain(int argc, char** argv) {
if (REBOOT_BOOTLOADER_ON_PANIC) {
InstallRebootSignalHandlers();
}

boot_clock::time_point start_time = boot_clock::now();

trigger_shutdown = [](const std::string& command) { shutdown_state.TriggerShutdown(command); };

SetStdioToDevNull(argv);
InitKernelLogging(argv);
LOG(INFO) << "init second stage started!";

// Update $PATH in the case the second stage init is newer than first stage init, where it is
// first set.
if (setenv("PATH", _PATH_DEFPATH, 1) != 0) {
PLOG(FATAL) << "Could not set $PATH to '" << _PATH_DEFPATH << "' in second stage";
}

// Init should not crash because of a dependence on any other process, therefore we ignore
// SIGPIPE and handle EPIPE at the call site directly. Note that setting a signal to SIG_IGN
// is inherited across exec, but custom signal handlers are not. Since we do not want to
// ignore SIGPIPE for child processes, we set a no-op function for the signal handler instead.
{
struct sigaction action = {.sa_flags = SA_RESTART};
action.sa_handler = [](int) {};
sigaction(SIGPIPE, &action, nullptr);
}

// Set init and its forked children's oom_adj.
if (auto result =
WriteFile("/proc/1/oom_score_adj", StringPrintf("%d", DEFAULT_OOM_SCORE_ADJUST));
!result.ok()) {
LOG(ERROR) << "Unable to write " << DEFAULT_OOM_SCORE_ADJUST
<< " to /proc/1/oom_score_adj: " << result.error();
}

// Set up a session keyring that all processes will have access to. It
// will hold things like FBE encryption keys. No process should override
// its session keyring.
keyctl_get_keyring_ID(KEY_SPEC_SESSION_KEYRING, 1);

// Indicate that booting is in progress to background fw loaders, etc.
close(open("/dev/.booting", O_WRONLY | O_CREAT | O_CLOEXEC, 0000));

// See if need to load debug props to allow adb root, when the device is unlocked.
const char* force_debuggable_env = getenv("INIT_FORCE_DEBUGGABLE");
bool load_debug_prop = false;
if (force_debuggable_env && AvbHandle::IsDeviceUnlocked()) {
load_debug_prop = "true"s == force_debuggable_env;
}
unsetenv("INIT_FORCE_DEBUGGABLE");

// Umount the debug ramdisk so property service doesn't read .prop files from there, when it
// is not meant to.
if (!load_debug_prop) {
UmountDebugRamdisk();
}

PropertyInit();

// Umount second stage resources after property service has read the .prop files.
UmountSecondStageRes();

...
// Queue SetupCgroupsAction, which initializes the cgroups
am.QueueBuiltinAction(SetupCgroupsAction, "SetupCgroups");
am.QueueBuiltinAction(SetKptrRestrictAction, "SetKptrRestrict");
am.QueueBuiltinAction(TestPerfEventSelinuxAction, "TestPerfEventSelinux");
am.QueueEventTrigger("early-init");

// Queue an action that waits for coldboot done so we know ueventd has set up all of /dev...
am.QueueBuiltinAction(wait_for_coldboot_done_action, "wait_for_coldboot_done");
...

// Trigger all the boot actions to get us started.
am.QueueEventTrigger("init");

// Don't mount filesystems or start core system services in charger mode.
std::string bootmode = GetProperty("ro.bootmode", "");
if (bootmode == "charger") {
am.QueueEventTrigger("charger");
} else {
am.QueueEventTrigger("late-init");
}

...

return 0;
}

On Android, cgroups.json is located at /system/etc/cgroups.json, with the following content:

{
  "Cgroups": [
    {
      "Controller": "blkio",
      "Path": "/dev/blkio",
      "Mode": "0755",
      "UID": "system",
      "GID": "system"
    },
    {
      "Controller": "cpu",
      "Path": "/dev/cpuctl",
      "Mode": "0755",
      "UID": "system",
      "GID": "system"
    },
    {
      "Controller": "cpuset",
      "Path": "/dev/cpuset",
      "Mode": "0755",
      "UID": "system",
      "GID": "system"
    },
    {
      "Controller": "memory",
      "Path": "/dev/memcg",
      "Mode": "0700",
      "UID": "root",
      "GID": "system",
      "Optional": true
    }
  ],
  "Cgroups2": {
    "Path": "/sys/fs/cgroup",
    "Mode": "0755",
    "UID": "system",
    "GID": "system",
    "Controllers": [
      {
        "Controller": "freezer",
        "Path": ".",
        "Mode": "0755",
        "UID": "system",
        "GID": "system"
      }
    ]
  }
}

Once the cgroups are mounted, the mounted hierarchies can be inspected with the mount command in an adb shell:

# mount -t cgroup
none on /dev/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
none on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
none on /dev/cpuctl type cgroup (rw,nosuid,nodev,noexec,relatime,cpu)
none on /dev/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset,noprefix,release_agent=/sbin/cpuset_release_agent)
none on /dev/memcg type cgroup (rw,nosuid,nodev,noexec,relatime,memory)

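Later, when an app is started and its process is created, AMS calls ProcessList.startProcess, which uses Process.createProcessGroup to create the freezer cgroup for the process under its user UID. Note that in the code below this call is made only when the process was not started by the regular zygote (as the code comment says, the webview zygote and app zygote lack permission to create the nodes):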

private Process.ProcessStartResult startProcess(HostingRecord hostingRecord, String entryPoint,
ProcessRecord app, int uid, int[] gids, int runtimeFlags, int zygotePolicyFlags,
int mountExternal, String seInfo, String requiredAbi, String instructionSet,
String invokeWith, long startTime) {
try {
Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER, "Start proc: " +
app.processName);
checkSlow(startTime, "startProcess: asking zygote to start proc");
final boolean isTopApp = hostingRecord.isTopApp();
if (isTopApp) {
// Use has-foreground-activities as a temporary hint so the current scheduling
// group won't be lost when the process is attaching. The actual state will be
// refreshed when computing oom-adj.
app.mState.setHasForegroundActivities(true);
}

Map<String, Pair<String, Long>> pkgDataInfoMap;
Map<String, Pair<String, Long>> allowlistedAppDataInfoMap;
boolean bindMountAppStorageDirs = false;
boolean bindMountAppsData = mAppDataIsolationEnabled
&& (UserHandle.isApp(app.uid) || UserHandle.isIsolated(app.uid))
&& mPlatformCompat.isChangeEnabled(APP_DATA_DIRECTORY_ISOLATION, app.info);

// Get all packages belongs to the same shared uid. sharedPackages is empty array
// if it doesn't have shared uid.
final PackageManagerInternal pmInt = mService.getPackageManagerInternal();
final String[] sharedPackages = pmInt.getSharedUserPackagesForPackage(
app.info.packageName, app.userId);
final String[] targetPackagesList = sharedPackages.length == 0
? new String[]{app.info.packageName} : sharedPackages;

pkgDataInfoMap = getPackageAppDataInfoMap(pmInt, targetPackagesList, uid);
if (pkgDataInfoMap == null) {
// TODO(b/152760674): Handle inode == 0 case properly, now we just give it a
// tmp free pass.
bindMountAppsData = false;
}

...

// If it's an isolated process, it should not even mount its own app data directories,
// since it has no access to them anyway.
if (app.isolated) {
pkgDataInfoMap = null;
allowlistedAppDataInfoMap = null;
}

final Process.ProcessStartResult startResult;
boolean regularZygote = false;
if (hostingRecord.usesWebviewZygote()) {
startResult = startWebView(entryPoint,
app.processName, uid, uid, gids, runtimeFlags, mountExternal,
app.info.targetSdkVersion, seInfo, requiredAbi, instructionSet,
app.info.dataDir, null, app.info.packageName,
app.getDisabledCompatChanges(),
new String[]{PROC_START_SEQ_IDENT + app.getStartSeq()});
} else if (hostingRecord.usesAppZygote()) {
final AppZygote appZygote = createAppZygoteForProcessIfNeeded(app);

// We can't isolate app data and storage data as parent zygote already did that.
startResult = appZygote.getProcess().start(entryPoint,
app.processName, uid, uid, gids, runtimeFlags, mountExternal,
app.info.targetSdkVersion, seInfo, requiredAbi, instructionSet,
app.info.dataDir, null, app.info.packageName,
/*zygotePolicyFlags=*/ ZYGOTE_POLICY_FLAG_EMPTY, isTopApp,
app.getDisabledCompatChanges(), pkgDataInfoMap, allowlistedAppDataInfoMap,
false, false,
new String[]{PROC_START_SEQ_IDENT + app.getStartSeq()});
} else {
regularZygote = true;
startResult = Process.start(entryPoint,
app.processName, uid, uid, gids, runtimeFlags, mountExternal,
app.info.targetSdkVersion, seInfo, requiredAbi, instructionSet,
app.info.dataDir, invokeWith, app.info.packageName, zygotePolicyFlags,
isTopApp, app.getDisabledCompatChanges(), pkgDataInfoMap,
allowlistedAppDataInfoMap, bindMountAppsData, bindMountAppStorageDirs,
new String[]{PROC_START_SEQ_IDENT + app.getStartSeq()});
}

if (!regularZygote) {
// Create the process group
// webview and app zygote don't have the permission to create the nodes
if (Process.createProcessGroup(uid, startResult.pid) < 0) {
Slog.e(ActivityManagerService.TAG, "Unable to create process group for "
+ app.processName + " (" + startResult.pid + ")");
}
}

// This runs after Process.start() as this method may block app process starting time
// if dir is not cached. Running this method after Process.start() can make it
// cache the dir asynchronously, so zygote can use it without waiting for it.
if (bindMountAppStorageDirs) {
storageManagerInternal.prepareStorageDirs(userId, pkgDataInfoMap.keySet(),
app.processName);
}
checkSlow(startTime, "startProcess: returned from zygote!");

return startResult;
} finally {
Trace.traceEnd(Trace.TRACE_TAG_ACTIVITY_MANAGER);
}
}

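Process.createProcessGroup is a native method: android_os_Process_createProcessGroup ultimately calls createProcessGroupInternal in processgroup.cpp, which does two things:

  • Creates the cgroup directory for the process under /sys/fs/cgroup/, based on its uid and pid
  • Writes the pid of the process into the cgroup's cgroup.procs file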

static int createProcessGroupInternal(uid_t uid, int initialPid, std::string cgroup) {
    auto uid_path = ConvertUidToPath(cgroup.c_str(), uid);

    struct stat cgroup_stat;
    mode_t cgroup_mode = 0750;
    gid_t cgroup_uid = AID_SYSTEM;
    uid_t cgroup_gid = AID_SYSTEM;

    // Note: stat() returns -1 on failure, so the "== 1" check below never
    // fires as written.
    if (stat(cgroup.c_str(), &cgroup_stat) == 1) {
        PLOG(ERROR) << "Failed to get stats for " << cgroup;
    } else {
        cgroup_mode = cgroup_stat.st_mode;
        cgroup_uid = cgroup_stat.st_uid;
        cgroup_gid = cgroup_stat.st_gid;
    }

    // Step 1a: create <cgroup>/uid_<uid>
    if (!MkdirAndChown(uid_path, cgroup_mode, cgroup_uid, cgroup_gid)) {
        PLOG(ERROR) << "Failed to make and chown " << uid_path;
        return -errno;
    }

    auto uid_pid_path = ConvertUidPidToPath(cgroup.c_str(), uid, initialPid);

    // Step 1b: create <cgroup>/uid_<uid>/pid_<pid>
    if (!MkdirAndChown(uid_pid_path, cgroup_mode, cgroup_uid, cgroup_gid)) {
        PLOG(ERROR) << "Failed to make and chown " << uid_pid_path;
        return -errno;
    }

    auto uid_pid_procs_file = uid_pid_path + PROCESSGROUP_CGROUP_PROCS_FILE;

    // Step 2: move the process into the new cgroup by writing its pid
    int ret = 0;
    if (!WriteStringToFile(std::to_string(initialPid), uid_pid_procs_file)) {
        ret = -errno;
        PLOG(ERROR) << "Failed to write '" << initialPid << "' to " << uid_pid_procs_file;
    }

    return ret;
}
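
The two steps can also be tried in isolation against a scratch directory. The sketch below is a simplified stand-in, not libprocessgroup: UidPidPath and CreateProcessGroupSketch are hypothetical names, ownership and mode handling are left out, and cgroup.procs is an ordinary file here, whereas on a real cgroup2 mount the kernel provides it.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Mirrors the uid_<uid>/pid_<pid> layout visible under /sys/fs/cgroup.
std::string UidPidPath(const std::string& root, unsigned uid, int pid) {
    return root + "/uid_" + std::to_string(uid) + "/pid_" + std::to_string(pid);
}

// Creates <root>/uid_<uid>/pid_<pid> and writes the pid into cgroup.procs.
// Returns false if the directory or the file could not be written.
bool CreateProcessGroupSketch(const std::string& root, unsigned uid, int pid) {
    std::error_code ec;
    const std::string dir = UidPidPath(root, uid, pid);
    fs::create_directories(dir, ec);
    if (ec) return false;
    std::ofstream procs(dir + "/cgroup.procs");
    procs << pid;
    return static_cast<bool>(procs.flush());
}
```

Calling CreateProcessGroupSketch with a temporary directory as root, uid 10004 and pid 873 would leave a file uid_10004/pid_873/cgroup.procs containing 873.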

After the system has booted normally, the state of the cgroups can be inspected under /sys/fs/cgroup/:

rk3588m_car:/sys/fs/cgroup # ls -al
total 0
drwxr-xr-x 47 system system 0 2024-12-16 19:11 .
drwxr-xr-x 11 root root 0 1970-01-01 08:00 ..
-rwxr-xr-x 1 system system 0 1970-01-01 08:00 cgroup.controllers
-rwxr-xr-x 1 system system 0 1970-01-01 08:00 cgroup.max.depth
-rwxr-xr-x 1 system system 0 1970-01-01 08:00 cgroup.max.descendants
-rwxr-xr-x 1 system system 0 1970-01-01 08:00 cgroup.procs
-rwxr-xr-x 1 system system 0 1970-01-01 08:00 cgroup.stat
-rwxr-xr-x 1 system system 0 1970-01-01 08:00 cgroup.subtree_control
-rwxr-xr-x 1 system system 0 1970-01-01 08:00 cgroup.threads
-rwxr-xr-x 1 system system 0 1970-01-01 08:00 cpu.pressure
-rwxr-xr-x 1 system system 0 1970-01-01 08:00 cpu.stat
-rwxr-xr-x 1 system system 0 1970-01-01 08:00 io.pressure
-rwxr-xr-x 1 system system 0 1970-01-01 08:00 memory.pressure
drwxr-xr-x 29 system system 0 2024-12-16 19:31 uid_0
drwxr-xr-x 98 system system 0 2024-12-16 19:11 uid_1000
drwxr-xr-x 3 system system 0 2024-12-16 19:10 uid_10004
drwxr-xr-x 3 system system 0 2024-12-16 19:11 uid_10005
drwxr-xr-x 3 system system 0 2024-12-16 19:11 uid_10007
drwxr-xr-x 3 system system 0 2024-12-16 19:11 uid_10009
drwxr-xr-x 3 system system 0 2024-12-16 19:11 uid_10010
drwxr-xr-x 3 system system 0 2024-12-16 19:11 uid_10011
drwxr-xr-x 3 system system 0 2024-12-16 19:11 uid_10012
drwxr-xr-x 3 system system 0 2024-12-16 19:11 uid_1002
drwxr-xr-x 3 system system 0 2024-12-16 19:11 uid_10020
...
drwxr-xr-x 3 system system 0 2024-12-16 19:11 uid_10029
drwxr-xr-x 2 system system 0 2024-12-16 19:10 uid_1003
drwxr-xr-x 3 system system 0 2024-12-16 19:10 uid_10033
drwxr-xr-x 3 system system 0 2024-12-16 19:11 uid_10037
drwxr-xr-x 3 system system 0 2024-12-16 19:11 uid_10038
drwxr-xr-x 4 system system 0 2024-12-16 19:10 uid_1010
drwxr-xr-x 3 system system 0 2024-12-16 19:10 uid_1020
drwxr-xr-x 3 system system 0 2024-12-16 19:11 uid_1036
drwxr-xr-x 2 system system 0 2024-12-16 19:10 uid_1037
drwxr-xr-x 3 system system 0 2024-12-16 19:10 uid_1040
drwxr-xr-x 6 system system 0 2024-12-16 19:10 uid_1041
drwxr-xr-x 7 system system 0 2024-12-16 19:10 uid_1046
drwxr-xr-x 3 system system 0 2024-12-16 19:10 uid_1047

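How Process Freezing Works

As mentioned at the start of this article, Android process freezing is built on the cgroup freezer subsystem, which freezes and thaws tasks. Concretely, freezing a process takes two steps:

  • First, IPCThreadState::freeze sends a command to the binder driver to try to freeze the server process; the binder driver marks the service with the given pid as frozen, and subsequent requests return an error immediately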

status_t IPCThreadState::freeze(pid_t pid, bool enable, uint32_t timeout_ms) {
    struct binder_freeze_info info;
    int ret = 0;

    info.pid = pid;
    info.enable = enable;
    info.timeout_ms = timeout_ms;

#if defined(__ANDROID__)
    if (ioctl(self()->mProcess->mDriverFD, BINDER_FREEZE, &info) < 0)
        ret = -errno;
#endif

    //
    // ret==-EAGAIN indicates that transactions have not drained.
    // Call again to poll for completion.
    //
    return ret;
}

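After receiving the BINDER_FREEZE command, the binder driver marks the target binder service process as frozen; subsequent requests immediately get a BR_FROZEN_REPLY error code, indicating that the binder service is frozen. If timeout_ms is set, the driver waits up to that long for the service's outstanding transactions to drain before returning; if transactions are still pending, it clears the frozen flag again and returns -EAGAIN.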

static int binder_ioctl_freeze(struct binder_freeze_info *info,
                               struct binder_proc *target_proc)
{
    int ret = 0;

    if (!info->enable) {
        binder_inner_proc_lock(target_proc);
        target_proc->sync_recv = false;
        target_proc->async_recv = false;
        target_proc->is_frozen = false;
        binder_inner_proc_unlock(target_proc);
        return 0;
    }

    /*
     * Freezing the target. Prevent new transactions by
     * setting frozen state. If timeout specified, wait
     * for transactions to drain.
     */
    binder_inner_proc_lock(target_proc);
    target_proc->sync_recv = false;
    target_proc->async_recv = false;
    target_proc->is_frozen = true;
    binder_inner_proc_unlock(target_proc);

    if (info->timeout_ms > 0)
        ret = wait_event_interruptible_timeout(
            target_proc->freeze_wait,
            (!target_proc->outstanding_txns),
            msecs_to_jiffies(info->timeout_ms));

    /* Check pending transactions that wait for reply */
    if (ret >= 0) {
        binder_inner_proc_lock(target_proc);
        if (binder_txns_pending_ilocked(target_proc))
            ret = -EAGAIN;
        binder_inner_proc_unlock(target_proc);
    }

    if (ret < 0) {
        binder_inner_proc_lock(target_proc);
        target_proc->is_frozen = false;
        binder_inner_proc_unlock(target_proc);
    }

    return ret;
}
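
Because -EAGAIN means "transactions have not drained, call again", a caller needs some retry policy. The sketch below shows one possible shape under stated assumptions: FreezeBinderWithRetry is a hypothetical helper, try_freeze stands in for a single BINDER_FREEZE attempt, and the attempt bound is chosen by the caller. It is not how the framework schedules its retries.

```cpp
#include <cerrno>
#include <functional>

// try_freeze models one freeze attempt and returns 0 or a negative errno value.
// The loop repeats only while the result is -EAGAIN and attempts remain; any
// other result (success or a different error) is returned immediately.
int FreezeBinderWithRetry(const std::function<int()>& try_freeze, int max_attempts) {
    int ret = -EAGAIN;
    for (int attempt = 0; attempt < max_attempts && ret == -EAGAIN; ++attempt) {
        ret = try_freeze();
    }
    return ret;
}
```

A real caller would normally wait between attempts instead of polling in a tight loop; the delay is omitted here to keep the control flow visible.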

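  • Once the binder service is frozen, android_os_Process_setProcessFrozen freezes the process through the cgroup freezer subsystem; after freezing completes, the process state becomes S and its execution path is blocked in do_freezer_trap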

// android_os_Process_setProcessFrozen
void android_os_Process_setProcessFrozen(
        JNIEnv *env, jobject clazz, jint pid, jint uid, jboolean freeze)
{
    bool success = true;

    if (freeze) {
        success = SetProcessProfiles(uid, pid, {"Frozen"});
    } else {
        success = SetProcessProfiles(uid, pid, {"Unfrozen"});
    }

    if (!success) {
        signalExceptionForGroupError(env, EINVAL, pid);
    }
}



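Two configuration files govern Android process cgroups: cgroups.json, which describes the controllers, and task_profiles.json, which describes the profiles. In task_profiles.json, the Frozen and Unfrozen profiles set FreezerState to 1 and 0 respectively, and FreezerState maps to the cgroup.freeze file of the freezer controller.

For a detailed introduction to cgroups, see "深入理解Android进程冻结".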

//task_profiles.json
{
  "Attributes": [
    {
      "Name": "LowCapacityCPUs",
      "Controller": "cpuset",
      "File": "background/cpus"
    },
    ...
    {
      "Name": "FreezerState",
      "Controller": "freezer",
      "File": "cgroup.freeze"
    }
  ],

  "Profiles": [
    {
      "Name": "HighEnergySaving",
      "Actions": [
        {
          "Name": "JoinCgroup",
          "Params":
          {
            "Controller": "cpu",
            "Path": "background"
          }
        }
      ]
    },
    {
      "Name": "Frozen",
      "Actions": [
        {
          "Name": "SetAttribute",
          "Params":
          {
            "Name": "FreezerState",
            "Value": "1"
          }
        }
      ]
    },
    {
      "Name": "Unfrozen",
      "Actions": [
        {
          "Name": "SetAttribute",
          "Params":
          {
            "Name": "FreezerState",
            "Value": "0"
          }
        }
      ]
    },
    ...
  ],

  "AggregateProfiles": [
    {
      "Name": "SCHED_SP_BACKGROUND",
      "Profiles": [ "HighEnergySaving", "LowIoPriority", "TimerSlackHigh" ]
    },
    {
      "Name": "SCHED_SP_FOREGROUND",
      "Profiles": [ "HighPerformance", "HighIoPriority", "TimerSlackNormal" ]
    },
    {
      "Name": "SCHED_SP_TOP_APP",
      "Profiles": [ "MaxPerformance", "MaxIoPriority", "TimerSlackNormal" ]
    },
    ...
  ]
}
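
The indirection in this file (profile name, then attribute, then controller file and value) can be modelled with two lookup tables. The sketch below is an illustration only: the struct names and Resolve are hypothetical, and the tables contain just the Frozen, Unfrozen and FreezerState entries quoted above, not the full configuration.

```cpp
#include <map>
#include <optional>
#include <string>

struct Attribute { std::string controller; std::string file; };
struct SetAttribute { std::string attribute; std::string value; };
struct Write { std::string controller; std::string file; std::string value; };

// Only the entries quoted from task_profiles.json in this article.
const std::map<std::string, Attribute> kAttributes = {
    {"FreezerState", {"freezer", "cgroup.freeze"}},
};
const std::map<std::string, SetAttribute> kProfiles = {
    {"Frozen", {"FreezerState", "1"}},
    {"Unfrozen", {"FreezerState", "0"}},
};

// Resolves a profile name to the controller file it writes and the value
// written. Returns std::nullopt for names missing from the tables above.
std::optional<Write> Resolve(const std::string& profile) {
    auto p = kProfiles.find(profile);
    if (p == kProfiles.end()) return std::nullopt;
    auto a = kAttributes.find(p->second.attribute);
    if (a == kAttributes.end()) return std::nullopt;
    return Write{a->second.controller, a->second.file, p->second.value};
}
```

With these tables, resolving Frozen yields the freezer controller, the file cgroup.freeze and the value 1, which is the write that the rest of this section traces.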


SetProcessProfiles calls TaskProfiles::SetProcessProfiles to freeze the process: it iterates over the requested profile names, looks up each one (here, the profile named Frozen) with GetProfile, and then calls TaskProfile::ExecuteForProcess to carry out the freeze.

//processgroup.cpp
bool SetProcessProfiles(uid_t uid, pid_t pid, const std::vector<std::string>& profiles) {
    return TaskProfiles::GetInstance().SetProcessProfiles(uid, pid, profiles, false);
}

//task_profiles.cpp
bool TaskProfiles::SetProcessProfiles(uid_t uid, pid_t pid,
                                      const std::vector<std::string>& profiles, bool use_fd_cache) {
    for (const auto& name : profiles) {
        TaskProfile* profile = GetProfile(name);
        if (profile != nullptr) {
            if (use_fd_cache) {
                profile->EnableResourceCaching(ProfileAction::RCT_PROCESS);
            }
            if (!profile->ExecuteForProcess(uid, pid)) {
                PLOG(WARNING) << "Failed to apply " << name << " process profile";
            }
        } else {
            PLOG(WARNING) << "Failed to find " << name << "process profile";
        }
    }
    return true;
}

ExecuteForTask first obtains the cgroup file path through the corresponding ProfileAttribute, and then uses WriteStringToFile to write the FreezerState value into the cgroup.freeze file:

//task_profiles.cpp
bool SetAttributeAction::ExecuteForTask(int tid) const {
    std::string path;

    if (!attribute_->GetPathForTask(tid, &path)) {
        LOG(ERROR) << "Failed to find cgroup for tid " << tid;
        return false;
    }

    if (!WriteStringToFile(value_, path)) {
        PLOG(ERROR) << "Failed to write '" << value_ << "' to " << path;
        return false;
    }

    return true;
}

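GetPathForTask obtains the cgroup path via controller()->GetTaskGroup and then uses StringPrintf to append the cgroup.freeze file name. The resulting path is /sys/fs/cgroup/uid_<uid>/pid_<pid>/cgroup.freeze: writing 1 to this file freezes the process, and writing 0 thaws it.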

//task_profiles.cpp
bool ProfileAttribute::GetPathForTask(int tid, std::string* path) const {
    std::string subgroup;
    if (!controller()->GetTaskGroup(tid, &subgroup)) {
        return false;
    }

    if (path == nullptr) {
        return true;
    }

    if (subgroup.empty()) {
        *path = StringPrintf("%s/%s", controller()->path(), file_name_.c_str());
    } else {
        *path = StringPrintf("%s/%s/%s", controller()->path(), subgroup.c_str(),
                             file_name_.c_str());
    }
    return true;
}


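GetTaskGroup reads /proc/<tid>/cgroup to find the cgroup that the task belongs to. The freezer hierarchy is special: as a cgroup v2 hierarchy its entry starts with 0::, whereas cgroup v1 entries carry the controller name between colons, for example 1:<controller>:.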

bool CgroupController::GetTaskGroup(int tid, std::string* group) const {
    std::string file_name = StringPrintf("/proc/%d/cgroup", tid);
    std::string content;
    if (!android::base::ReadFileToString(file_name, &content)) {
        PLOG(ERROR) << "Failed to read " << file_name;
        return false;
    }

    // if group is null and tid exists return early because
    // user is not interested in cgroup membership
    if (group == nullptr) {
        return true;
    }

    std::string cg_tag;

    if (version() == 2) {
        cg_tag = "0::";
    } else {
        cg_tag = StringPrintf(":%s:", name());
    }
    size_t start_pos = content.find(cg_tag);
    if (start_pos == std::string::npos) {
        return false;
    }

    start_pos += cg_tag.length() + 1;  // skip '/'
    size_t end_pos = content.find('\n', start_pos);
    if (end_pos == std::string::npos) {
        *group = content.substr(start_pos, std::string::npos);
    } else {
        *group = content.substr(start_pos, end_pos - start_pos);
    }

    return true;
}
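
The parsing and path joining described above can be exercised on a plain string, without reading /proc. The sketch below is a standalone approximation, not the libprocessgroup code: FindTaskGroup and AttributePath are hypothetical names, and an empty controller name is used here to select the cgroup v2 entry.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Extracts the cgroup path (without the leading '/') for one hierarchy from
// the text of /proc/<pid>/cgroup. An empty controller selects the v2 entry.
std::optional<std::string> FindTaskGroup(const std::string& content,
                                         const std::string& controller) {
    const std::string tag =
        controller.empty() ? std::string("0::") : ":" + controller + ":";
    std::size_t start = content.find(tag);
    if (start == std::string::npos) return std::nullopt;
    start += tag.size() + 1;  // skip the tag and the leading '/'
    if (start > content.size()) return std::nullopt;
    const std::size_t end = content.find('\n', start);
    return content.substr(start, end == std::string::npos ? end : end - start);
}

// Joins the controller mount point, the subgroup and the attribute file name.
std::string AttributePath(const std::string& mount, const std::string& group,
                          const std::string& file) {
    return group.empty() ? mount + "/" + file : mount + "/" + group + "/" + file;
}
```

For a process whose /proc entry contains the line 0::/uid_10004/pid_873, combining the two helpers with the mount point /sys/fs/cgroup and the file cgroup.freeze gives /sys/fs/cgroup/uid_10004/pid_873/cgroup.freeze.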


Writing to cgroup.freeze invokes the kernel function cgroup_freeze_write, which calls cgroup_freeze to put all tasks in the cgroup and in all of its descendant cgroups into the FROZEN state:

static ssize_t cgroup_freeze_write(struct kernfs_open_file *of,
                                   char *buf, size_t nbytes, loff_t off)
{
    struct cgroup *cgrp;
    ssize_t ret;
    int freeze;

    ret = kstrtoint(strstrip(buf), 0, &freeze);
    if (ret)
        return ret;

    if (freeze < 0 || freeze > 1)
        return -ERANGE;

    cgrp = cgroup_kn_lock_live(of->kn, false);
    if (!cgrp)
        return -ENOENT;

    cgroup_freeze(cgrp, freeze);

    cgroup_kn_unlock(of->kn);

    return nbytes;
}

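Each individual task is frozen by cgroup_freeze_task, which sets the JOBCTL_TRAP_FREEZE bit in task->jobctl to freeze the task and clears that bit to thaw it. As the code shows, the kernel does not freeze a task by sending it a signal directly. Instead, it first sets the JOBCTL_TRAP_FREEZE bit; signal_wake_up then marks the task as having pending signal work (via set_tsk_thread_flag) and wakes it up. Once woken, the task heads back toward user space, handles its pending signal work on that return path, and ends up in get_signal, which completes the freeze.

For the detailed kernel freezing flow, see "深入探究 Linux 内核中的 cgroup freezer 子系统".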

//kernel/cgroup/freezer.c
/*
 * Freeze or unfreeze the task by setting or clearing the JOBCTL_TRAP_FREEZE
 * jobctl bit.
 */
static void cgroup_freeze_task(struct task_struct *task, bool freeze)
{
    unsigned long flags;

    /* If the task is about to die, don't bother with freezing it. */
    if (!lock_task_sighand(task, &flags))
        return;

    if (freeze) {
        task->jobctl |= JOBCTL_TRAP_FREEZE;
        signal_wake_up(task, false);
    } else {
        task->jobctl &= ~JOBCTL_TRAP_FREEZE;
        wake_up_process(task);
    }

    unlock_task_sighand(task, &flags);
}

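get_signal checks whether the current task has signals to handle and also checks the JOBCTL_TRAP_FREEZE flag. If the flag is set, it calls do_freezer_trap to freeze the task. This is also the last function a frozen task executes, which can be confirmed by inspecting the stack of a frozen process.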

bool get_signal(struct ksignal *ksig)
{
    struct sighand_struct *sighand = current->sighand;
    struct signal_struct *signal = current->signal;
    int signr;

    ...
    for (;;) {
        struct k_sigaction *ka;

        if (unlikely(current->jobctl & JOBCTL_STOP_PENDING) &&
            do_signal_stop(0))
            goto relock;

        if (unlikely(current->jobctl &
                     (JOBCTL_TRAP_MASK | JOBCTL_TRAP_FREEZE))) {
            if (current->jobctl & JOBCTL_TRAP_MASK) {
                do_jobctl_trap();
                spin_unlock_irq(&sighand->siglock);
            } else if (current->jobctl & JOBCTL_TRAP_FREEZE)
                /* This is the call that actually freezes the task. */
                do_freezer_trap();

            goto relock;
        }

        /*
         * If the task is leaving the frozen state, let's update
         * cgroup counters and reset the frozen bit.
         */
        if (unlikely(cgroup_task_frozen(current))) {
            spin_unlock_irq(&sighand->siglock);
            cgroup_leave_frozen(false);
            goto relock;
        }

        ...

    return ksig->sig > 0;
}


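do_freezer_trap does three things:

  • Sets the current task's state to TASK_INTERRUPTIBLE and clears the TIF_SIGPENDING flag
  • Calls cgroup_enter_frozen to mark the current task as FROZEN and update the state of its cgroup
  • Calls freezable_schedule to invoke the scheduler; the frozen task is removed from the run queue and sleeps while another task is switched in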

/**
 * do_freezer_trap - handle the freezer jobctl trap
 *
 */
static void do_freezer_trap(void)
    __releases(&current->sighand->siglock)
{

    ...
    /*
     * Now we're sure that there is no pending fatal signal and no
     * pending traps. Clear TIF_SIGPENDING to not get out of schedule()
     * immediately (if there is a non-fatal signal pending), and
     * put the task into sleep.
     */
    __set_current_state(TASK_INTERRUPTIBLE);
    clear_thread_flag(TIF_SIGPENDING);
    spin_unlock_irq(&current->sighand->siglock);
    cgroup_enter_frozen();
    freezable_schedule();
}

Once the process is fully frozen, ps -A shows its state as S and its wait channel as do_freezer_trap. The process stack confirms that it entered the frozen state through the signal handling path.

#ps -A|grep -i rknn
root 873 1 10972640 3448 do_freezer_trap 0 S rknn_server

# cat /proc/873/stack
[<0>] __switch_to+0x118/0x148
[<0>] do_freezer_trap+0x64/0xbc
[<0>] get_signal+0x370/0x77c
[<0>] do_signal+0xa0/0x298
[<0>] do_notify_resume+0xac/0x218
[<0>] work_pending+0xc/0x76c
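
Besides ps and the kernel stack, cgroup v2 reports the freezer state of a non-root cgroup through the frozen key of its cgroup.events file. The helper below is a small sketch for reading that key from text that has already been loaded; EventsReportFrozen is a hypothetical name, and whether the file is readable on a given device depends on its kernel and permissions.

```cpp
#include <sstream>
#include <string>

// Returns true if the text of a cgroup.events file contains "frozen 1".
// The file consists of "<key> <value>" pairs, one pair per line.
bool EventsReportFrozen(const std::string& events) {
    std::istringstream in(events);
    std::string key;
    int value = 0;
    while (in >> key >> value) {
        if (key == "frozen") return value == 1;
    }
    return false;  // key missing: treat the cgroup as not frozen
}
```

The input would typically be the content of cgroup.events in the process's uid_<uid>/pid_<pid> directory, next to the cgroup.freeze file that was written.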


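Android Process Freezing Policy

Android proactively updates the adj values of all apps on events such as process start, service binding, foreground/background switches, and sending or receiving broadcasts. The smaller the adj value, the higher the process priority, the longer the process survives, and the less likely the system is to kill it. If an app stays in the background without activity for a long time, the system adjusts its adj value; when system resources are tight (for example, when memory is low), the system proactively cleans up (freezes or kills) processes with large adj values (CACHED_APP_MIN_ADJ (900) <= adj <= CACHED_APP_MAX_ADJ (999)).

The core logic for adjusting an app's adj value lives in the OomAdjuster class. After the adj values of all apps have been updated, if a process's adj value is greater than or equal to CACHED_APP_MIN_ADJ, OomAdjuster tries to freeze it by calling CachedAppOptimizer.freezeAppAsyncLSP. The call chain is roughly as follows: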

//OomAdjuster.java
updateOomAdjLocked -> updateOomAdjLSP -> performUpdateOomAdjLSP
-> updateOomAdjInnerLSP -> updateAndTrimProcessLSP -> applyOomAdjLSP
-> updateAppFreezeStateLSP
//CachedAppOptimizer.java
-> freezeAppAsyncLSP

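updateAppFreezeStateLSP first checks whether process freezing is enabled on the system. The feature is enabled by default and can be toggled through two settings (the global settings database takes precedence):

  • Global setting Settings.Global.CACHED_APPS_FREEZER_ENABLED: a switch stored in the system settings database, for example adb shell settings put global cached_apps_freezer enabled
  • The use_freezer entry in DeviceConfig, for example adb shell device_config put activity_manager_native_boot use_freezer true

If neither setting enables the freezer, process freezing is unavailable and the function returns immediately. Otherwise, if the process's adj value is greater than or equal to CACHED_APP_MIN_ADJ and the process is not already frozen, freezeAppAsyncLSP is called to freeze it.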
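
The precedence between the two switches can be written down as a small pure function. This is a sketch of the rule as described in the text, not the framework implementation: UseFreezer and its parameter names are hypothetical, and the kernel_supported input is an added assumption that models a device whose kernel cannot freeze at all.

```cpp
#include <optional>

// global_override: the global settings entry, unset when nothing was stored.
// device_config:   the DeviceConfig use_freezer flag.
// kernel_supported: assumption for this sketch, true if the device can freeze.
bool UseFreezer(std::optional<bool> global_override, bool device_config,
                bool kernel_supported) {
    // The global setting wins whenever it is present; otherwise fall back
    // to the DeviceConfig flag.
    const bool requested = global_override.value_or(device_config);
    return requested && kernel_supported;
}
```

Under this model, an explicit global "disabled" turns freezing off even when the DeviceConfig flag is true, which matches the stated priority of the global setting.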

//OomAdjuster.java
private void updateAppFreezeStateLSP(ProcessRecord app) {
    if (!mCachedAppOptimizer.useFreezer()) {
        return;
    }

    if (app.mOptRecord.isFreezeExempt()) {
        return;
    }

    final ProcessCachedOptimizerRecord opt = app.mOptRecord;
    // if an app is already frozen and shouldNotFreeze becomes true, immediately unfreeze
    if (opt.isFrozen() && opt.shouldNotFreeze()) {
        mCachedAppOptimizer.unfreezeAppLSP(app);
        return;
    }

    final ProcessStateRecord state = app.mState;
    // Use current adjustment when freezing, set adjustment when unfreezing.
    if (state.getCurAdj() >= ProcessList.CACHED_APP_MIN_ADJ && !opt.isFrozen()
            && !opt.shouldNotFreeze()) {
        mCachedAppOptimizer.freezeAppAsyncLSP(app);
    } else if (state.getSetAdj() < ProcessList.CACHED_APP_MIN_ADJ) {
        mCachedAppOptimizer.unfreezeAppLSP(app);
    }
}
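
The branching in updateAppFreezeStateLSP reduces to a pure decision over a handful of inputs, which makes the policy easy to check in isolation. The C++ transcription below is a sketch for that purpose: ProcState, FreezeAction and DecideFreezeAction are hypothetical names, and the threshold 900 is the CACHED_APP_MIN_ADJ value quoted earlier.

```cpp
enum class FreezeAction { kNone, kFreeze, kUnfreeze };

constexpr int kCachedAppMinAdj = 900;  // CACHED_APP_MIN_ADJ as quoted in the text

struct ProcState {
    bool exempt = false;             // isFreezeExempt()
    bool frozen = false;             // isFrozen()
    bool should_not_freeze = false;  // shouldNotFreeze()
    int cur_adj = 0;                 // getCurAdj()
    int set_adj = 0;                 // getSetAdj()
};

// Same order of checks as the Java method: feature switch and exemption
// first, then the forced unfreeze, then freeze by current adj, and finally
// unfreeze by the previously set adj.
FreezeAction DecideFreezeAction(bool use_freezer, const ProcState& p) {
    if (!use_freezer || p.exempt) return FreezeAction::kNone;
    if (p.frozen && p.should_not_freeze) return FreezeAction::kUnfreeze;
    if (p.cur_adj >= kCachedAppMinAdj && !p.frozen && !p.should_not_freeze) {
        return FreezeAction::kFreeze;
    }
    if (p.set_adj < kCachedAppMinAdj) return FreezeAction::kUnfreeze;
    return FreezeAction::kNone;
}
```

For example, a cached process with both adj values at 950 that is not yet frozen maps to kFreeze, while the same process that is already frozen maps to kNone.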


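freezeAppAsyncLSP does not freeze the process immediately. Instead, it uses mFreezeHandler to post a SET_FROZEN_PROCESS_MSG message delayed by mFreezerDebounceTimeout (10 minutes); if the process's adj does not drop during this period, the freeze is carried out.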

//CachedAppOptimizer.java
@GuardedBy({"mAm", "mProcLock"})
void freezeAppAsyncLSP(ProcessRecord app) {
    final ProcessCachedOptimizerRecord opt = app.mOptRecord;
    if (opt.isPendingFreeze()) {
        // Skip redundant DO_FREEZE message
        return;
    }

    mFreezeHandler.sendMessageDelayed(
            mFreezeHandler.obtainMessage(
                SET_FROZEN_PROCESS_MSG, DO_FREEZE, 0, app),
            mFreezerDebounceTimeout);
    opt.setPendingFreeze(true);
    if (DEBUG_FREEZER) {
        Slog.d(TAG_AM, "Async freezing " + app.getPid() + " " + app.processName);
    }
}
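
The debounce behaviour (one pending request, executed only after the delay, dropped if the process becomes important again) can be modelled with a simulated clock. The class below is an illustrative sketch: FreezeDebouncer and its methods are hypothetical, time is passed in explicitly instead of using Handler messages, and Cancel models the unfreeze path under the assumption that it discards the pending request.

```cpp
#include <cstdint>
#include <optional>

// Times are in milliseconds on a caller-supplied clock.
class FreezeDebouncer {
  public:
    explicit FreezeDebouncer(int64_t timeout_ms) : timeout_ms_(timeout_ms) {}

    // Like freezeAppAsyncLSP: a request while one is already pending is ignored.
    void RequestFreeze(int64_t now_ms) {
        if (deadline_ms_) return;
        deadline_ms_ = now_ms + timeout_ms_;
    }

    // Assumption: the process regained priority, so the pending request is dropped.
    void Cancel() { deadline_ms_.reset(); }

    // Returns true exactly once, when the pending request reaches its deadline.
    bool ShouldFreezeNow(int64_t now_ms) {
        if (!deadline_ms_ || now_ms < *deadline_ms_) return false;
        deadline_ms_.reset();
        return true;
    }

  private:
    int64_t timeout_ms_;
    std::optional<int64_t> deadline_ms_;
};
```

With a 600000 ms timeout, a request at time 0 becomes due at 600000; a repeated request at 1000 does not move the deadline, mirroring the isPendingFreeze check.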

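Summary

The core goal of process freezing is to proactively freeze background apps that have been inactive for a long time when Android is short on memory, freeing memory resources, saving power, and improving system performance. That said, the current implementation is not perfect, and there is still room for improvement, for example:

  • Freezing decisions consider only memory conditions and ignore the usage of other system resources such as CPU and I/O
  • Process freezing currently supports only Java-layer apps; native processes are not supported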

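Original author: Jason Wang

Last updated: 2024-12-24, 20:18:11

Copyright notice: This article is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License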
