dedup blocks makes compression less efficient
shiqicao opened this issue · 5 comments
aftering add dedup block after InterDex pass, some thing interesting has been observed,
apk size is increased by 0.066%, but uncompressed dex files size are decreased, but compressed dex files size is increased. DedupBlocksPass is placed after InterDex.
base | dedup_block | |
---|---|---|
apk_size | 153546207 | 153647225(0.066%) |
dex_size | 116377384 | 115970356(-0.350%) |
dex_compressed_size | 42340172 | 42441213(0.239%) |
Any ideas?
DedupBlocksPass and InterDexPass are doing completely different things, they are orthogonal. You could swap the order and I'd expect the exact same result.
The decreased (uncompressed) dex size is generally expected.
The compressed size increase is a bit surprising indeed. There could be an interaction with a pass that runs later, or it could just be a strange interaction between deduping your particular code and the deflate compression algorithm. To see if there's a significant interaction with later passes, look if any reported metrics in the redex.stats file change in a surprising way for any pass that runs after DedupBlocksPass.
The list I posted in #735 roughly follows the order which we've found most beneficial, optimizing for both APK and (uncompressed) dex size on disk.
I tried putting DedupBlocksPass as the last one, result is still the same. I attached our list. The pass seems correctly find profitable blocks from the output of dex-insights. My unverified theory is that compression algorithm compresses eliminated blocks very well, the size of compressed removed blocks is less than the size of compressed added code.
APK_FILE_1 | APK_FILE_2 | COMPONENT | COUNT_1 | COUNT_2 | COUNT_DIFF | BYTES_1 | BYTES_2 | BYTES_DIFF |
---|---|---|---|---|---|---|---|---|
base | dedup_block | ANNOTATIONS_DIRECTORY_ITEM | 107015 | 107015 | 0 | 2659440 | 2659440 | 0 |
base | dedup_block | ANNOTATION_ELEMENT | 157535 | 157535 | 0 | 1964644 | 1964644 | 0 |
base | dedup_block | ANNOTATION_ITEM | 150106 | 150106 | 0 | 2638250 | 2638250 | 0 |
base | dedup_block | ANNOTATION_OFF_ITEM | 264316 | 264316 | 0 | 1057264 | 1057264 | 0 |
base | dedup_block | ANNOTATION_SET_ITEM | 150123 | 150123 | 0 | 1657756 | 1657756 | 0 |
base | dedup_block | ANNOTATION_SET_REF_ITEM | 4234 | 4234 | 0 | 16936 | 16936 | 0 |
base | dedup_block | ANNOTATION_SET_REF_LIST | 1091 | 1091 | 0 | 21300 | 21300 | 0 |
base | dedup_block | CLASS_DATA_ITEM | 162081 | 162081 | 0 | 7546543 | 7545538 | -1005 |
base | dedup_block | CLASS_DEF | 162096 | 162096 | 0 | 5187072 | 5187072 | 0 |
base | dedup_block | CODE_ITEM | 758671 | 758671 | 0 | 57467523 | 57102996 | -364527 |
base | dedup_block | DEBUG_INFO_ITEM | 5700 | 5691 | -9 | 3196050 | 3154002 | -42048 |
base | dedup_block | DEX | 17 | 17 | 0 | 1904 | 1904 | 0 |
base | dedup_block | ENCODED_ANNOTATION | 152094 | 152094 | 0 | 2526397 | 2526397 | 0 |
base | dedup_block | ENCODED_ARRAY | 78233 | 78233 | 0 | 1209300 | 1209300 | 0 |
base | dedup_block | ENCODED_ARRAY_ITEM | 503 | 503 | 0 | 4241 | 4241 | 0 |
base | dedup_block | ENCODED_CATCH_HANDLER | 46122 | 45710 | -412 | 181889 | 179790 | -2099 |
base | dedup_block | ENCODED_CATCH_HANDLER_LIST | 33244 | 33244 | 0 | 215133 | 213034 | -2099 |
base | dedup_block | ENCODED_FIELD | 366712 | 366712 | 0 | 1040553 | 1040553 | 0 |
base | dedup_block | ENCODED_METHOD | 812956 | 812956 | 0 | 5857562 | 5856557 | -1005 |
base | dedup_block | ENCODED_PRIMITIVE | 443906 | 443906 | 0 | 1326553 | 1326553 | 0 |
base | dedup_block | ENCODED_TYPE_ADDR_PAIR | 27027 | 26702 | -325 | 105702 | 104280 | -1422 |
base | dedup_block | FIELD_ANNOTATION | 52609 | 52609 | 0 | 420872 | 420872 | 0 |
base | dedup_block | FIELD_ID | 444339 | 444339 | 0 | 3554712 | 3554712 | 0 |
base | dedup_block | HEADER_ITEM | 17 | 17 | 0 | 1904 | 1904 | 0 |
base | dedup_block | MAP_ITEM | 278 | 278 | 0 | 3336 | 3336 | 0 |
base | dedup_block | MAP_LIST | 17 | 17 | 0 | 3608 | 3608 | 0 |
base | dedup_block | METHOD_ANNOTATION | 64252 | 64252 | 0 | 514016 | 514016 | 0 |
base | dedup_block | METHOD_ID | 1056224 | 1056224 | 0 | 8449792 | 8449792 | 0 |
base | dedup_block | PARAMETER_ANNOTATION | 1539 | 1539 | 0 | 12312 | 12312 | 0 |
base | dedup_block | PROTO_ID | 196960 | 196960 | 0 | 2363520 | 2363520 | 0 |
base | dedup_block | STRING_DATA_ITEM | 691894 | 691894 | 0 | 15387171 | 15387171 | 0 |
base | dedup_block | STRING_ID | 691894 | 691894 | 0 | 2767576 | 2767576 | 0 |
base | dedup_block | TRY_ITEM | 54444 | 52387 | -2057 | 435552 | 419096 | -16456 |
base | dedup_block | TYPE_ID | 293652 | 293652 | 0 | 1174608 | 1174608 | 0 |
base | dedup_block | TYPE_LIST | 136320 | 136320 | 0 | 1446876 | 1446876 | 0 |
base | dedup_block | TYPE_LIST_ITEM | 370812 | 370812 | 0 | 741624 | 741624 | 0 |
"MakePublicPass",
"PerfMethodInlinePass",
"ResolveRefsPass",
"BridgeSynthInlinePass",
"ResultPropagationPass",
"BdKtDataClassPass",
"RemoveApiLevelChecksPass",
"ReBindRefsPass",
"ObjectEscapeAnalysisPass",
"RemoveRedundantCheckCastsPass",
"ConstantPropagationPass",
"BdLocalDcePass",
"AnnoKillPass",
"InterDexPass",
"RenameClassesPassV3",
"RegAllocPass",
"RemoveGotosPass",
"DedupBlocksPass"
Another 2 questions I'd like to ask
- is there any pass after DedupBlocksPass, which can "cleanup" the added code by DedupBlocksPass?
- what do you set for block_split_min_opcode_count?
- DedupBlocks doesn't really add any code (except gotos to common code), so there's no pass to remove anything further. We also run it as one of the last passes, like you show it.
- We leave it at 1.
My unverified theory is that compression algorithm compresses eliminated blocks very well, the size of compressed removed blocks is less than the size of compressed added code.
There's certainly some truth to that.
We do run DedupBlocks more than once, you could try to see if it makes things better if you sprinkle it in somewhere earlier. DedupBlocks also runs as part of the ShrinkerPass, and MethodInlinePass, and IntraDexInlinePass.
At the end, there are some passes you want to run after RegAllocPass
in a particular order to clean things up. As outlined in my earlier post, this is the exact order in which we run the final passes:
RegAllocPass,
BranchPrefixHoistingPass,
CopyPropagationPass,
LocalDcePass,
DedupBlocksPass,
ReduceGotosPass,
Thank you! After adding ReduceGotosPass after DedupStringPass, now DedupStringPass decreases package size.