hsiafan/apk-parser

Slow parsing of APK files

AndroidDeveloperLB opened this issue · 14 comments

I tried to measure how much time it takes to parse APK files using this library, compared to the Android framework's own parser.

I know it's hard to compare performance fairly, but I tried anyway.

I iterated over all installed apps, passing every possible flag to packageManager.getInstalledPackages, and fetched the name of each app.

Then I did the same with the library. I know the library does a lot more, but I think it could do better if we could tell it exactly what to do.

The results are quite bad (tested on a Pixel 2):
It took 3555 ms using the Android framework.
It took 27088 ms using this library.

The sample project is attached:

My Application.zip

Is it possible to split the various parsing tasks in the library, so that it could match the Android framework?

When I don't pass any flags and don't fetch the app names, the framework is much faster still (less than 100 ms).

It doesn't get the app names and icons then, but I can load those later, in a separate task.

Is there a way to make the library faster, to match that performance, and to defer loading things that might not even be needed?

Maybe parse only the manifest file first, and the rest later, if needed?

To parse the manifest alone, I found I can do this:

        AsyncTask.execute {
            val packageInfo = packageManager.getPackageInfo(packageName, 0)
            val apkFilePath = packageInfo.applicationInfo.publicSourceDir
            // use {} closes the stream even if parsing throws
            ZipInputStream(FileInputStream(apkFilePath)).use { zipInputStream ->
                while (true) {
                    val zipEntry = zipInputStream.nextEntry ?: break
                    // Exact match: contains() would also match e.g. "res/raw/AndroidManifest.xml"
                    if (zipEntry.name == "AndroidManifest.xml") {
                        Log.d("AppLog", "zipEntry:$zipEntry ${zipEntry.size}")
                        val bytes = zipInputStream.readBytes()
                        //taken from AbstractApkFile
                        val xmlTranslator = XmlTranslator()
                        val resourceTable = ResourceTable()
                        val locale = Locale.getDefault()
                        val apkTranslator = ApkMetaTranslator(resourceTable, locale)
                        val xmlStreamer = CompositeXmlStreamer(xmlTranslator, apkTranslator)
                        val buffer = ByteBuffer.wrap(bytes)
                        val binaryXmlParser = BinaryXmlParser(buffer, resourceTable)
                        binaryXmlParser.locale = locale
                        binaryXmlParser.xmlStreamer = xmlStreamer
                        binaryXmlParser.parse()
                        val apkMeta = apkTranslator.apkMeta
                        Log.d("AppLog", "apkMeta:$apkMeta")
                        break
                    }
                }
            }
        }

But beyond that, I think it's harder to get just the app name, and, in a separate function, its icons.

EDIT: I tested it on all APK files installed on the device and, sadly, it still takes a huge amount of time, this time even more than before (39384 ms), for some reason.
How can that be?

A sample project is attached:

My Application.zip

packageManager.getInstalledPackages reads from cached data; see
https://android.googlesource.com/platform/frameworks/base/+/483f3b06ea84440a082e21b68ec2c2e54046f5a6/services/java/com/android/server/pm/PackageManagerService.java#2735
The ApkFile implementation uses ZipFile; maybe it's faster than ZipInputStream.
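The access-pattern difference between the two classes can be sketched with plain JVM `java.util.zip` (Android ships the same classes). This is a minimal illustration, not apk-parser code; the function names are hypothetical. `ZipFile` looks the entry up in the central directory stored at the end of the archive and seeks to its data, while `ZipInputStream` has to read through every entry stored before the one it wants, which may help explain why the ZipInputStream version was slower.

```kotlin
import java.io.File
import java.io.InputStream
import java.util.zip.ZipFile
import java.util.zip.ZipInputStream

// Random access: ZipFile reads the central directory at the end of the archive,
// then seeks directly to the requested entry's data.
fun readEntryWithZipFile(file: File, name: String): ByteArray? {
    return ZipFile(file).use { zip ->
        val entry = zip.getEntry(name) ?: return null
        zip.getInputStream(entry).use { it.readBytes() }
    }
}

// Sequential access: ZipInputStream reads local entry headers from the start
// of the stream, reading past every entry stored before the requested one.
fun readEntryWithZipInputStream(input: InputStream, name: String): ByteArray? {
    ZipInputStream(input.buffered()).use { zis ->
        var entry = zis.nextEntry
        while (entry != null) {
            // Exact match, so entries like "res/raw/AndroidManifest.xml" are skipped
            if (entry.name == name) return zis.readBytes()
            entry = zis.nextEntry
        }
    }
    return null
}
```

Both return the uncompressed bytes of the entry, or null if it is absent; only the amount of the archive that has to be read differs.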

Makes sense.
However, is the cache even used if I call getPackageArchiveInfo?
https://developer.android.com/reference/android/content/pm/PackageManager.html#getPackageArchiveInfo(java.lang.String,%20int)

I've tested it now: getPackageArchiveInfo still takes less time. It took 7 seconds this time, using 0 as flags.

A new sample project is attached:

My Application.zip

Alternatively, I could take some APK files and parse them using both methods (the framework and this library).
I used the installed apps because there are a lot of them.

Would it be possible to tell the library which types of data we want?
For example, I only need:

  1. package name
  2. app name
  3. app icon
  4. version code
  5. version name

I don't need all the extra things. Other developers might, of course, so I think it's best to let us tell the library what we need, so that it performs the minimal amount of parsing required to get it.

@AndroidDeveloperLB Try the following in your project. It avoids scanning through the whole zip file and should be faster.
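To illustrate what "minimal parsing" could mean for the five fields above, here is a sketch of a field-selection request mapped to the APK entries that would have to be opened. `ApkField` and `entriesNeeded` are hypothetical names, not part of apk-parser. The assumption is the usual APK layout: package name and version code/name are attributes in AndroidManifest.xml, while android:label and android:icon are usually resource references that can only be resolved through resources.arsc.

```kotlin
// Hypothetical field-selection API: NOT part of apk-parser, just an illustration.
enum class ApkField { PACKAGE_NAME, APP_NAME, APP_ICON, VERSION_CODE, VERSION_NAME }

// Returns the zip entries a parser would need to open for the requested fields.
fun entriesNeeded(fields: Set<ApkField>): Set<String> {
    // Every one of these fields starts from the manifest.
    val entries = linkedSetOf("AndroidManifest.xml")
    // android:label and android:icon are usually resource references (e.g. @string/app_name),
    // so resolving them requires the resource table. The icon additionally needs its own
    // file entry, whose path is only known after that lookup.
    if (ApkField.APP_NAME in fields || ApkField.APP_ICON in fields) {
        entries += "resources.arsc"
    }
    return entries
}
```

With such a request, asking only for package name and version info would never touch resources.arsc, which is typically the larger and more expensive part to parse.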

    private fun getBasicApkMeta(apkFilePath: String): ApkMeta? {
        // ZipFile looks the entry up in the central directory instead of scanning the archive.
        // use {} closes both the ZipFile and the entry stream.
        return ZipFile(File(apkFilePath)).use { zipFile ->
            val entry = zipFile.getEntry("AndroidManifest.xml") ?: return null
            zipFile.getInputStream(entry).use { zipInputStream ->
                val bytes = zipInputStream.readBytes()
                //taken from AbstractApkFile
                val xmlTranslator = XmlTranslator()
                val resourceTable = ResourceTable()
                val locale = Locale.getDefault()
                val apkTranslator = ApkMetaTranslator(resourceTable, locale)
                val xmlStreamer = CompositeXmlStreamer(xmlTranslator, apkTranslator)
                val buffer = ByteBuffer.wrap(bytes)
                val binaryXmlParser = BinaryXmlParser(buffer, resourceTable)
                binaryXmlParser.locale = locale
                binaryXmlParser.xmlStreamer = xmlStreamer
                binaryXmlParser.parse()
                val apkMeta = apkTranslator.apkMeta
                Log.d("AppLog", "packageName=${apkMeta.packageName}")
                apkMeta
            }
        }
    }

Using ZipFile is actually faster than the Android framework:
getPackageArchiveInfo: 6589 ms
getBasicApkMeta: 5329 ms
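The timings in this thread moved around a lot between runs (27 s, 39 s, 7 s). One possible factor is that the first pass over the APKs also pays for JIT warm-up and for filling the OS file cache, which then benefits whichever approach runs second. A sketch of a slightly fairer harness: `bestOfMillis` is a hypothetical helper, and `parse` stands for whatever per-APK work is being compared.

```kotlin
import kotlin.system.measureNanoTime

// Hypothetical benchmark helper: one untimed warm-up pass (JIT, OS file cache),
// then `rounds` timed passes over all inputs; returns the fastest pass in milliseconds.
fun <T> bestOfMillis(inputs: List<T>, rounds: Int = 3, parse: (T) -> Unit): Long {
    require(rounds > 0) { "rounds must be positive" }
    inputs.forEach(parse) // warm-up, not timed
    var bestNanos = Long.MAX_VALUE
    repeat(rounds) {
        val nanos = measureNanoTime { inputs.forEach(parse) }
        bestNanos = minOf(bestNanos, nanos)
    }
    return bestNanos / 1_000_000
}
```

Running both approaches through the same helper, over the same list of paths, keeps their conditions comparable.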

But there are still some issues:

  1. The reason I used ZipInputStream instead of ZipFile (and the reason I found this repository and asked for InputStream support here) is that Google plans to ruin the storage permission and replace it with the SAF API, which doesn't provide a real path you can use but provides an InputStream instead. So you can't use a file path or the File API, which means you can't use ZipFile either. There are already some articles about it, and some threads on the issue tracker. Example:
    https://issuetracker.google.com/issues/128591846
    Even finding files is currently very slow:
    https://issuetracker.google.com/issues/130261278

  2. Sadly, this method still can't get some APK data: the app name and the app icon.
    But it gets the basic information from the manifest, which is nice.

  3. The test I made includes fetching the app name. Somehow, even when I remove that part, the time taken is about the same. I don't know how that's possible; maybe the name is cached during parsing anyway, so it's available right away. In any case, my function doesn't get the app name, and I don't know how to get it without loading too much data from the stream. Only the app-name string and the app icon are needed.

Do you know how to overcome these issues?

> 1. The reason I used ZipInputStream instead of ZipFile...

The code below extracts data from the APK using the URI returned by SAF and doesn't depend on real paths. I believe this code gives you access to 100% of the APK. (Tested on an API 29 emulator with targetSdkVersion = 29.)

> 2. Sadly still unable to get some APK data using this method...
> 3. The test I've made includes the fetching of the app name...

I assume all the information you need is encoded in the APK file in some way. Unless it's readily available, like some of the information in the manifest, you may need to read the framework code to understand how to get to it. Fortunately, it looks like you need only a small amount of data.

    private fun showZipEntries(uri: Uri) { // uri from Storage Access Framework
        /*  Using ZipFile from Apache Commons Compress
                implementation 'org.apache.commons:commons-compress:1.18'
            as mentioned in comment#3 of https://issuetracker.google.com/issues/130494105
            Code immediately below is also from that issue tracker comment.

            Also using ZipArchiveEntry from this library instead of ZipEntry from the standard library.
        */
        val parcelFileDescriptor: ParcelFileDescriptor =
            contentResolver.openFileDescriptor(uri, "r") ?: return
        val autoCloseInputStream = ParcelFileDescriptor.AutoCloseInputStream(parcelFileDescriptor)
        // ZipFile(SeekableByteChannel) gives random access without needing a real file path.
        // Closing the ZipFile also closes the underlying channel.
        ZipFile(autoCloseInputStream.channel).use { zipFile ->
            val zipEntry: ZipArchiveEntry = zipFile.getEntry("AndroidManifest.xml") ?: return
            Log.d(
                "AppLog",
                "Zip entry: ${zipEntry.name} size = ${zipEntry.size} compressed size=${zipEntry.compressedSize}"
            )
            // Get the uncompressed manifest into a byte array.
            val bytes = zipFile.getInputStream(zipEntry).use { it.readBytes() }
            //taken from AbstractApkFile
            val xmlTranslator = XmlTranslator()
            val resourceTable = ResourceTable()
            val locale = Locale.getDefault()
            val apkTranslator = ApkMetaTranslator(resourceTable, locale)
            val xmlStreamer = CompositeXmlStreamer(xmlTranslator, apkTranslator)
            val buffer = ByteBuffer.wrap(bytes)
            val binaryXmlParser = BinaryXmlParser(buffer, resourceTable)
            binaryXmlParser.locale = locale
            binaryXmlParser.xmlStreamer = xmlStreamer
            binaryXmlParser.parse()

            // Not really doing anything with this now.
            val apkMeta = apkTranslator.apkMeta
            Log.d("AppLog", "apkMeta: $apkMeta")

            // Read out all APK entries to show that we have access. We should be able to fetch
            // uncompressed byte array representations from any of these entries.
            for (entry in zipFile.entries) {
                Log.d("AppLog", "Zip entry: ${entry.name}")
            }
        }
    }
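Why a channel is enough for fast access: a zip archive ends with an End Of Central Directory (EOCD) record that points at a central directory listing every entry, so any reader that can seek (any `SeekableByteChannel`, such as the `FileChannel` obtained from the SAF file descriptor above) can find AndroidManifest.xml without reading the entries stored before it. This is a simplified sketch of that lookup, not Commons Compress code: it ignores Zip64 and multi-disk archives, and `readAt` / `listZipEntryNames` are hypothetical helpers.

```kotlin
import java.io.EOFException
import java.nio.ByteBuffer
import java.nio.ByteOrder
import java.nio.channels.SeekableByteChannel

// Reads exactly `len` bytes starting at absolute position `pos` (zip fields are little-endian).
fun readAt(ch: SeekableByteChannel, pos: Long, len: Int): ByteBuffer {
    val buf = ByteBuffer.allocate(len).order(ByteOrder.LITTLE_ENDIAN)
    ch.position(pos)
    while (buf.hasRemaining()) {
        if (ch.read(buf) < 0) throw EOFException("Unexpected end of channel")
    }
    buf.flip()
    return buf
}

// Lists entry names using only the EOCD record and the central directory;
// entry data is never read.
fun listZipEntryNames(ch: SeekableByteChannel): List<String> {
    val size = ch.size()
    // The EOCD record is 22 bytes plus an optional comment of up to 65535 bytes.
    val tailLen = minOf(size, 22L + 65535L).toInt()
    val tail = readAt(ch, size - tailLen, tailLen)
    var eocd = -1
    for (i in tailLen - 22 downTo 0) {
        if (tail.getInt(i) == 0x06054b50) { eocd = i; break } // "PK\u0005\u0006"
    }
    require(eocd >= 0) { "EOCD record not found; not a zip file?" }
    val entryCount = tail.getShort(eocd + 10).toInt() and 0xFFFF
    val cdSize = tail.getInt(eocd + 12).toLong() and 0xFFFFFFFFL
    val cdOffset = tail.getInt(eocd + 16).toLong() and 0xFFFFFFFFL
    val cd = readAt(ch, cdOffset, cdSize.toInt())
    val names = ArrayList<String>(entryCount)
    var p = 0
    repeat(entryCount) {
        require(cd.getInt(p) == 0x02014b50) { "Bad central directory header" } // "PK\u0001\u0002"
        val nameLen = cd.getShort(p + 28).toInt() and 0xFFFF
        val extraLen = cd.getShort(p + 30).toInt() and 0xFFFF
        val commentLen = cd.getShort(p + 32).toInt() and 0xFFFF
        val nameBytes = ByteArray(nameLen) { k -> cd.get(p + 46 + k) }
        names += String(nameBytes, Charsets.UTF_8)
        p += 46 + nameLen + extraLen + commentLen
    }
    return names
}
```

Each central directory header also stores the entry's local header offset (at byte 42), which is what lets a library like Commons Compress jump straight to the manifest data.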

Can you please share the full project, and show how well it works when going over all installed apps (to see it in action, compared to the normal getPackageArchiveInfo function)?

Do you think it's a lot of work to get the app icon and app name? Remember that in some cases the app icon isn't a simple image: it can be an adaptive icon (two image layers), a VectorDrawable, or both (an adaptive icon made of one or two VectorDrawables).
Maybe the app name is OK, but I think the app icon could be hard to get.

If you have a good sample that handles all of these, I have a bounty for you:
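Before an icon can be decoded, a parser has to find out what kind of file the icon resource resolves to. A minimal sketch, assuming the icon's entry path has already been resolved (which itself requires resources.arsc) and its first bytes have been read; `IconKind` and `detectIconKind` are hypothetical names. It checks magic bytes rather than file extensions. Adaptive icons (`<adaptive-icon>`) and VectorDrawables (`<vector>`) are both stored as compiled binary XML, which starts with the chunk header bytes 03 00 08 00, so a COMPILED_XML result still needs its root element parsed to tell them apart.

```kotlin
// Hypothetical classification of an icon entry by its leading bytes.
enum class IconKind { PNG, WEBP, JPEG, COMPILED_XML, UNKNOWN }

fun detectIconKind(head: ByteArray): IconKind {
    // Unsigned byte at index i, or -1 if the array is too short.
    fun at(i: Int) = if (i < head.size) head[i].toInt() and 0xFF else -1
    return when {
        at(0) == 0x89 && at(1) == 0x50 && at(2) == 0x4E && at(3) == 0x47 -> IconKind.PNG
        at(0) == 0xFF && at(1) == 0xD8 && at(2) == 0xFF -> IconKind.JPEG
        head.size >= 12 &&
            String(head, 0, 4, Charsets.US_ASCII) == "RIFF" &&
            String(head, 8, 4, Charsets.US_ASCII) == "WEBP" -> IconKind.WEBP
        // RES_XML_TYPE (0x0003) with an 8-byte header: compiled XML (vector or adaptive-icon)
        at(0) == 0x03 && at(1) == 0x00 && at(2) == 0x08 && at(3) == 0x00 -> IconKind.COMPILED_XML
        else -> IconKind.UNKNOWN
    }
}
```

Only the bitmap cases can be handed directly to an image decoder; the XML cases need a further pass, and an adaptive icon may point to further drawables that must be resolved the same way.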

https://stackoverflow.com/questions/56309165/how-to-get-information-of-an-apk-file-without-using-file-or-file-path

@AndroidDeveloperLB I don't have code that uses SAF to get a URI for the APK of each installed app, and I don't see it in any of your samples. If you can post that, I'll take a crack at something that parses the manifest from each APK for comparison.

Even with access to the raw APK files, you have your work cut out for you. What I presented only gets you the raw APK data through a URI supplied by SAF. I took a quick look at extracting even the app name from the APK, and it won't be easy, IMHO.

This whole SAF thing looks like a disaster in the making to me. Maybe Google can pull a rabbit out of the hat on this one.

I didn't mean that it has to handle a Uri. You can use anything you wish, as long as it's compatible with SAF and won't cause issues (such as OOM).
InputStream is one way.
Can you show any sample at all?

As for how bad SAF and scoped storage are, you're right. There are already articles about it, and the issue tracker has various issues about it too. Here's one article, made by a file manager app:
https://www.xda-developers.com/android-q-storage-access-framework-scoped-storage/

And do you know CommonsWare? He has written many articles about it, to help developers where possible:
https://issuetracker.google.com/issues/128591846#comment196
But even he agrees that this decision breaks tons of things that were possible before, which will now only be possible through more hacks or with a performance and storage hit (because you'll copy files just to be able to handle their content).

Is it possible to grab the relevant part of Android's code to do this, and that's it? Or even to use it through a hack?
Or is that a huge amount of work?

I saw the coverage of SAF and read CommonsWare's blog about it. It's a general headache, but you may be able to get around it (still a headache, though).

Here is a short app that dumps some of the information you're looking for to logcat. It uses the package name to get the package info. I'm unclear about your exact needs but, if you have URIs from SAF, you may be able to match them with what this little app can do, maybe by matching APK signatures, APK locations, etc.

APK Reader.zip

> Is it possible to grab the relevant part of Android's code to do this, and that's it? Or even to use it through a hack?
> Or is that a huge amount of work?

My sense is that it's a huge amount of work, and I'd always wonder whether I'd done everything right. Personally, I would spend time trying to leverage the system, but that's me.

Sadly, using what the system (currently) offers would perform very badly, because it requires me to copy the APK file before I can read it. So if I go over multiple files (especially large ones), it's just a waste of time and space.

My app (here: https://play.google.com/store/apps/details?id=com.lb.app_manager ) searches for all APK files on the device's storage and shows information about each one, while also allowing the user to install, delete, and share them.

Your sample doesn't do what I described. It just takes the already-known information about installed APK files. My question was about APK files in general, so it should work even for non-installed APK files, and it should work with SAF too.
This means you can't use getPackageInfo, because it's only for installed apps.
And you can't use getPackageArchiveInfo either, because it takes a file path, and you no longer have access to the file through a file path (at least that's how SAF currently works).

So assume you have a Uri and/or an InputStream to parse the APK file. You can open the InputStream again and again, of course, but you shouldn't copy the file (because then you're not really parsing the original file, just a copy of it, and it wastes time and space), and you shouldn't load it entirely into memory (because of possible OOM).

I think the latest versions of this library actually perform well. Closing this.
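Those constraints (a stream that can be reopened, no copy, no full load into memory) can be met for the handful of entries needed here with a single streaming pass. A minimal sketch, assuming only a reopenable `InputStream`; `readEntriesOnce`, `readCapped` and the 16 MiB default cap are hypothetical choices, not apk-parser API. Because a plain stream can't seek, entries stored before the wanted ones are still read through, but the pass stops as soon as every wanted entry has been found, and each kept entry is bounded in size.

```kotlin
import java.io.ByteArrayOutputStream
import java.io.IOException
import java.io.InputStream
import java.util.zip.ZipInputStream

// Streams through the archive once, keeping only the entries named in `wanted`.
// Stops early once all of them have been found; never copies the archive.
fun readEntriesOnce(
    openStream: () -> InputStream,
    wanted: Set<String>,
    maxEntryBytes: Int = 16 * 1024 * 1024
): Map<String, ByteArray> {
    val found = HashMap<String, ByteArray>()
    ZipInputStream(openStream().buffered()).use { zis ->
        var entry = zis.nextEntry
        while (entry != null && found.size < wanted.size) {
            if (entry.name in wanted) {
                found[entry.name] = readCapped(zis, maxEntryBytes)
            }
            entry = zis.nextEntry
        }
    }
    return found
}

// Reads the current entry fully, failing instead of risking OOM on oversized entries.
private fun readCapped(input: InputStream, limit: Int): ByteArray {
    val out = ByteArrayOutputStream()
    val buf = ByteArray(8192)
    var total = 0
    while (true) {
        val n = input.read(buf)
        if (n < 0) break
        total += n
        if (total > limit) throw IOException("Entry larger than $limit bytes")
        out.write(buf, 0, n)
    }
    return out.toByteArray()
}
```

On Android, `openStream` could be something like `{ contentResolver.openInputStream(uri)!! }`; since each call reopens the document, fetching the manifest and resources.arsc together (or in a later second pass) never requires a copy of the APK.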