appknox/pyaxmlparser

Unexpected behaviour is observed if a large APK file is loaded

mykola-mokhnach opened this issue · 2 comments

Currently the library always tries to load the full content of an APK file into memory because of

self.__raw = bytearray(filename)

(https://github.com/appknox/pyaxmlparser/blob/master/pyaxmlparser/core.py#L102), which is slow and dangerous for larger package files. In a Kubernetes environment, the supervisor simply kills the pod running the script because of unacceptably high memory usage when working with bigger APKs (some games can be 1 GB or more in size because of their resources).

Is there a way to lower the overall memory usage and prevent the full content of the source APK from being loaded into RAM?

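You can unzip the APK and read the manifest file like this:

import logging

from pyaxmlparser.axmlprinter import AXMLPrinter
from pyaxmlparser.utils import read

log = logging.getLogger(__name__)

ap = AXMLPrinter(read("path/to/manifest/file.xml"))
if not ap.is_valid():
    log.error("Error while parsing AndroidManifest.xml - is the file valid?")
    exit()
manifest_xml = ap.get_xml_obj()
# The root element is normally <manifest> itself, so check it before searching descendants
manifest = manifest_xml if manifest_xml.tag == "manifest" else manifest_xml.find(".//manifest")
print("Package Name: ", manifest.get("package"))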
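
The unzip step does not have to unpack the whole APK. As a minimal sketch (the helper name `read_manifest_bytes` is made up for illustration and is not part of pyaxmlparser), the standard `zipfile` module can pull out just the manifest entry when given a file path:

```python
import zipfile


def read_manifest_bytes(apk_path):
    """Return the binary XML bytes of AndroidManifest.xml from an APK.

    Hypothetical helper: ZipFile opened on a path seeks to the requested
    entry, so only that entry is read and decompressed, not the archive.
    """
    with zipfile.ZipFile(apk_path) as apk:
        return apk.read("AndroidManifest.xml")
```

The returned bytes should be usable in place of the `read(...)` result in the snippet above, assuming `AXMLPrinter` accepts a raw bytes buffer as it does there.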

You can use AXMLParser and AXMLPrinter individually to parse things. As a package, we load the whole APK because we want pyaxmlparser to provide all the relevant information, including icons, certificate information and other things that we have to process and cross-reference in memory rather than storing them in a temporary file system.

I'm okay with a PR that adds an optional switch to not load the APK into memory and instead operate on a temporary file.

Thanks for the example. We actually need to parse more info from the package, not just the manifest, which is why we use the APK class. It seems like the self.__raw property is only needed for pickling, which could be completely optional. Also, the zipfile.ZipFile constructor can accept normal file paths.
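
To illustrate the idea, here is a rough sketch of what a path-based source could look like. `LazyApkSource` is a hypothetical class, not anything in pyaxmlparser: it hands the path to `zipfile.ZipFile` and reads the full raw bytes only when someone explicitly asks for them (for example, for pickling).

```python
import zipfile


class LazyApkSource:
    """Hypothetical helper: path-backed APK access without a raw buffer."""

    def __init__(self, path):
        self.path = path
        # Opening by path reads only the ZIP central directory up front.
        self._zip = zipfile.ZipFile(path)

    def read_entry(self, name):
        # Decompress a single archive member on demand.
        return self._zip.read(name)

    def raw(self):
        # Full read of the file, done only on explicit request.
        with open(self.path, "rb") as fd:
            return fd.read()

    def close(self):
        self._zip.close()
```

With something like this, memory usage would scale with the entries actually parsed rather than with the size of the APK, at the cost of keeping a file handle open.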

For now we've overridden the instance constructor to avoid initialising the __raw property, but this is more of a dirty workaround than a proper solution.