CensoredUsername/unrpyc

Pickle error with __builtins__.getattr

Gouvernathor opened this issue · 16 comments

Traceback (most recent call last):
  File "<string>", line 15, in decompile_game
  File "<string>", line 7, in decompile_rpyc
  File "<string>", line 2, in read_ast_from_file
  File "<string>", line 59, in pickle_safe_loads
  File "<string>", line 152, in safe_loads
  File "lib/python3.9/pickle.py", line 1212, in load
  File "lib/python3.9/pickle.py", line 1589, in load_reduce
  File "<string>", line 26, in __new__
decompiler.magic.FakeUnpicklingError: <class '__builtin__.getattr'> was instantiated with unexpected arguments (<class 'parse_utilities.node.Node'>, '_from_reduce'), {}

This seems weird : since getattr is a function and not a class, why does it go through the pickle process at all ?

Huh.

That is a weird one. So what you're seeing there is that the pickle is trying to do some kind of code injection during unpickling. Python is fine with allowing that, but unrpyc isn't so we try to defend against that.

Normally I'd assume something like this to happen if someone tries to unrpyc un.rpyc, since it uses the same trick to bootstrap itself. But I don't recognize this code.

It's trying to do getattr(parse_utilities.node.Node, "_from_reduce") during unpickle time, probably then to run parse_utilities.node.Node._from_reduce on some other data. that's likely some custom module as I do not know a package or ren'py module called parse_utilities.

My guess is someone trying to defend against decompilation. Not much more I can say without more info.

edit: some googling tells me this is likely Innocent Witches. Not the first time seeing that game here. It does some pretty significant engine modifications.

It is from that game, but the issue didn't seem to obviously be related to that.
A weird thing is that nothing in renpy involves a _from_reduce attribute or method so that's weird.

Another strange thing is that when I tried using the un.rpyc, there was this error (without the whole traceback) and some other due to RawRepeat and RawChoice SL nodes not being found, but when using the CLI only this error appears.

A weird thing is that nothing in renpy involves a _from_reduce attribute or method so that's weird.

It's from the parse_utilities python module that Innocent Witches ships with.

Another strange thing is that when I tried using the un.rpyc, there was this error (without the whole traceback) and some other due to RawRepeat and RawChoice SL nodes not being found, but when using the CLI only this error appears.

error reporting from inside un.rpyc is a bit more minimal. Regaring those missing errors, I think because they're not in the renpy namespace, and the classes are likely already loaded by ren'py there's no issue with importing them.

So, it makes sense that the CLI doesn't find a class that's defined in .py files in the game. However when the game is launched the python module is imported and loaded, so the un.rpyc's pickle should find the it loaded, not use a magic mock for the Node class, and find its _from_reduce method, right ?

From the top of my head, to provide a little interactivity and debugging ability, could it be possible to have the game launch normally and then use unrpyc features from the game's console ? Like an un.rpyc that would inject itself but not launch itself on its own.

So, it makes sense that the CLI doesn't find a class that's defined in .py files in the game. However when the game is launched the python module is imported and loaded, so the un.rpyc's pickle should find the it loaded, not use a magic mock for the Node class, and find its _from_reduce method, right ?

un.rpyc reuses the same safe depickling method even inside un.rpyc, so foreign code injection like what it's doing should also be impossible.

From the top of my head, to provide a little interactivity and debugging ability, could it be possible to have the game launch normally and then use unrpyc features from the game's console ? Like an un.rpyc that would inject itself but not launch itself on its own.

Right now un.rpyc works via tearing down the entire renpy environment, doing the decompilation, and restoring it again. This is done so we can still refer inside unrpyc to things that don't exist in current renpy for backwards compatibility reasons. The whole tearing down the environment isn't exactly threadsafe, so doing this at runtime would not work. At that point it'd be needed to just spawn a unrpyc subprocess from inside the game, and have that do the decompilation. Possible, but a bit of work to get working.

So, if I understand correctly, the "safe depickling" mechanism is the very reason why it fails, since it purposefully deprives itself of the necessary data which must be loaded for the unpickling to actually work...
I suppose there are reasons why you need some mock at some point, but why not take the classes and data that exist when they do, and use the mock only as a fallback ? And if that throws some backwards-compatibility away, make it a togglable option. But from what I understand, in this particular case the safe-pickle and mock practice seems to work as an obstacle rather than a help.

And the problem seems to be even weirder, since the "fake" unpickler seems not only unable to find the parse_utilities.node.Node class, but actually it's unable to find the getattr built-in function and tries to replace it with a mock class...

Turns out I made it to work, by doign roughly just that.

In compile.py, in the code that ends up in un.rpy :

+   sys.modules["__builtin__"] = sys.modules["builtins"]
+   import builtins
+   builtins.unicode = str

    # ?????????
    global unrpyc
    unrpyc = pickle.loads(base64.b64decode({}))
    unrpyc.decompile_game()

in renpycompat.py, in place of the definition of pickle_safe_loads and replacing the passed set of safe modules :

import sys
def pickle_safe_loads(buffer: bytes):
    return magic.safe_loads(
        buffer, CLASS_FACTORY,
        {m for m in sys.modules.keys() if "renpy" not in m},
        encoding="ASCII", errors="strict")

I'm not too sure about the first part as it seems quite a dirty hack, maybe it's due to specific hideous pickle stuff found in IW's rpyc files. But for the second part, I think it's probably worth including in general.

The thing I'm mostly uncomfortable with is that this allows code injection into unrpyc. Now of course, when you're already running an ren'py game this isn't the biggest issue, but this was a core design constraint with base unrpyc, and could cause for some very hard to diagnose errors if people actually mess with it.

I'd rather just we tell people "well this means the game is doing code injection into its own save files?"

What risk would be introduced by allowing this, that wouldn't be introduced by running the game with the same rpyc files in the first place ?

I can sort of understand the idea that the CLI use can be supposed to be safe from code injection, but there could be an option, disabled by default and with warnings about it in the readme, to enable unsafe mode... although in CLI mode the rpyc's early code is not actually executed so one rpyc can't use what its neighbor prepares. So it wouldn't be very useful anyway (as far as I understand what goes on when using the CLI), let's forget enabling that for the CLI.

But in injection mode, when using un.rpy or un.rpyc or the rpyb injection, then the game will be booted anyway, so there's no point in being cautious.
I think we could enable all modules but renpy internals in injection mode and disable them in CLI mode. Am I missing something ?

I just realized I never replied to this but regarding the earlier:

I'm not too sure about the first part as it seems quite a dirty hack

That is a python 2-3 compat thingy. Renpy still saves at pickle protocol 2 which enables import fixing for python 2 compatibility, even with python 3. That said, this is usually never relevant as only ren'py structures and python builtins end up in the pickle. Look up fix_imports in the python pickle docs for more info.

I think we could enable all modules but renpy internals in injection mode and disable them in CLI mode. Am I missing something ?

I'm just really conflicted about it. It's an easy change, but on the other side:

  • this is definitely not an intended feature of the normal rpyc format
  • there's no good reason for anything to be doing this?
  • as this allows for arbitrary code execution it's impossible to say what the impact of this would be.
  • even if we run that code, it'll likely not run in the correct context (the renpy modules and objects we provide are totally fake and will be missing many expected attributes and functions).
  • allowing this will remove the big warning that something is afoot, and potentially lead to many confusing bug reports.

My opinion is that it's just better to have a big warning label of "yo this is doing something that we don't know, and you should definitely check out what is happening before proceeding" than to just run with it and hope that it'll run okay. I can understand an option for allowing it, but don't think I'd make it the default.

The trouble is that a CLI option is pointless since the CLI has no access to the necessary classes and functions, and you can't add an option to an opaque rpyc or rpyb file for the injection mode...
Maybe as a commented-out conditional path in un.rpy ? That way if you're stuck like I was, the option is there and you can just uncomment it and recompile. Not everybody will be able to do that, but it doesn't require figuring out things or writing new code.

Yeah, not being able to give options to un.rpyc is limiting in that aspect. Not sure how to change that nicely. this logic is normally in the compressed blob part of un.rpy due to needing to package multiple files in there.

We could create an alternate un.rpy that would allow for this. But it'd be useful only for this exact game, so then the question is more where to store it. We used to occasionally do branches for games that needed "special" treatment, but honestly I don't have much interest in supporting more than the general ren'py format so I stopped doing that. If people want to mod a game that does weird things with the engine, they're going to need the knowledge to do so anyway so then they can handle that themselves.

I don't think it's that much specific to that game. It doesn't require any specific class or function name, for example, that would be defined in a game's code.
Couldn't it just come from a heavy-handed use of advanced custom statements and displayables ? That would explain why not many other games got blocked by this - or were popular enough for someone to try to decompile them and ask here about it. And in that case it would just be a nominal extension required to fully support everything renpy allows.

Couldn't it just come from a heavy-handed use of advanced custom statements and displayables ?

To do this you need to mess with the resulting output pickle, or write something explicitly abusing the pickling machinery like this:

class PickleHack:
    def myfunction(self):
        pass

    def __reduce__(self):
        return getattr, (PickleHack, "myfunction"), {}

Running this:

>>> pickle.loads(pickle.dumps(PickleHack()))
<function PickleHack.myfunction at 0x000001AE09ACA200>

And then getting these objects somewhere into the stored AST to run them. Which you'd achieve via defining some kind of custom AST node likely. But what the point of this would be, I have no idea. It also doesn't roundtrip cleanly.

Again, the only point of this is to be able to run code at unpickling time, before python early.

I don't think it's that much specific to that game. It doesn't require any specific class or function name, for example, that would be defined in a game's code.

I'm wondering how it is injecting them then then. Might be that the custom AST nodes are defined in the python files instead of the rpyc files.

That would explain why not many other games got blocked by this - or were popular enough for someone to try to decompile them and ask here about it.

I'd posit a different explanation: this is likely the first game which did this to begin with, because it's such a weird thing to do when you could just modify the engine to add a hook in the rpyc loading code. That'd also come at the benefit of having actual error handling available. This won't. Believe me, I use exactly the same trick to inject un.rpyc code to achieve the same effect, and debugging that when errors happen before we set up proper error handling in there is an absolute pain.

The only benefit I see from this is that you can hide the running of this code from decompilers like unrpyc, if they didn't detect it like we do now. Which I'd rather not have happen. Better to alert people that something is afoot that is more than unrpyc knows about than to just silently say that everything is alright while it's not.