A small compiler that can convert Python scripts to pickle bytecode.
- Python 3.8+
No third-party modules are required.
usage: pickora.py [-h] [-d] [-r] [-l {none,python,pickle}] [-o OUTPUT] file
A toy compiler that can convert Python scripts to pickle bytecode.
positional arguments:
file the Python script to compile
optional arguments:
-h, --help show this help message and exit
-d, --dis disassemble compiled pickle bytecode
-r, --eval, --run run the pickle bytecode
-l {none,python,pickle}, --lambda {none,python,pickle}
choose lambda compiling mode
-o OUTPUT, --output OUTPUT
write compiled pickle to file
Note: Lambda syntax is disabled (--lambda=none) by default.
$ python3 pickora.py --dis samples/hello.py --output output.pkl
0: \x80 PROTO 4
2: \x95 FRAME 99
...
108: N NONE
109: . STOP
highest protocol among opcodes = 4
$ python3 -m pickle output.pkl
===================
| Hello, world! 🐱 |
===================
None
In this example, we compiled samples/hello.py to output.pkl and showed the disassembled result of the compiled pickle bytecode.
Note that this alone won't run the pickle for you. To run it, add the -r option or execute python -m pickle output.pkl, as in this example.
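The same two steps, disassembling and running, can also be done from Python with the standard library. The sketch below uses a pickle produced by pickle.dumps as a stand-in for output.pkl; whether the -d and -r options are implemented with these exact calls is an assumption.

```python
import io
import pickle
import pickletools

# Stand-in payload; in practice this would be the bytes read from output.pkl.
payload = pickle.dumps("Hello, world!", protocol=4)

# Disassemble without executing anything (similar in spirit to -d).
listing = io.StringIO()
pickletools.dis(payload, out=listing)
print(listing.getvalue())

# Execute the pickle (similar in spirit to -r).
# Only load pickles you trust: loading runs whatever the bytecode says.
result = pickle.loads(payload)
print(result)
```

Disassembling is safe on untrusted input because pickletools.dis only decodes opcodes, while pickle.loads actually executes them.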
- Literal: int, float, bytes, string, dict, list, set, tuple, bool, None
- Assignment: val = dict_['x'] = obj.attr = 'meow' (directly using bytecode for all of these operations)
- Attributes: obj.attr (using builtins.getattr only when you need to "load" an attribute)
- Named assignment: (x := 0xff)
- Function call: f(arg1, arg2)
  - Doesn't support keyword arguments.
- Operators (using the operator module)
  - Binary operators: +, -, *, / etc.
  - Unary operators: not, ~, +val, -val
  - Compare: 0 < 3 > 2 == 2 > 1 (using builtins.all for chained comparisons)
  - Subscript: list_[1:3], dict_['key'] (using builtins.slice for slices)
  - Boolean operators (using builtins.next, builtins.filter)
    - and: using operator.not_
    - or: using operator.truth
    - (a or b or c) -> next(filter(truth, (a, b, c)), c)
    - (a and b and c) -> next(filter(not_, (a, b, c)), c)
- Import
  - import module (using builtins.__import__)
  - from module import things (directly using STACK_GLOBAL bytecode)
- Lambda
  - lambda x,y=1: x+y
  - Using types.CodeType and types.FunctionType
  - Disabled by default
  - [Known bug] If any global variables are changed after the lambda definition, the lambda function won't see those changes.
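Since pickle has no operators of its own, the operator rewrites listed above can be checked in plain Python. The sketch below evaluates the boolean rewrites exactly as written in the list; the all(...) form for the chained comparison is an assumed shape, since the list only says that builtins.all is used.

```python
from operator import eq, gt, lt, not_, truth

a, b, c = 0, "", "fallback"

# `a or b or c`: the first truthy operand, otherwise the last one.
or_result = next(filter(truth, (a, b, c)), c)

# `a and b and c`: the first falsy operand, otherwise the last one.
and_result = next(filter(not_, (a, b, c)), c)

# `0 < 3 > 2 == 2 > 1` as pairwise tests combined with all()
# (assumed shape, not taken from the compiler's source).
chain_result = all((lt(0, 3), gt(3, 2), eq(2, 2), gt(2, 1)))

print(or_result, and_result, chain_result)
```

One difference from native `and` / `or`: the tuple (a, b, c) is built before filtering, so every operand is evaluated and there is no short-circuiting.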
RETURN is a keyword reserved for specifying the result of pickle.load(s). It should appear only in the last statement, on its own, and you can assign any value or expression to it.
For example, after you compile the following code and use pickle.loads to load the compiled pickle, it returns the string 'INT_MAX=2147483647'.
# source.py
n = pow(2, 31) - 1
RETURN = "INT_MAX=%d" % n
It might look like this:
$ python3 pickora.py source.py -o output.pkl
Saving pickle to output.pkl
$ python3 -m pickle output.pkl
'INT_MAX=2147483647'
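The reason a "result" exists at all is that pickle.loads returns whatever object is on top of the stack when the STOP opcode runs. The hand-written pickles below illustrate that rule; how the compiler emits RETURN internally is not shown here, so treat the connection as an assumption.

```python
import pickle

# Protocol 0 opcodes: "I<n>\n" pushes an int, "N" pushes None, "." is STOP.
two_values = b"I1\nI2\n."  # pushes 1, then 2, then stops
only_none = b"N."          # pushes None, then stops

last_value = pickle.loads(two_values)  # the top of the stack wins
none_value = pickle.loads(only_none)

print(last_value, none_value)
```

This also matches the hello.py disassembly earlier, which ends with NONE followed by STOP and therefore loads as None.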
There are currently 3 macros available: STACK_GLOBAL, GLOBAL and INST.
Example (STACK_GLOBAL):
function_name = input("> ") # > system
func = STACK_GLOBAL('os', function_name) # <built-in function system>
func("date") # Tue Jan 13 33:33:37 UTC 2077
Behaviour:
- PUSH modname
- PUSH name
- STACK_GLOBAL
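The three steps can be reproduced with a hand-assembled pickle. This is a minimal sketch: push_str is a hypothetical helper, the choice of SHORT_BINUNICODE for the two pushes is an assumption, and builtins.pow stands in for os.system so the example is harmless to run.

```python
import pickle

def push_str(text):
    """SHORT_BINUNICODE: opcode 0x8c, one length byte, then UTF-8 data."""
    data = text.encode("utf-8")
    return b"\x8c" + bytes([len(data)]) + data

payload = (
    b"\x80\x04"             # PROTO 4
    + push_str("builtins")  # PUSH modname
    + push_str("pow")       # PUSH name
    + b"\x93"               # STACK_GLOBAL: pops both, pushes builtins.pow
    + b"(K\x02K\x1ft"       # MARK, BININT1 2, BININT1 31, TUPLE -> (2, 31)
    + b"R."                 # REDUCE calls pow(2, 31); STOP returns the result
)

value = pickle.loads(payload)
print(value)
```

Because both names are ordinary stack values, they can be computed at load time, which is what makes the input(...) example above possible.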
Example (GLOBAL):
func = GLOBAL("os", "system") # <built-in function system>
func("date") # Tue Jan 13 33:33:37 UTC 2077
Behaviour:
Simply run this piece of bytecode: f"c{modname}\n{name}\n"
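To see that bytecode in action, the sketch below builds it with a hypothetical helper, global_opcode, and appends a protocol 0 call sequence. builtins.pow is used instead of os.system so nothing is executed on the host.

```python
import pickle

def global_opcode(modname, name):
    """Build the GLOBAL opcode bytes: 'c', module line, name line."""
    return f"c{modname}\n{name}\n".encode()

payload = (
    global_opcode("builtins", "pow")  # pushes builtins.pow
    + b"(I2\nI31\nt"                  # MARK, INT 2, INT 31, TUPLE -> (2, 31)
    + b"R."                           # REDUCE calls pow(2, 31); STOP
)

value = pickle.loads(payload)
print(value)
```

Unlike STACK_GLOBAL, the module and attribute names are baked into the opcode itself, so they have to be known at compile time.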
Example (INST):
command = input("cmd> ") # cmd> date
INST("os", "system", (command,)) # Tue Jan 13 33:33:37 UTC 2077
Behaviour:
- PUSH a MARK
- PUSH args in order
- Run this piece of bytecode: f'i{modname}\n{name}\n'
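A hand-assembled pickle following these steps might look like the sketch below. inst_opcode is a hypothetical helper, the INT opcodes used to push the arguments are an assumption, and builtins.pow replaces os.system to keep the example side-effect free.

```python
import pickle

def inst_opcode(modname, name):
    """Build the INST opcode bytes: 'i', module line, name line."""
    return f"i{modname}\n{name}\n".encode()

payload = (
    b"("                            # PUSH a MARK
    + b"I2\nI31\n"                  # PUSH args in order: 2, then 31
    + inst_opcode("builtins", "pow")  # pops args back to MARK, calls pow(2, 31)
    + b"."                          # STOP returns the call result
)

value = pickle.loads(payload)
print(value)
```

INST looks up the callable and applies it in a single opcode, so no separate TUPLE or REDUCE is needed.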
- Operators (compare, unary, binary, subscript)
- Unpacking assignment
- Augmented assignment
- Macros
- Lambda (I don't want to support normal functions, because they don't seem "picklic" to me)
  - Python bytecode mode
  - Pickle bytecode mode
- Function call with kwargs
  - NEWOBJ_EX only supports type objects (it calls __new__)
RTFM.
It's cool.
No, not at all, it's definitely useless.
Yep, it's cool garbage.
No. All pickle can do is define a variable or call a function, so this kind of syntax can't be expressed.
But if you want to do things like:
ans = input("Yes/No: ")
if ans == 'Yes':
    print("Great!")
elif ans == 'No':
    exit()
It's still achievable! You can rewrite your code like this:
from functools import partial
condition = {'Yes': partial(print, 'Great!'), 'No': exit}
ans = input("Yes/No: ")
condition.get(ans, repr)()
ta-da!
For the loop syntax, you can try to use map / starmap / reduce etc.
And yes, you are right, it's functional programming time!
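As a rough illustration of that style, the sketch below rewrites three small loops using only imports, assignments and positional function calls, which is the subset described earlier. The variable names are made up for the example.

```python
from functools import reduce
from itertools import starmap
from operator import add, mul

numbers = [1, 2, 3, 4]

# for n in numbers: print(n)
# map() is lazy, so list() is needed to actually run the calls.
list(map(print, numbers))

# total = 0; for n in numbers: total += n
total = reduce(add, numbers)

# for a, b in pairs: products.append(a * b)
products = list(starmap(mul, [(1, 2), (3, 4)]))

print(total, products)
```

Functions from the operator module take the place of lambdas here, which matters because lambda support is disabled by default.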