MatthieuDartiailh/bytecode

Discussion : Improving handling of co_flags

Closed this issue · 14 comments

Currently the flags for a code object need to be specified manually as a single integer. This approach offers the maximum flexibility but is also error prone as the flags and could I believe be improved. Follow are the 'official' flags (ie excluding future flags such as CO_FUTURE_BARRY_AS_BDFL and CO_FUTURE_GENERATOR_STOP).

From dis.COMPILER_FLAG_NAMES

 1 OPTIMIZED
  2 NEWLOCALS
  4 VARARGS
  8 VARKEYWORDS
 16 NESTED
 32 GENERATOR
 64 NOFREE
128 COROUTINE
256 ITERABLE_COROUTINE

Among those we can identify kind of three families (please correct me if I got this wrong):

  • the flags completely independent of the underlying code : NEWLOCALS, VARARGS, VARKEYWORDS, ITERABLE_COROUTINE
  • the flags completely dependent on the underlying code: OPTIMIZED, NO_FREE, GENERATOR
  • and in between flags:
    • NESTED: apply only to function like code defined in another function (and honestly I have difficulty understanding what it does...)
    • COROUTINE: can be obvious if GET_AWAITABLE is used but it is not always so

Furthermore COROUTINE and ITERABLE_COROUTINE are incompatible.

I hence believe that it would be profitable to be able to:

  • specify manually the value for the first kind through a higher level construct
  • have the proper value be computed in an automatic fashion for the second kind
  • be able to specify if the code is nested (byteplay does this by passing a keyword arg to to_code) and if the code is from a function (byteplay only guess here).
  • be able to force the coroutine behavior or have it inferred.

Looking at how byteplay handles this, there is a number of attribute on the code object itself allowing to specify the flags (and the from_function keyword arg in to_code). Moreover generator behavior can be forced.

I do not have a specific implementation in mind but I think that keeping the flag logic in a separate class differentiating between default values (from the original code or guessed) and forced user value may help. The conversion to an int would obsiously require the code.

What do people think ?

I propose to experiment creating a new bytecode.Flags class.

Maybe bytecode could accept both Flags and int types for flags (convert int to Flags implicitly)?

Why not use enum.IntFlags rather of enum.IntEnum?

Because IntFlags is new in 3.6

Maybe import external enum implementation in 3.5 and earlier?

I would like @Haypo point of view before adding an external dependency and one that would be version dependent.

Sorry, I don't know well enum.IntFlag. Can you please show some examples compared to Matthieu's class?

It is similar to IntEnum, but supports combinations. It is just a bit mask with fanny representation.

>>> import enum
>>> class Flags(enum.IntFlag):
...     CO_OPTIMIZED             = 0x00001  # noqa
...     CO_NEWLOCALS             = 0x00002  # noqa
...     CO_VARARGS               = 0x00004  # noqa
... 
>>> Flags.CO_OPTIMIZED|Flags.CO_NEWLOCALS
<Flags.CO_NEWLOCALS|CO_OPTIMIZED: 3>
>>> print(Flags.CO_OPTIMIZED|Flags.CO_NEWLOCALS)
Flags.CO_NEWLOCALS|CO_OPTIMIZED
>>> int(Flags.CO_OPTIMIZED|Flags.CO_NEWLOCALS)
3
>>> Flags(3)
<Flags.CO_NEWLOCALS|CO_OPTIMIZED: 3>
>>> (Flags.CO_OPTIMIZED|Flags.CO_NEWLOCALS) & Flags.CO_NEWLOCALS
<Flags.CO_NEWLOCALS: 2>
>>> (Flags.CO_OPTIMIZED|Flags.CO_NEWLOCALS) & ~Flags.CO_NEWLOCALS
<Flags.CO_OPTIMIZED: 1>

If we use enum.IntFlag, we need to write Matthieu logic to compute "implicit flags" someone else, near ConcreteBytecode.to_code().

How do you represent "no flag" (flags=0) using IntFlag?

You can just use Flags(0). Or add explicit name for value 0.

>>> class Flags(enum.IntFlag):
...     NO_FLAGS = 0
...     CO_OPTIMIZED = 1
...     CO_NEWLOCALS = 2
... 
>>> Flags(0)
<Flags.NO_FLAGS: 0>
>>> print(Flags(0))
Flags.NO_FLAGS
>>> Flags.NO_FLAGS|Flags.CO_OPTIMIZED
<Flags.CO_OPTIMIZED: 1>

The issue I see is that there will be no way to let the user override the flag inference in such a case (what the _forced dict is currently used for).

I addressed some of the comments in #20 but we need to decide on a representation of the flags before I complete. If we go with IntFlags we need a third party library to provide it for python < 3.6 (or rewrite it), one possible way to handle the inference would be a helper function returning a new flag from a flag and a bytecode (can generalize to non concrete I think) and it is then up to the user to update the flags after modifying the bytecode. Otherwise we keep the two classes we have and I just need to address the last comments.
After thinking a bit about it, I think using IntFlags would probably lead to the less confusing situation (the None/True/False vales of inferable flags is not really obvious). Once we settle on a provider of IntFlags I can update my patch.

Sorry, I don't have a strong opinion on the flags API. I let you decide with @serhiy-storchaka :-)

@serhiy-storchaka are you aware of any backport of python 3.6 enum ? Otherwise I fear I will have to copy-paste it from CPython, which is not ideal.

Closed by #23