weka/easypy

TypedStruct - schematic classes for structured people


We want a schema for our configuration - a declarative way to define which fields it should have, along with all sorts of info about those fields. We also want to be able to use these schemas reflectively: create Plumbum CLI flags, store the info in databases, send it to external services (like AWS), etc.

Goals:

  • Allow creating types with lists of fields.
    • Automatic c'tor for these fields.
    • Getters and setters should work.
  • Allow declaring the types of the fields.
    • Types should be verified at runtime, in the c'tor and in the setters.
  • Allow declaring constraints on the fields (e.g. "number can be from 13 to 42").
    • Of course, constraints should also be verified at runtime.
  • Allow custom metadata - e.g. help text for Plumbum CLI.
  • Allow default value for fields. Fields without default values are mandatory.
  • Allow nesting of types.
  • Should work well with IDEs' autocompletion.
  • Nice-to-have: integration with MyPy. Not sure if it's even possible though - certainly not without Python 3.6's variable annotations...
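To make the runtime-verification goals concrete, here is a minimal sketch of one way they could be met with a descriptor. It is not the easypy implementation: the `Field` and `TypedStruct` classes below are hypothetical stand-ins, the `_MISSING` sentinel is my own assumption, and `__set_name__` needs Python 3.6+ (on older versions a metaclass would have to assign the names instead).

```python
_MISSING = object()  # sentinel for "no default given", so None stays a valid default


class Field:
    """Hypothetical descriptor: checks type and constraint on every assignment."""

    def __init__(self, type_, default=_MISSING, validate=None):
        self.type = type_
        self.default = default
        self.validate = validate

    def __set_name__(self, owner, name):  # Python 3.6+
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class-level access returns the Field itself
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if not isinstance(value, self.type):
            raise TypeError('%s must be %s, got %r'
                            % (self.name, self.type.__name__, value))
        if self.validate is not None and not self.validate(value):
            raise ValueError('%s failed validation: %r' % (self.name, value))
        instance.__dict__[self.name] = value


class TypedStruct:
    """Hypothetical base class with an automatic keyword-only constructor."""

    def __init__(self, **kwargs):
        # Only fields declared directly on the class are handled in this sketch.
        for name, field in vars(type(self)).items():
            if not isinstance(field, Field):
                continue
            if name in kwargs:
                setattr(self, name, kwargs.pop(name))  # goes through Field.__set__
            elif field.default is not _MISSING:
                setattr(self, name, field.default)
            else:
                raise TypeError('missing mandatory field %r' % name)
        if kwargs:
            raise TypeError('unknown fields: %s' % ', '.join(sorted(kwargs)))


class Example(TypedStruct):
    number = Field(int, default=20, validate=lambda x: 13 <= x <= 42)
    name = Field(str)  # no default, so it is mandatory
```

Because the constructor assigns through `setattr`, the same checks run in the c'tor and in later assignments: `Example(name='x')` gets `number == 20`, while `Example()` raises `TypeError` and assigning `50` to `number` raises `ValueError`.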

I'm aiming for syntax along the lines of:

import easypy.typed_struct as ts

class Type1(ts.TypedStruct):
    field1 = ts.Field(int,  # the type of the field
                      default=20,  # default value - makes the field optional
                      validate=lambda x: x % 2 == 0,  # asserts that the field value is even
                      meta=dict(  # metadata - not used by TypedStruct but can be queried
                          # We can use these when we generate plumbum.cli.SwitchAttr:
                          cli_syntax='--field1',
                          cli_help='Bla bla bla',
                      ))
    field2 = ts.Field(str,
                      # no default - this field is mandatory

                      # alternative syntax for custom metadata - need to
                      # choose which one we want
                      meta_cli_help='Bla bla bla')


class Type2(ts.TypedStruct):
    # Since Type1 is a TypedStruct, we can use it as a field. This is good for
    # IDEs to be able to provide completions:
    field1 = Type1

    # Array of Type1. Without Python 3.6 variable annotations, this is the only
    # way we can get completions from IDEs.
    field2 = [Type1]

    # Dictionary of Type1. Again - this is the only way to get completions.
    field3 = {str: Type1}


var1 = Type1(field2='x')  # field2 is mandatory - no need to specify field1
var2 = Type1(field1=5, field2='y')  # should throw on field1 validation
var3 = Type2(field1=dict(field2='z'))   # automatically create Type1 from the dict
  • validate should be able to accept a list/tuple of predicates, and we may also provide some common validators (is_in_range, is_part_of, etc.).
  • We may want to automatically use a Bunch for things like Type2.field3 (dictionaries where the keys are strings). We can't use Bunch itself in the definition if we want IDE completions.
  • Type2.field1 is mandatory because Type1 has mandatory fields (Type1() is illegal). If Type1.field2 were not mandatory, Type2.field1 would not be mandatory either, and Type2() would be legal.
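For the first point, a rough sketch of what the validator helpers could look like. Only the names is_in_range and is_part_of come from the note above; their signatures (inclusive bounds, any iterable of allowed values) and the `combine_validators` helper are assumptions made for illustration.

```python
def is_in_range(low, high):
    """Return a predicate accepting values with low <= value <= high (assumed inclusive)."""
    def predicate(value):
        return low <= value <= high
    return predicate


def is_part_of(allowed):
    """Return a predicate accepting values contained in `allowed`."""
    allowed = tuple(allowed)  # snapshot, so later changes to the argument have no effect
    def predicate(value):
        return value in allowed
    return predicate


def combine_validators(validate):
    """Normalize None, a single predicate, or a list/tuple of predicates into one predicate."""
    if validate is None:
        predicates = ()
    elif callable(validate):
        predicates = (validate,)
    else:
        predicates = tuple(validate)

    def combined(value):
        # all() over an empty sequence is True, so "no validators" accepts everything
        return all(predicate(value) for predicate in predicates)
    return combined


even_in_range = combine_validators([is_in_range(13, 42), lambda x: x % 2 == 0])
```

With this shape the field only ever has to call a single predicate, no matter which form the user passed to validate.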

@koreno @tigrawap

should it be:

class Type2(TypedStruct):
    field1 = Field(Type1, meta=dict(...))

?

field1 = Type1 seems like sugar that might be more confusing than helpful.
Or are you accepting of field1 = int as well?

Without Python 3.6's variable annotations, this is the only way to get completions. Unless we want to repeat the list of fields.

With

class Type2(ts.TypedStruct):
    field1 = Type1

var = Type2(...)
var.field1.

With the caret at the end, IDEs (and completion libraries like Jedi) believe that var.field1 is Type1. Not an instance of Type1, but the class Type1 itself. This may sound wrong, but it's good enough for what we need, because Type1 is also a TypedStruct and its fields are defined in the class body, not in __init__. So there are Type1.field1 and Type1.field2 (not just Type1().field1 and Type1().field2), and the IDE is able to complete var.field1.field1 and var.field1.field2.

With

class Type2(ts.TypedStruct):
    field1 = ts.Field(Type1, ...)

var = Type2(...)
var.field1.

The IDE thinks that var.field1 is a Field - it cannot deduce that the first argument of Field's constructor is the type of var.field1. This means we can't get completions for its fields.

If this were Python 3.6, we could have used:

class Type2(ts.TypedStruct):
    field1: Type1 = ts.Field(default=..., meta=dict(...))
    field2: int = ts.Field(default=..., meta=dict(...))
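A small sketch of how the annotated form could be read back at runtime on Python 3.6+, using `typing.get_type_hints`. The `Field` class here is a hypothetical stand-in that holds everything except the type, and the defaults and metadata are made-up values for illustration only.

```python
import typing


class Field:
    """Hypothetical stand-in for ts.Field: the type comes from the annotation."""

    def __init__(self, default=None, meta=None):
        self.default = default
        self.meta = meta or {}


class Type1:
    pass


class Type2:
    field1: Type1 = Field(meta=dict(cli_help='nested struct'))
    field2: int = Field(default=7)


def field_types(cls):
    """Map each Field attribute declared on `cls` to its annotated type."""
    hints = typing.get_type_hints(cls)
    return {name: hints[name]
            for name, value in vars(cls).items()
            if isinstance(value, Field) and name in hints}
```

Here `field_types(Type2)` maps field1 to Type1 and field2 to int, so the type no longer has to be passed to Field, and the IDE sees an ordinary annotation.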

As for accepting field1 = int, I would prefer not to do it for the following reasons:

  • We usually need the Field for all the other data (default value, validators, custom meta...). TypedStruct fields tend to contain their own metadata in their type definition.
  • It's not that bad that we don't get completions for fields of primitive types.
  • I can easily identify Field and TypedStruct fields and say they are the typed struct's data fields. With other types the rules will be too inclusive - do I allow all types? So if someone imports a type in the body of the class, I'll consider it a data field?
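The identification rule from the last point could be sketched with a metaclass along these lines. This is only an illustration: `TypedStructMeta`, the `_fields` attribute and the stand-in `Field` are hypothetical names, and the sketch only looks at the class's own body, not at inherited fields.

```python
class Field:
    """Hypothetical stand-in for ts.Field."""

    def __init__(self, type_, **options):
        self.type = type_
        self.options = options


class TypedStructMeta(type):
    """Collect data fields: Field instances and nested TypedStruct classes only."""

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        fields = {}
        for attr, value in namespace.items():
            is_field = isinstance(value, Field)
            # A TypedStruct subclass is itself an instance of this metaclass.
            is_nested_struct = isinstance(value, TypedStructMeta)
            if is_field or is_nested_struct:
                fields[attr] = value
        cls._fields = fields
        return cls


class TypedStruct(metaclass=TypedStructMeta):
    pass


class Type1(TypedStruct):
    field1 = Field(int, default=20)
    field2 = Field(str)


class Type2(TypedStruct):
    field1 = Type1      # nested struct: counted as a data field
    helper = dict       # arbitrary type in the class body: ignored
    field2 = Field(int)
```

With this rule `Type2._fields` contains field1 and field2 but not helper, so a type that merely happens to sit in the class body is never mistaken for a data field.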