Pystachio is a type-checked dictionary templating library.
Its primary use is for the construction of miniature domain-specific configuration languages. Schemas defined by Pystachio can themselves be serialized and reconstructed into other Python interpreters. Pystachio objects are tailored via Mustache templates, as explained in the section on templating.
This project is unrelated to the defunct Javascript Python interpreter.
Notable related projects:
- dictshield
- remoteobjects
- Django's model.Model
Tested and works in CPython 2.6.7, 2.7.2, 3.2.1 and PyPy 1.6. Not tested pre-2.6.x and will almost certainly not work. Due to http://bugs.python.org/issue2646, serialization to/from json will likely break CPython 2.6.1 and earlier because unicode kwargs keys are not supported.
You can define a structured type through the 'Struct' type:
from pystachio import (
Integer,
String,
Struct)
class Employee(Struct):
first = String
last = String
age = Integer
By default all fields are optional:
>>> Employee().check()
TypeCheck(OK)
>>> Employee(first = 'brian')
Employee(first=brian)
>>> Employee(first = 'brian').check()
TypeCheck(OK)
But it is possible to make certain fields required:
from pystachio import Required
class Employee(Struct):
first = Required(String)
last = Required(String)
age = Integer
We can still instantiate objects with empty fields:
>>> Employee()
Employee()
But they will fail type checks:
>>> Employee().check()
TypeCheck(FAILED): Employee[last] is required.
Struct objects are purely functional and hence immutable after constructed, however they are composable like functors:
>>> brian = Employee(first = 'brian')
>>> brian(last = 'wickman')
Employee(last=wickman, first=brian)
>>> brian
Employee(first=brian)
>>> brian = brian(last='wickman')
>>> brian.check()
TypeCheck(OK)
Object fields may also acquire defaults:
class Employee(Struct):
first = Required(String)
last = Required(String)
age = Integer
location = Default(String, "San Francisco")
>>> Employee()
Employee(location=San Francisco)
Schemas wouldn't be terribly useful without the ability to be hierarchical:
class Location(Struct):
city = String
state = String
country = String
class Employee(Struct):
first = Required(String)
last = Required(String)
age = Integer
location = Default(Location, Location(city = "San Francisco"))
>>> Employee(first="brian", last="wickman")
Employee(last=wickman, location=Location(city=San Francisco), first=brian)
>>> Employee(first="brian", last="wickman").check()
TypeCheck(OK)
There are five basic types, two basic container types and then the Struct
and Choice
types.
There are five basic types: String
, Integer
, Float
, Boolean
and Enum
. The first four behave as expected:
>>> Float(1.0).check()
TypeCheck(OK)
>>> String("1.0").check()
TypeCheck(OK)
>>> Integer(1).check()
TypeCheck(OK)
>>> Boolean(False).check()
TypeCheck(OK)
They also make a best effort to coerce into the appropriate type:
>>> Float("1.0")
Float(1.0)
>>> String(1.0)
String(1.0)
>>> Integer("1")
Integer(1)
>>> Boolean("true")
Boolean(True)
Though the same gotchas apply as standard coercion in Python:
>>> int("1.0")
ValueError: invalid literal for int() with base 10: '1.0'
>>> Integer("1.0")
pystachio.objects.CoercionError: Cannot coerce '1.0' to Integer
with the exception of Boolean
which accepts "false" as falsy.
Enum is a factory that produces new enumeration types:
>>> Enum('Red', 'Green', 'Blue')
<class 'pystachio.typing.Enum_Red_Green_Blue'>
>>> Color = Enum('Red', 'Green', 'Blue')
>>> Color('Red')
Enum_Red_Green_Blue(Red)
>>> Color('Brown')
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/Users/wickman/clients/pystachio/pystachio/basic.py", line 208, in __init__
self.__class__.__name__, ', '.join(self.VALUES)))
ValueError: Enum_Red_Green_Blue only accepts the following values: Red, Green, Blue
Enums can also be constructed using namedtuple
syntax to generate more illustrative class names:
>>> Enum('Color', ('Red', 'Green', 'Blue'))
<class 'pystachio.typing.Color'>
>>> Color = Enum('Color', ('Red', 'Green', 'Blue'))
>>> Color('Red')
Color(Red)
Choice types represent alternatives - values that can have one of some set of values.
>>> C = Choice([Integer, String])
>>> c1 = C("abc")
>>> c2 = C(343)
There are two container types: the List
type and the Map
type. Lists
are parameterized by the type they contain, and Maps are parameterized from
a key type to a value type.
You construct a List
by specifying its type (it actually behaves like a
metaclass, since it produces a type):
>>> List(String)
<class 'pystachio.container.StringList'>
>>> List(String)([])
StringList()
>>> List(String)(["a", "b", "c"])
StringList(a, b, c)
They compose like expected:
>>> li = List(Integer)
>>> li
<class 'pystachio.container.IntegerList'>
>>> List(li)
<class 'pystachio.container.IntegerListList'>
>>> List(li)([li([1,"2",3]), li([' 2', '3 ', 4])])
IntegerListList(IntegerList(1, 2, 3), IntegerList(2, 3, 4))
Type checking is done recursively:
>> List(li)([li([1,"2",3]), li([' 2', '3 ', 4])]).check()
TypeCheck(OK)
You construct a Map
by specifying the source and destination types:
>>> ages = Map(String, Integer)({
... 'brian': 30,
... 'ian': 15,
... 'robey': 5000
... })
>>> ages
StringIntegerMap(brian => 28, ian => 15, robey => 5000)
>>> ages.check()
TypeCheck(OK)
Much like all other types, these types are immutable. The only way to "mutate" would be to create a whole new Map. Technically speaking these types are hashable as well, so you can construct stranger composite types (added indentation for clarity.)
>>> fake_ages = Map(String, Integer)({
... 'brian': 28,
... 'ian': 15,
... 'robey': 5000
... })
>>> real_ages = Map(String, Integer)({
... 'brian': 30,
... 'ian': 21,
... 'robey': 35
... })
>>> believability = Map(Map(String, Integer), Float)({
... fake_ages: 0.2,
... real_ages: 0.9
... })
>>> believability
StringIntegerMapFloatMap(
StringIntegerMap(brian => 28, ian => 15, robey => 5000) => 0.2,
StringIntegerMap(brian => 30, ian => 21, robey => 35) => 0.9)
Objects have "environments": a set of bound scopes that follow the Object
around. Objects are still immutable. The act of binding a variable to an
Object just creates a new object with an additional variable scope. You can
print the scopes by using the scopes
function:
>>> String("hello").scopes()
()
You can bind variables to that object with the bind
function:
>>> String("hello").bind(herp = "derp")
String(hello)
The environment variables of an object do not alter equality, for example:
>>> String("hello") == String("hello")
True
>>> String("hello").bind(foo = "bar") == String("hello")
True
The object appears to be the same but it carries that scope around with it:
>>> String("hello").bind(herp = "derp").scopes()
(Environment({Ref(herp): 'derp'}),)
Furthermore you can bind multiple times:
>>> String("hello").bind(herp = "derp").bind(herp = "extra derp").scopes()
(Environment({Ref(herp): 'extra derp'}), Environment({Ref(herp): 'derp'}))
You can use keyword arguments, but you can also pass dictionaries directly:
>>> String("hello").bind({"herp": "derp"}).scopes()
(Environment({Ref(herp): 'derp'}),)
Think of this as a "mount table" for mounting objects at particular points in a namespace. This namespace is hierarchical:
>>> String("hello").bind(herp = "derp", metaherp = {"a": 1, "b": {"c": 2}}).scopes()
(Environment({Ref(herp): 'derp', Ref(metaherp.b.c): '2', Ref(metaherp.a): '1'}),)
In fact, you can bind any Namable
object, including List
, Map
, and
Struct
types directly:
>>> class Person(Struct)
... first = String
... last = String
...
>>> String("hello").bind(Person(first="brian")).scopes()
(Person(first=brian),)
The Environment
object is simply a mechanism to bind arbitrary strings
into a namespace compatible with Namable
objects.
Because you can bind multiple times, scopes just form a name-resolution order:
>>> (String("hello").bind(Person(first="brian"), first="john")
.bind({'first': "jake"}, Person(first="jane"))).scopes()
(Person(first=jane),
Environment({Ref(first): 'jake'}),
Environment({Ref(first): 'john'}),
Person(first=brian))
The later a variable is bound, the "higher priority" its name resolution
becomes. Binding to an object is to achieve the effect of local overriding.
But you can also do a lower-priority "global" bindings via in_scope
:
>>> env = Environment(globalvar = "global variable", sharedvar = "global shared variable")
>>> obj = String("hello").bind(localvar = "local variable", sharedvar = "local shared variable")
>>> obj.scopes()
(Environment({Ref(localvar): 'local variable', Ref(sharedvar): 'local shared variable'}),)
Now we can bind env
directly into obj
as if they were local variables using bind
:
>>> obj.bind(env).scopes()
(Environment({Ref(globalvar): 'global variable', Ref(sharedvar): 'global shared variable'}),
Environment({Ref(localvar): 'local variable', Ref(sharedvar): 'local shared variable'}))
Alternatively we can bind env
into obj
as if they were global variables using in_scope
:
>>> obj.in_scope(env).scopes()
(Environment({Ref(localvar): 'local variable', Ref(sharedvar): 'local shared variable'}),
Environment({Ref(globalvar): 'global variable', Ref(sharedvar): 'global shared variable'}))
You can see the local variables take precedence. The use of scoping will become more obvious when in the context of templating.
As briefly mentioned at the beginning, Mustache templates are first class
"language" features. Let's look at the simple case of a String
to see how
Mustache templates might behave.
>>> String('echo {{hello_message}}')
String(echo {{hello_message}})
OK, seems reasonable enough. Now let's look at the more complicated version
of a Float
:
>>> Float('not.floaty')
CoercionError: Cannot coerce 'not.floaty' to Float
But if we template it, it behaves differently:
>>> Float('{{not}}.{{floaty}}')
Float({{not}}.{{floaty}})
Pystachio understands that by introducing a Mustache template, that we
should lazily coerce the Float
only once it's fully specified by its
environment. For example:
>>> not_floaty = Float('{{not}}.{{floaty}}')
>>> not_floaty.bind({'not': 1})
Float(1.{{floaty}})
We've bound a variable into the environment of not_floaty
. It's still not
floaty:
>>> not_floaty.bind({'not': 1}).check()
TypeCheck(FAILED): u'1.{{floaty}}' not a float
However, once it becomes fully specified, the picture changes:
>>> floaty = not_floaty.bind({'not': 1, 'floaty': 0})
>>> floaty
Float(1.0)
>>> floaty.check()
TypeCheck(OK)
Of course, the coercion can only take place if the environment is legit:
>>> not_floaty.bind({'not': 1, 'floaty': 'GARBAGE'})
CoercionError: Cannot coerce '1.GARBAGE' to Float
It's worth noting that floaty
has not been coerced permanently:
>>> floaty
Float(1.0)
>>> floaty.bind({'not': 2})
Float(2.0)
In fact, floaty
continues to store the template; it's just hidden from
view and interpolated on-demand:
>>> floaty._value
'{{not}}.{{floaty}}'
As we mentioned before, objects have scopes. Let's look at the case of floaty:
>>> floaty = not_floaty.bind({'not': 1, 'floaty': 0})
>>> floaty
Float(1.0)
>>> floaty.scopes()
(Environment({Ref(not): '1', Ref(floaty): '0'}),)
But if we bind not = 2
:
>>> floaty.bind({'not': 2})
Float(2.0)
>>> floaty.bind({'not': 2}).scopes()
(Environment({Ref(not): '2'}), Environment({Ref(floaty): '0', Ref(not): '1'}))
If we had merely just evaluated floaty in the scope of not = 2
, it would have behaved differently:
>>> floaty.in_scope({'not': 2})
Float(1.0)
The interpolation of template variables happens in scope order from top
down. Ultimately bind
just prepends a scope to the list of scopes and
in_scope
appends a scope to the end of the list of scopes.
Remember however that you can bind any Namable
object, which includes
List
, Map
, Struct
and Environment
types, and these are hierarchical.
Take for example a schema that defines a UNIX process:
class Process(Struct):
name = Default(String, '{{config.name}}')
cmdline = String
class ProcessConfig(Struct):
name = String
ports = Map(String, Integer)
The expectation could be that Process
structures are always interpolated
in an environment where config
is set to the ProcessConfig
.
For example:
>>> webserver = Process(cmdline = "bin/tfe --listen={{config.ports[http]}} --health={{config.ports[health]}}")
>>> webserver
Process(cmdline=bin/tfe --listen={{config.ports[http]}} --health={{config.ports[health]}}, name={{config.name}})
Now let's define its configuration:
>>> app_config = ProcessConfig(name = "tfe", ports = {'http': 80, 'health': 8888})
>>> app_config
ProcessConfig(name=tfe, ports=StringIntegerMap(health => 8888, http => 80))
And let's evaluate the configuration:
>>> webserver % Environment(config = app_config)
Process(cmdline=bin/tfe --listen=80 --health=8888, name=tfe)
The %
-based interpolation is just shorthand for in_scope
.
List
types and Map
types are dereferenced as expected in the context of
{{}}
-style mustache templates, using [index]
for List
types and
[value]
for Map
types. Struct
types are dereferenced using .
-notation.
For example, {{foo.bar[23][baz].bang}}
translates to a name lookup chain
of foo (Struct) => bar (List or Map) => 23 (Map) => baz (Struct) => bang
,
ensuring the type consistency at each level of the lookup chain.
The use of templating is most powerful in the use of Struct
types where
parent object scope is inherited by all children during interpolation.
Let's look at the example of building a phone book type.
class PhoneBookEntry(Struct):
name = Required(String)
number = Required(Integer)
class PhoneBook(Struct):
city = Required(String)
people = List(PhoneBookEntry)
>>> sf = PhoneBook(city = "San Francisco").bind(areacode = 415)
>>> sj = PhoneBook(city = "San Jose").bind(areacode = 408)
We met a girl last night in a bar, her name was Jenny, and her number was 8 6 7 5 3 oh nayee-aye-in. But in the bay area, you never know what her area code could be, so we template it:
>>> jenny = PhoneBookEntry(name = "Jenny", number = "{{areacode}}8675309")
But brian
is a Nebraskan farm boy from the 402 and took his number with him:
>>> brian = PhoneBookEntry(name = "Brian", number = "{{areacode}}5551234")
>>> brian = brian.bind(areacode = 402)
If we assume that Jenny is from San Francisco, then we look her up in the San Francisco phone book:
>>> sf(people = [jenny])
PhoneBook(city=San Francisco, people=PhoneBookEntryList(PhoneBookEntry(name=Jenny, number=4158675309)))
But it's equally likely that she could be from San Jose:
>>> sj(people = [jenny])
PhoneBook(city=San Jose, people=PhoneBookEntryList(PhoneBookEntry(name=Jenny, number=4088675309)))
If we bind jenny
to one of the phone books, she inherits the area code
from her parent object. Of course, brian
is from Nebraska and he kept his
number, so San Jose or San Francisco, his number remains the same:
>>> sf(people = [jenny, brian])
PhoneBook(city=San Francisco,
people=PhoneBookEntryList(PhoneBookEntry(name=Jenny, number=4158675309),
PhoneBookEntry(name=Brian, number=4025551234)))
Because of how Struct
based schemas are created, the constructor of
such a schema behaves like a deserialization mechanism from a straight
Python dictionary. In a sense, deserialization comes for free. Take the
schema defined below:
class Resources(Struct):
cpu = Required(Float)
ram = Required(Integer)
disk = Default(Integer, 2 * 2**30)
class Process(Struct):
name = Required(String)
resources = Required(Resources)
cmdline = String
max_failures = Default(Integer, 1)
class Task(Struct):
name = Required(String)
processes = Required(List(Process))
max_failures = Default(Integer, 1)
Let's write out a task as a dictionary, as we would expect to see from the schema:
task = {
'name': 'basic',
'processes': [
{
'resources': {
'cpu': 1.0,
'ram': 100
},
'cmdline': 'echo hello world'
}
]
}
And instantiate it as a Task (indentation provided for clarity):
>>> tsk = Task(task)
>>> tsk
Task(processes=ProcessList(Process(cmdline=echo hello world, max_failures=1,
resources=Resources(disk=2147483648, ram=100, cpu=1.0))),
max_failures=1, name=basic)
The schema that we defined as a Python class structure is applied to the dictionary. We can use this schema to type-check the dictionary:
>>> tsk.check()
TypeCheck(FAILED): Task[processes] failed: Element in ProcessList failed check: Process[name] is required.
It turns out that we forgot to specify the name of the Process
in our
process list, and it was a Required
field. If we update the dictionary to
specify 'name', it will type check successfully.
It is possible to serialize constructed types, pickle them and send them around with your dictionary data in order to do portable type checking.
Every type in Pystachio has a serialize_type
method which is used to
describe the type in a portable way. The basic types are uninteresting:
>>> String.serialize_type()
('String',)
>>> Integer.serialize_type()
('Integer',)
>>> Float.serialize_type()
('Float',)
The notation is simply: String
types are produced by the "String"
type
factory. They are not parameterized types so they need no additional type
parameters. However, Lists and Maps are parameterized:
>>> List(String).serialize_type()
('List', ('String',))
>>> Map(Integer,String).serialize_type()
('Map', ('Integer',), ('String',))
>>> Map(Integer,List(String)).serialize_type()
('Map', ('Integer',), ('List', ('String',)))
Furthermore, composite types created with Struct
are also serializable.
Take the composite types defined in the previous section: Task
, Process
and Resources
.
>>> from pprint import pprint
>>> pprint(Resources.serialize_type(), indent=2, width=100)
( 'Struct',
'Resources',
('cpu', (True, (), True, ('Float',))),
('disk', (False, 2147483648, False, ('Integer',))),
('ram', (True, (), True, ('Integer',))))
In other words, the Struct
factory is producing a type with a set of type
parameters: Resources
is the name of the struct, cpu
, disk
and ram
are attributes of the type.
If you serialize Task
, it recursively serializes its children types:
>>> pprint(Task.serialize_type(), indent=2, width=100)
( 'Struct',
'Task',
('max_failures', (False, 1, False, ('Integer',))),
('name', (True, (), True, ('String',))),
( 'processes',
( True,
(),
True,
( 'List',
( 'Struct',
'Process',
('cmdline', (False, (), True, ('String',))),
('max_failures', (False, 1, False, ('Integer',))),
('name', (True, (), True, ('String',))),
( 'resources',
( True,
(),
True,
( 'Struct',
'Resources',
('cpu', (True, (), True, ('Float',))),
('disk', (False, 2147483648, False, ('Integer',))),
('ram', (True, (), True, ('Integer',)))))))))))
Given a type tuple produced by serialize_type, you can then use
TypeFactory.load
from pystachio.typing
to load a type into an interpreter. For example:
>>> pprint(TypeFactory.load(Resources.serialize_type()))
{'Float': <class 'pystachio.basic.Float'>,
'Integer': <class 'pystachio.basic.Integer'>,
'Resources': <class 'pystachio.typing.Resources'>}
TypeFactory.load
returns a map from type name to the fully reified type for all types required to
describe the serialized type, including children. In the example of Task
above:
>>> pprint(TypeFactory.load(Task.serialize_type()))
{'Float': <class 'pystachio.basic.Float'>,
'Integer': <class 'pystachio.basic.Integer'>,
'Process': <class 'pystachio.typing.Process'>,
'ProcessList': <class 'pystachio.typing.ProcessList'>,
'Resources': <class 'pystachio.typing.Resources'>,
'String': <class 'pystachio.basic.String'>,
'Task': <class 'pystachio.typing.Task'>}
TypeFactory.load
also takes an into
keyword argument, so you can do
TypeFactory.load(type, into=globals())
in order to deposit them into your interpreter:
>>> from pystachio import *
>>> TypeFactory.load(( 'Struct',
... 'Task',
... ('max_failures', (False, 1, False, ('Integer',))),
... ('name', (True, (), True, ('String',))),
... ( 'processes',
... ( True,
... (),
... True,
... ( 'List',
... ( 'Struct',
... 'Process',
... ('cmdline', (False, (), True, ('String',))),
... ('max_failures', (False, 1, False, ('Integer',))),
... ('name', (True, (), True, ('String',))),
... ( 'resources',
... ( True,
... (),
... True,
... ( 'Struct',
... 'Resources',
... ('cpu', (True, (), True, ('Float',))),
... ('disk', (False, 2147483648, False, ('Integer',))),
... ('ram', (True, (), True, ('Integer',))))))))))), into=globals())
>>> Task
<class 'pystachio.typing.Task'>
>>> Process
<class 'pystachio.typing.Process'>
>>> Task().check()
TypeCheck(FAILED): Task[processes] is required.
>>> Resources().check()
TypeCheck(FAILED): Resources[ram] is required.
>>> Resources(cpu = 1.0, ram = 1024, disk = 1024).check()
TypeCheck(OK)
Types produced by TypeFactory.load
are reified types but they are not
identical to each other. This could be provided in the future via type
memoization but that would require keeping some amount of state around.
Instead, __instancecheck__
has been provided, so that you can do
isinstance
checks:
>>> Task
<class 'pystachio.typing.Task'>
>>> Task == TypeFactory.new({}, *Task.serialize_type())
False
>>> isinstance(Task(), TypeFactory.new({}, *Task.serialize_type()))
True
@wickman (Brian Wickman)
Thanks to @marius for some of the original design ideas, @benh, @jsirois, @wfarner and others for constructive comments.