A Python DSL for schema transformations
Docs: http://percolate.github.io/bfh/
Look, you can put square pegs in round holes:
import math

# Import paths below assume the usual bfh package layout.
from bfh import Mapping, Schema
from bfh.fields import IntegerField, NumberField, UnicodeField
from bfh.transformations import Concat, Do, Get, Str


class SquarePeg(Schema):
    id = IntegerField()
    name = UnicodeField()
    width = NumberField()


class RoundHole(Schema):
    id = UnicodeField()
    name = UnicodeField()
    diameter = NumberField()


def largest_square(width):
    # Diagonal of the square: the smallest hole diameter the peg fits through.
    return math.sqrt(2 * width**2)


class SquarePegToRoundHole(Mapping):
    source_schema = SquarePeg
    target_schema = RoundHole

    id = Concat('from_square', ':', Str(Get('id')))
    name = Get('name')
    diameter = Do(largest_square, Get('width'))


my_peg = SquarePeg(id=1, name="peggy", width=50)
transformed = SquarePegToRoundHole().apply(my_peg).serialize()

transformed['id']
# u'from_square:1'
transformed['name']
# u'peggy'
transformed['diameter']
# 70.71067811865476
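For comparison, here is roughly what that mapping replaces if you write it by hand. This is a plain-Python sketch that does not use BFH at all; the function name square_peg_to_round_hole is made up for illustration.

```python
import math


def largest_square(width):
    # Diagonal of a square with the given side length.
    return math.sqrt(2 * width ** 2)


def square_peg_to_round_hole(peg):
    # Hypothetical hand-written equivalent of the mapping (not part of BFH).
    return {
        "id": "from_square:" + str(peg["id"]),
        "name": peg["name"],
        "diameter": largest_square(peg["width"]),
    }


hole = square_peg_to_round_hole({"id": 1, "name": "peggy", "width": 50})
print(hole["id"])  # from_square:1
```

The hand-written version is fine for one mapping; the declarative form is meant to pay off once there are many of them to keep consistent.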
BFH is a DSL for mapping blobs to other blobs. It can map dict-ish or object-ish things: as long as you've got names and the names have got values, you can whack it into shape. Explicit schema objects are totally optional. A mapping can be used without input and output schemas, because the names in the mapping are all you really need. Viz.
class ImpliesSchemas(Mapping):
    id = Concat('author', ':', Get('nom_de_plume'))
    name = Get('nom_de_plume')
    book = Get('best_known_for')


source = {
    "nom_de_plume": "Mark Twain",
    "best_known_for": "Huckleberry Finn"
}
output = ImpliesSchemas().apply(source)
type(output)
# <class 'bfh.GenericSchema'>
sorted(output.serialize().keys())
# ['book', 'id', 'name']
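The "dict-ish or object-ish" idea can be pictured with a small lookup helper that tries a key first and falls back to an attribute. This is only a sketch of the concept in plain Python; get_name and Author are hypothetical names, and nothing here claims to be how BFH does it internally.

```python
def get_name(source, name, default=None):
    # Hypothetical helper: try key lookup first, then attribute lookup.
    try:
        return source[name]
    except (KeyError, TypeError, IndexError):
        return getattr(source, name, default)


class Author(object):
    def __init__(self, nom_de_plume):
        self.nom_de_plume = nom_de_plume


as_dict = {"nom_de_plume": "Mark Twain"}
as_object = Author("Mark Twain")

print(get_name(as_dict, "nom_de_plume"))    # Mark Twain
print(get_name(as_object, "nom_de_plume"))  # Mark Twain
```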
Explicit schemas, however, can help preserve your sanity when things get complex.
While BFH can validate a schema, it's not primarily a validation library, OK? There are lots of those out there, so we don't get too fancy here. Just some sanity checking.
my_peg = SquarePeg(id=1, name="peggy", width=50)
my_peg.validate()
# True
broken_peg = SquarePeg(id=2, name="broken", width="a hundred")
broken_peg.validate()
# ... (raises Invalid)
Obviously, sooner or later your schema needs a field named after a Python keyword, like 'if' or 'finally'. Prefix the name with a double underscore and all will be well:
class Fancy(Schema):
    # if = IntegerField()  # Ouch! SyntaxError!
    __if = IntegerField()
You can init in any of these ways:
Fancy(__if=1)
Fancy(**{"__if": 1})
Fancy(**{"if": 1})
In a mapping, use the de-dundered name:
Get("if")
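Why the double underscore needs special handling at all: Python refuses keywords as attribute names, and it also mangles double-underscore names declared in a class body. The sketch below shows only that standard Python behavior; public_name is a hypothetical helper, and how BFH itself recovers the plain name is not shown here.

```python
import keyword


class Fancy(object):
    # Inside a class body, Python stores "__if" as "_Fancy__if".
    __if = 1


def public_name(mangled, class_name):
    # Hypothetical helper: strip the "_ClassName__" prefix to recover the plain name.
    prefix = "_" + class_name + "__"
    return mangled[len(prefix):] if mangled.startswith(prefix) else mangled


print(keyword.iskeyword("if"))             # True
print("_Fancy__if" in vars(Fancy))         # True
print(public_name("_Fancy__if", "Fancy"))  # if
```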
Given a sample dataset:
{
    "name": "peg",
    "age": null
}
If you call serialize with the default kwargs, you would get the following behavior:
transformed = SomeMapping().apply(parsed_data).serialize()
# {'name': 'peg', 'age': None}
But if implicit_nulls is set to True, you would get:
transformed = SomeMapping().apply(parsed_data).serialize(implicit_nulls=True)
# {'name': 'peg'}
This can be useful to filter out null values from a dataset on serialization.
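The effect on a flat record is the same as dropping every key whose value is None. A plain-Python sketch of that filtering, with drop_nulls as a hypothetical helper (how BFH treats nested values is not covered here):

```python
def drop_nulls(data):
    # Hypothetical helper: keep only the keys whose value is not None.
    return {key: value for key, value in data.items() if value is not None}


record = {"name": "peg", "age": None}
print(drop_nulls(record))  # {'name': 'peg'}
```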
Tested on Python 2.7.10 and 3.5.0.