Painless-SQLAlchemy
Released on PyPI. Install with
pip install painless-sqlalchemy
What is Painless-SQLAlchemy?
Painless-SQLAlchemy adds simplified querying and serialization to SQLAlchemy.
Supported Databases
Tests run against Postgres 9.6.X.
The project could be adapted to work with MySQL.
Run Tests
Check out SETUP.md
Where can I get help?
Please open a GitHub issue.
Overview
Examples use Models described in conftest.py.
Filter()
Looking for all Teachers teaching a specific Student?
Teacher.filter({'students.id': student_id}).all()
How about all Students frequenting a specific Classroom...
Student.filter({'teachers.classroom.id': classroom_id}).all()
...whose first name is also Alex or John?
Student.filter({
'teachers.classroom.id': classroom_id,
'first_name': ["Alex", "John"]
}).all()
OK, but what about all Students taught by a specific teacher or with a Gmail address?
Student.filter(or_(
ref('teachers.id') == teacher_id,
ref('email').ilike("%@gmail.com")
)).all()
Serialize()
Let's get all teachers and their students:
Teacher.serialize(['name', 'students.name'])
returns
[
{
"name": "Nichole Copeland",
"students": [
{"name": "Margaret Anderson"},
{"name": "Laura Smith"},
...
]
},
...
]
And we can combine that with filtering:
Teacher.serialize(
['name', 'students.name'],
{"id": teacher_id}
)
Much More
Various other improvements and abstractions over SQLAlchemy are provided, e.g. custom column types like CIText and PostGIS types for convenience.
Documentation
Model Definitions
All your Models need to inherit from Model.py. Examples are given in conftest.py.
Dot Notation
Simple
Dot notation is a simple and intuitive way to define relationship paths by joining relationship names using dots. E.g. the teacher names of a student can be referenced as teachers.name.
Extended
Multiple fields can be referenced by using brackets. E.g. the name and id of a student's teachers can be referenced as teachers(id,name).
To reference the default serialization of a Model a * can be used. E.g. the teacher default serialization can be referenced from a student as teachers.*.
Filter()
Model.filter(...)
Returns SQLAlchemy query with necessary joins and passed filtering applied.
Info: Chainable using the query parameter. However, to reduce the number of automatic joins, using a single filter call is preferred.
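For example, a query returned by one filter() call can be passed back into a second one (a sketch, assuming the Teacher model from conftest.py):
query = Teacher.filter({'students.id': student_id})
query = Teacher.filter({'name': teacher_name}, query=query)
result = query.all()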
Parameters
attributes: dict or SQLAlchemy filter using ref (optional)
query: SQLAlchemy query used as base, can be pre-filtered (optional)
skip_nones: skip None values when using dict (optional, default False)
Dictionary Filtering
This method is preferred over clause filtering when possible: it is easier to read and better optimized in certain cases.
The dictionary contains simple key-value lookups. Keys are expected in simple dot notation; values can be lists or simple values.
None Values
None values are filtered from the attributes dictionary if skip_nones is used. This is convenient when None values should be ignored. The parameter cannot be used for clause filtering.
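For example, a filter value that may come back as None from an optional request parameter can simply be passed through (a sketch):
# first_name is ignored entirely when it is None
Student.filter({
    'teachers.classroom.id': classroom_id,
    'first_name': first_name  # may be None
}, skip_nones=True).all()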
Relationships and Lists
Filter values can be lists when attributes is a dictionary.
In this case the behaviour depends on the type of relationship / column that is filtered:
- For a to-many relationship every value in the list has to be matched by at least one relationship target (and).
- For a to-one relationship or simple column only one element from the list has to be matched (or).
Filtering by lists is optimized, hence list entries must be unique.
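A sketch of both behaviours, using the models from conftest.py:
# to-many: students that have *both* teachers (and)
Student.filter({'teachers.id': [teacher_a_id, teacher_b_id]}).all()
# simple column: students with *either* first name (or)
Student.filter({'first_name': ["Alex", "John"]}).all()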
Clause Filtering
Instead of a dictionary, an SQLAlchemy clause can be passed as attributes. Columns referenced using the ref constructor will be automatically joined to the query.
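Clauses can be nested, e.g. combining and_ and or_ from SQLAlchemy (a sketch; ref is the painless-sqlalchemy constructor used in the overview examples):
from sqlalchemy import and_, or_

Student.filter(and_(
    ref('teachers.classroom.id') == classroom_id,
    or_(
        ref('email').ilike("%@gmail.com"),
        ref('first_name') == "Alex"
    )
)).all()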
Optimization
The clause is analysed and only the necessary joins are made. In most cases redundant joins can be prevented. Optimization information is stored on the query.
Serialize()
Model.serialize(...)
Returns JSON serializable representation of fetched data.
Parameters
to_return: list of fields to return, allows extended dot notation (optional)
filter_by: see identical parameter on filter() (optional)
limit: maximum amount of objects fetched (optional)
offset: offset value for objects fetched (optional)
query: see identical parameter on filter() (optional)
skip_nones: see identical parameter on filter() (optional)
order_by: result ordering, multiple via tuple, defaults to id ordering (optional)
session: explicit session to use for query (optional)
expose_all: whether to return not exposed fields (optional)
params: query parameters (optional)
Default Serialization
When no to_return is passed, the default serialization for the model is used. It can be customized per Model by overwriting default_serialization (defaults to id).
Default serialization can also be referenced using * or by appending it as an entry to the dot notation.
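A minimal sketch of a per-Model override (the exact attribute format is an assumption; see Model.py and conftest.py):
from sqlalchemy import Column, Integer, String

class Student(Model):
    __tablename__ = 'student'
    id = Column(Integer, primary_key=True)
    name = Column(String(64))
    # serialize() without to_return now returns id and name
    default_serialization = ['id', 'name']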
Explicit serialization should be preferred over the default serialization. Also be careful when defining default serializations that contain to-many relationships, or that reference default serializations of other models, as the resulting queries can quickly get very expensive.
However, when testing or writing a proof of concept, the default serialization is very helpful.
Column Exposure
Not exposed columns are not returned from serialize() unless expose_all is used. This reduces the risk of leaking secret information. For example, when storing a password hash, the column should be set to not exposed.
By default primary keys are not exposed and all other fields are exposed. However, exposure can be explicitly set by passing info={"exposed": True/False} into the Column constructor.
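For example, marking a password hash column as not exposed (a sketch; the column name is illustrative):
# never returned from serialize() unless expose_all is used
password_hash = Column(String(128), info={"exposed": False})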
Eager Loading
When using serialize() only necessary columns are loaded. This makes serialize() much more efficient than manually loading and serializing models.
Advanced Columns
We can define custom mappings for data that does not directly correspond to a database column entry.
MapColumn
Custom serializations can be easily created using MapColumn. If we want to create a contact_info serialization on a student, we can write:
contact_info = MapColumn({
'phone': 'phone',
'home_phone': 'home_phone',
'email': 'email'
})
Note that to-one relationships can also be referenced, but to-many relationships are not supported. Filtering by MapColumn is also not possible.
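Serializing the mapped column then returns the nested structure (a sketch; values are illustrative):
Student.serialize(['name', 'contact_info'])
returns
[
{
"name": "Margaret Anderson",
"contact_info": {
"phone": "...",
"home_phone": "...",
"email": "..."
}
},
...
]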
column_property
To serialize entries that don't come straight from database columns, we can use column_property. These are fully supported for filter() and serialize(). However, note that filtering by computed fields can be very expensive.
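For example, a computed full_name (a sketch; column_property comes from sqlalchemy.orm, and the field names are assumptions):
from sqlalchemy import Column, String
from sqlalchemy.orm import column_property

class Student(Model):
    first_name = Column(String(64))
    last_name = Column(String(64))
    # computed in the database; usable in filter() and serialize() like a column
    full_name = column_property(first_name + ' ' + last_name)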
Custom Column Types
CIText
Column type for CIText. Requires the database extension citext to be created.
email = Column(CIText(64, True))
The constructor takes two parameters: the maximum length of the citext and a boolean flag enforce_lower.
If the flag is set to True, all input is forced to lower case in the application layer (data already in the database is not changed). If set to False, the field behaves like an ordinary CIText and saves case as provided.
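For example, with the column defined as above (a sketch):
student.email = "Alex@Example.COM"  # forced to "alex@example.com" before it is written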
HexColor
Column type for Hex Color of format "#RRGGBB".
color = Column(HexColorType)
Will raise ValueError if invalid input is provided. Data consistency is not enforced in the database layer.
Uses an Integer database representation to store the provided value.
Time
Column type for 24-hour time of format "HH:MM". Valid range is "00:00" to "23:59".
opening = Column(TimeType)
Will raise ValueError if invalid input is provided. Data consistency is enforced; however, database granularity finer than a minute is not considered when loading.
Uses a Time without timezone database representation to store the provided value.
Time Zone
Column type for time zone. Valid values are all values found here that are also valid Postgres timezones.
timezone = Column(TimeZoneType)
Will raise ValueError if invalid input is provided. Data consistency is not enforced. Unknown timezones will be loaded as None.
Uses TEXT to store the time zone in the database.
PostGIS Types
Require the database extension postgis and the package GeoAlchemy2. Floats are rounded to 9 digits in application layer logic to prevent bugs induced by rounding errors.
Coordinates are assumed to be on earth. Consistency is partially enforced through the database; however, SRIDs are expected to be correct.
Utility Functions:
haversine(lat1, lon1, lat2, lon2) computes the distance on earth between GPS coordinates [lat1, lon1] and [lat2, lon2]
point_inside_polygon(x, y, poly) returns true iff the point defined by x, y is inside the non-overlapping polygon poly
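A usage sketch (coordinates are illustrative; the poly format is assumed to be a closed list of (x, y) tuples, like AreaType polygons):
# distance on earth between Berlin and Hamburg
distance = haversine(52.5200, 13.4050, 53.5511, 9.9937)
# is the point (0.5, 0.5) inside the unit square?
inside = point_inside_polygon(0.5, 0.5, [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)])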
Location
GPS coordinate as tuple (latitude, longitude).
location = Column(LocationType)
Stored as Point geometry in the database.
Raises error in application layer logic if input is invalid.
Area
GPS area as list [(lat1, lon1), (lat2, lon2), ..., (latX, lonX), (lat1, lon1)].
location = Column(AreaType(True))
Stored as Polygon geometry in the database.
Takes a boolean clockwise argument. If set to True, polygons are enforced to be clockwise in application layer logic. If set to False, counter-clockwise ordering is enforced, and if set to None, polygons are stored as given.
Raises an error in application layer logic if the input is invalid. Open polygons, polygons with identical consecutive points, and polygons with too few unique points are considered invalid.
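For example, a valid closed polygon (a sketch; coordinates are illustrative and use the location column defined above):
student.location = [
    (52.0, 13.0),
    (52.0, 14.0),
    (53.0, 14.0),
    (53.0, 13.0),
    (52.0, 13.0)  # first point repeated: the polygon must be closed
]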
Advanced Functions
Expand()
Model.expand(...)
Expand string representation of fields. E.g.
Student.expand('teachers.classroom(id,school_id),id')
would return ['teachers.classroom.id', 'teachers.classroom.school_id', 'id'].
A path ending in * gets expanded to the default serialization.
Has()
Model.has(...)
Check if a specific field exists for a model. E.g.
Student.has('teachers.classroom.id')
would return True. Does not allow input that still needs to be expanded.
Utility Functionality
Many to Many Relationships
A function to conveniently generate many-to-many relationship tables is exposed as TableUtil.many_to_many.
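A hypothetical usage sketch (the exact signature is an assumption; see TableUtil for the real one):
# hypothetical arguments: the names of the two related tables
student_to_teacher = TableUtil.many_to_many('student', 'teacher')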
Testing Data Models
Generic functionality for writing database tests is exposed through util.testing.*. For an example, please consider the tests written for this project.
Internals and Optimization
Load as Required
When using serialize() only necessary fields are queried. Multiple steps are used to accomplish this. First, determine the required columns:
- Expand columns, make them unique, filter out not exposed ones, handle MapColumns
- Gather columns required for relationship joins
Then apply filtering separately:
- load only required relationships
- load only required columns
- load only required column_property
Note that primary key columns are automatically loaded by SQLAlchemy.
Join as Required
When using filter() only necessary joins are made. So if we are filtering by school.id and school.name, we only need to join school once.
While the above case is trivial, this gets very interesting when nested boolean clauses are used. We cannot simply remove all redundancy.
For example, when a to-many relationship is used in an and clause, we can't re-use the relationship join, since all conditions would then have to be met on the same target, which is not desirable in most cases.
Limit and Offset
Limit and offset functionality is not trivial, since many rows are returned from SQLAlchemy and it is not clear where one model ends and the next begins.
Currently window functions and nested queries are used. Depending on the database version this can be inefficient for large tables.
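For example, paginated serialization (a sketch):
# third page of ten teachers, ordered by id by default
Teacher.serialize(['name'], limit=10, offset=20)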