Hi. I changed the name of this library to engarde. Check it out here.
A Python package for defensive data analysis. (Name to be determined.)
- pandas
Supports Python 2.7+ and 3.4+.
Data are messy. You want to assert that certain invariants about your data hold across operations or updates to the raw data. This is a lightweight way of placing some additional structure on semi-structured data sources like CSVs.
There are two main ways of using the library. First, as decorators:
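    from dsadd.decorators import none_missing, unique_index, is_shape

    # The DataFrame returned by f is checked for missing values.
    @none_missing
    def f(df1, df2):
        return df1.add(df2)

    # The returned DataFrame must have a unique index and shape (1290, 10).
    @is_shape((1290, 10))
    @unique_index
    def make_design_matrix(path='data.csv'):
        out = ...
        return out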
Second, interactively (probably with the pipe method, which requires pandas>=0.16.2).
    >>> import dsadd.checks as dc
    >>> (df1.reindex_like(df2)
    ...     .pipe(dc.unique_index)
    ...     .cumsum()
    ...     .pipe(dc.within_range(0, 100))
    ... )
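The chain above depends on `df1` and `df2` already existing. For something that runs end to end, here is a minimal sketch. The two check functions are hypothetical stand-ins written for illustration, not dsadd's actual implementations, and the sample frames are made up. Passing the bounds as extra arguments to `pipe` is also an assumption about how a check that takes a DataFrame plus arguments might be called.

```python
import pandas as pd

# Hypothetical stand-ins for illustration; not dsadd's actual code.
def unique_index(df):
    """Return df unchanged if its index has no duplicate labels."""
    assert df.index.is_unique, "index contains duplicates"
    return df

def within_range(df, lower, upper):
    """Return df unchanged if every value lies in [lower, upper]."""
    in_bounds = (df >= lower) & (df <= upper)
    assert in_bounds.all().all(), "values outside [%s, %s]" % (lower, upper)
    return df

# Made-up sample data.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])
df2 = pd.DataFrame({"a": [0, 0], "b": [0, 0]}, index=["x", "y"])

# Each check either raises AssertionError or passes the frame along,
# so checks can sit between ordinary pandas operations.
result = (df1.reindex_like(df2)
             .pipe(unique_index)
             .cumsum()
             .pipe(within_range, 0, 100))
```

Because each check returns its input unchanged, removing a `.pipe(...)` line does not alter the result of the chain; it only drops that validation step.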
Functions take a DataFrame (and optionally arguments) and return a DataFrame.
If used as a decorator, the result for the decorated function is checked.
Any failed check raises an AssertionError.
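To make the decorator behavior concrete, the following sketch shows one way a check could be wrapped so that it validates a function's return value. Both `none_missing` as written here and the `checked_by` helper are hypothetical, introduced only for illustration; the library's own decorators may be built differently.

```python
import functools
import pandas as pd

# Hypothetical check, written for illustration; not dsadd's actual code.
def none_missing(df):
    """Return df unchanged if it contains no missing values."""
    assert not df.isnull().any().any(), "DataFrame contains missing values"
    return df

# Hypothetical helper that turns a check function into a decorator.
def checked_by(check, *args, **kwargs):
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*fargs, **fkwargs):
            result = func(*fargs, **fkwargs)
            # Validate the returned DataFrame, then hand it back unchanged.
            return check(result, *args, **kwargs)
        return wrapper
    return decorate

@checked_by(none_missing)
def add_frames(df1, df2):
    return df1.add(df2)

# Aligned indexes: no missing values, so the result is returned as usual.
ok = add_frames(pd.DataFrame({"a": [1, 2]}), pd.DataFrame({"a": [3, 4]}))

# Misaligned indexes produce NaN after add(), so the check fails.
try:
    add_frames(pd.DataFrame({"a": [1, 2]}),
               pd.DataFrame({"a": [3, 4]}, index=[1, 2]))
except AssertionError as err:
    print("check failed:", err)
```

Raising `AssertionError` keeps a failed check loud: the pipeline stops at the first violated invariant instead of carrying bad data forward.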
- better NaN ignoring (e.g. is_monotonic)
- better subsetting / column-specific things
- better error messages