plydata
plydata is a library that provides a grammar for data manipulation. The grammar consists of verbs that can be applied to pandas dataframes or database tables. It is based on the R package dplyr. plydata uses the >> operator as a pipe symbol.
At present, the only supported data store is the pandas dataframe. Support for SQLite, and possibly PostgreSQL and MySQL, is planned.
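To illustrate how `>>` can act as a pipe, here is a minimal sketch in plain Python. It is not plydata's implementation: the `Verb` base class and the toy `define` and `head` verbs below are hypothetical stand-ins that operate on a dict of lists instead of a dataframe. The sketch relies only on Python's reflected operator method `__rrshift__`, which is called for `data >> verb` when `data` itself does not handle `>>`.

```python
class Verb:
    """Hypothetical base class for a pipeable operation (not plydata's code)."""

    def __init__(self, *args, **kwargs):
        # Store the arguments now; the data arrives later through `>>`.
        self.args = args
        self.kwargs = kwargs

    def apply(self, data):
        raise NotImplementedError

    def __rrshift__(self, data):
        # `data >> verb` falls back to `verb.__rrshift__(data)` because
        # a dict does not implement `>>` itself.
        return self.apply(data)


class define(Verb):
    """Toy verb: add columns, copied from an existing column or set to a constant."""

    def apply(self, data):
        out = {name: list(values) for name, values in data.items()}
        n_rows = len(next(iter(data.values())))
        for name, value in self.kwargs.items():
            if isinstance(value, str) and value in data:
                out[name] = list(data[value])   # copy an existing column
            else:
                out[name] = [value] * n_rows    # broadcast a constant
        return out


class head(Verb):
    """Toy verb: keep the first n rows of every column."""

    def apply(self, data):
        n = self.args[0]
        return {name: values[:n] for name, values in data.items()}


table = {'x': [0, 1, 2, 3], 'y': ['zero', 'one', 'two', 'three']}

# `>>` is left-associative, so each verb receives the previous result.
result = table >> define(z='x') >> head(2)
print(result)
```

Because each verb returns a new table, the original `table` is left unchanged and verbs can be chained in any order.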
Installation
plydata only supports Python 3.
Official version
$ pip install plydata
Development version
$ pip install git+https://github.com/has2k1/plydata.git@master
Example
import pandas as pd
import numpy as np
from plydata import define, query, modify_where
df = pd.DataFrame({
'x': [0, 1, 2, 3],
'y': ['zero', 'one', 'two', 'three']})
df >> define(z='x')
"""
   x      y  z
0  0   zero  0
1  1    one  1
2  2    two  2
3  3  three  3
"""
df >> define(z=0) >> modify_where('x > 1', z=1)
"""
   x      y  z
0  0   zero  0
1  1    one  0
2  2    two  1
3  3  three  1
"""
# You can pass the dataframe as the first argument
query(df, 'x > 1')  # same as `df >> query('x > 1')`
"""
   x      y
2  2    two
3  3  three
"""
plydata piping works with plotnine.
from plotnine import ggplot, aes, geom_line
df = pd.DataFrame({'x': np.linspace(0, 2*np.pi, 100)})
df >> define(y='np.sin(x)') >> ggplot(aes('x', 'y')) + geom_line()
What about dplython or pandas-ply?
dplython and pandas-ply are two other packages with a similar objective to plydata. The main difference is that plydata does not use a placeholder variable (X) as a stand-in for the dataframe; columns are referred to by name in strings. For example:
diamonds >> select(X.carat, X.cut, X.price) # dplython
diamonds >> select('carat', 'cut', 'price') # plydata
select(diamonds, 'carat', 'cut', 'price') # plydata
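One way to see why no placeholder is needed: if columns are named in strings, the names can be resolved against the data at the moment the verb runs. The sketch below shows the idea in plain Python on a list of row dicts. The helpers `query_rows` and `select_cols` are hypothetical and written for illustration only; they are not how plydata is implemented.

```python
def query_rows(rows, expr):
    """Keep the rows for which the string expression is true.

    Names in `expr` are looked up in each row, so 'x > 1' refers to the
    column named 'x' without any placeholder object.
    NOTE: eval() is unsafe on untrusted input; this is only a sketch.
    """
    code = compile(expr, '<query>', 'eval')
    # Each row dict acts as the local namespace for the expression.
    return [row for row in rows if eval(code, {'__builtins__': {}}, row)]


def select_cols(rows, *names):
    """Keep only the named columns in every row."""
    return [{name: row[name] for name in names} for row in rows]


rows = [
    {'x': 0, 'y': 'zero'},
    {'x': 1, 'y': 'one'},
    {'x': 2, 'y': 'two'},
    {'x': 3, 'y': 'three'},
]

kept = query_rows(rows, 'x > 1')
names_only = select_cols(kept, 'y')
print(kept)
print(names_only)
```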
For more, see the documentation.