A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner.
For the latest improvements, please check the changelog.
$ pip install -U pandas # upgrade pandas
$ pip install swifter # first time installation
$ pip install -U swifter # upgrade to latest version if already installed
Alternatively, to install with Anaconda:
$ conda install -c conda-forge swifter
After installing, import swifter into your code along with pandas:
import pandas as pd
import swifter
df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [5, 6, 7, 8]})
# runs on single core
df['x2'] = df['x'].apply(lambda x: x**2)
# runs on multiple cores
df['x2'] = df['x'].swifter.apply(lambda x: x**2)
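# use swifter apply on whole dataframe (axis=1 applies the function to each row,
# so the result lines up with the dataframe's index)
df['agg'] = df.swifter.apply(lambda x: x.sum() - x.min(), axis=1)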
# use swifter apply on specific columns
# (my_func, the inCol* columns, and the extra arguments are placeholders)
df['outCol'] = df[['inCol1', 'inCol2']].swifter.apply(my_func)
df['outCol'] = df[['inCol1', 'inCol2', 'inCol3']].swifter.apply(my_func,
                 positional_arg, keyword_arg=keyword_argval)
Further documentation on swifter is available here.
Check out the examples notebook, along with the speed benchmark notebook.
When vectorization is not possible, swifter automatically decides which is faster: dask parallel processing or a simple pandas apply.
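The "vectorize first, otherwise fall back" idea can be sketched in plain pandas. This is a simplified illustration and not swifter's actual implementation: the helper name `apply_fastest` is hypothetical, and the sketch leaves out the step where swifter chooses between dask and a pandas apply.

```python
import pandas as pd

def apply_fastest(series, func):
    """Try func on the whole Series; fall back to an element-wise apply."""
    try:
        result = func(series)
        # Accept the vectorized result only if it is a Series aligned with the input
        if isinstance(result, pd.Series) and result.index.equals(series.index):
            return result
    except Exception:
        # The function cannot operate on a whole Series, so it is not vectorizable
        pass
    return series.apply(func)

s = pd.Series([1, 2, 3, 4])

# Arithmetic works on the whole Series, so this takes the vectorized path
squared = apply_fastest(s, lambda x: x**2)

# A Python conditional on a whole Series raises, so this falls back to apply
labels = apply_fastest(s, lambda x: "even" if x % 2 == 0 else "odd")
```

The second call shows why a fallback is needed: a function written for scalars often fails when it is handed an entire Series.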
- The function is documented in the .py file. In Jupyter Notebooks, you can see the docs by pressing Shift+Tab three times. Also, check out the complete documentation here along with the changelog.
- Please upgrade your version of pandas, as the pandas extension API used in this module is a recent addition to pandas.
- It is advised to disable the progress bar when calling swifter from a forked process, as the progress bar may get confused between the various multiprocessing modules.