Azure Blob Storage Backend for Dask
- Supports dask when your data files are stored in the cloud.
  - Import `DaskAzureBlobFileSystem`.
  - Use `abfs://` as the protocol prefix and you are good to go.
  - For authentication, please read more under Usage.
- Supports key-value storage backed by Azure storage: create an instance of `AzureBlobMap` (a sketch follows below).
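
As a rough illustration of the key-value feature, the snippet below treats `AzureBlobMap` as a dict-like store. Note that the import path and constructor arguments here are assumptions, not confirmed API; check the package source for the exact signature.

```python
# Hypothetical sketch: neither the import path nor the constructor
# signature is confirmed by this README; consult the package source.
from azureblobfs import AzureBlobMap

# Assumed: the map is scoped to one container and behaves like a
# MutableMapping whose values are persisted as blobs.
cache = AzureBlobMap("clippy")
cache["model/weights"] = b"..."
restored = cache["model/weights"]
```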
Usage

Make the right imports:

```python
from azureblobfs.dask import DaskAzureBlobFileSystem
import dask.dataframe as dd
```
Then put all your data files in an Azure storage container, say `clippy`, and you can read them:

```python
data = dd.read_csv("abfs://noaa/clippy/weather*.csv")
max_by_state = data.groupby("states").max().compute()
```
You would need to set your Azure account name in the environment variable `AZURE_BLOB_ACCOUNT_NAME` (which in the above example is `noaa`) and the account key in `AZURE_BLOB_ACCOUNT_KEY`.
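
For example, both variables can be set from Python before reading any data (a minimal sketch; the values are placeholders to substitute with your own):

```python
import os

# Placeholder values: use your own storage account name and key.
os.environ["AZURE_BLOB_ACCOUNT_NAME"] = "noaa"
os.environ["AZURE_BLOB_ACCOUNT_KEY"] = "<your-account-key>"
```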
If you don't want to use an account key and would rather use SAS, set it in the environment variable `AZURE_BLOB_SAS_TOKEN` along with the connection string in the environment variable `AZURE_BLOB_CONNECTION_STRING`.
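
The SAS-based configuration follows the same pattern (again a sketch with placeholder values):

```python
import os

# Placeholder values: use your own SAS token and connection string.
os.environ["AZURE_BLOB_SAS_TOKEN"] = "<your-sas-token>"
os.environ["AZURE_BLOB_CONNECTION_STRING"] = "<your-connection-string>"
```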
Installation

Just:

```
pip install dask-azureblobfs
```

or get the development version if you love to live dangerously:

```
pip install git+https://github.com/manish/dask-azureblobfs@master#egg=dask-azureblobfs
```
Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.