Azure Exporter for Scrapy

Scrapy feed export storage backend for Azure Storage.

Requirements

Python 3.8+

Installation

```
pip install git+https://github.com/scrapy-plugins/scrapy-feedexporter-azure-storage
```

Usage

Add this storage backend to the FEED_STORAGES Scrapy setting. For example:

```
# settings.py
FEED_STORAGES = {'azure': 'scrapy_azure_exporter.AzureFeedStorage'}
```

Configure authentication via any of the following settings:
- AZURE_CONNECTION_STRING
- AZURE_ACCOUNT_URL_WITH_SAS_TOKEN
- AZURE_ACCOUNT_URL and AZURE_ACCOUNT_KEY. This method requires both settings.

For example:
```
AZURE_ACCOUNT_URL = "https://<your-storage-account-name>.blob.core.windows.net/"
AZURE_ACCOUNT_KEY = "Account key for the Azure account"
```
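
The same approach applies to the AZURE_CONNECTION_STRING setting. As a minimal sketch, the value can be read from the environment so that the secret stays out of version control. The environment variable name AZURE_STORAGE_CONNECTION_STRING is an assumption made for illustration; only the setting name comes from this backend.

```python
# settings.py
import os

# The environment variable name below is an assumption; use whatever name
# your deployment provides. The value is None when the variable is unset.
AZURE_CONNECTION_STRING = os.environ.get("AZURE_STORAGE_CONNECTION_STRING")
```

With this in place, only one of the authentication methods listed above needs to be configured.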

In the FEEDS Scrapy setting, specify the Azure URI to which the feed should be exported.

```
FEEDS = {
    "azure://<account_name>.blob.core.windows.net/<container_name>/<file_name.extension>": {
        "format": "json"
    }
}
```
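
If several feeds share the same account, it may help to build the URI from its parts. The sketch below follows the URI layout shown above. The helper name azure_feed_uri and the account, container, and file names are hypothetical and not part of this package.

```python
# settings.py
def azure_feed_uri(account_name: str, container_name: str, blob_name: str) -> str:
    """Build a feed URI of the form azure://<account>.blob.core.windows.net/<container>/<file>."""
    return f"azure://{account_name}.blob.core.windows.net/{container_name}/{blob_name}"


# "myaccount", "scraped-data", and "items.json" are placeholder values.
FEEDS = {
    azure_feed_uri("myaccount", "scraped-data", "items.json"): {
        "format": "json",
    },
}
```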

Write mode and blob type

The overwrite feed option defaults to False when using this feed export storage backend. An extra feed option, blob_type, is also provided; it can be "BlockBlob" (default) or "AppendBlob". See Understanding blob types.

Together, the overwrite and blob_type feed options determine the write mode of the feed export:

- overwrite=False and blob_type="BlockBlob" create the blob if it does not exist, and fail if it exists.
- overwrite=False and blob_type="AppendBlob" append to the blob if it exists and it is an AppendBlob, and create it otherwise.
- overwrite=True overwrites the blob, even if it exists. The blob_type must match that of the target blob.
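
As an illustration of the append mode described above, the following sketch sets both feed options on a single feed. The file name items.jsonl and the choice of the jsonlines format are assumptions made for the example; a line-oriented format is used because appended output then stays parseable across runs.

```python
# settings.py
FEEDS = {
    "azure://<account_name>.blob.core.windows.net/<container_name>/items.jsonl": {
        "format": "jsonlines",
        # Append to the blob if it already exists as an AppendBlob,
        # and create it otherwise.
        "overwrite": False,
        "blob_type": "AppendBlob",
    }
}
```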
