This repo shows how to index multiple Azure Blob Storage containers in Azure Cognitive Search with a single indexer: Azure Functions copy blob metadata into an Azure Table Storage table, and an Azure Table Storage indexer then reads that table.
Logical components:
- Generate SAS: an Azure Function that generates a SAS token from an Azure Blob URL and returns a file reference in the format expected by the Document Extraction skill
- Document Cracking: uses the Azure Cognitive Search Document Extraction skill to crack the supported document formats
- Key Phrase Extraction: a skill that consumes the text extracted by the Document Cracking step. You can add any other built-in or custom skill here
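To make the Generate SAS step concrete, here is a hypothetical stdlib-only sketch of what such a function returns. The helper name `make_file_reference` and the field names in the returned dictionary are illustrative assumptions, not the repo's exact contract; in the real function the SAS token itself would be produced with `generate_blob_sas` from the `azure-storage-blob` package rather than passed in pre-generated.

```python
# Hypothetical sketch of the "Generate SAS" step: given a blob URL and a
# pre-generated SAS token, build a file reference in the shape consumed by
# the Document Extraction skill (field names here are illustrative).
from urllib.parse import urlsplit

def make_file_reference(blob_url: str, sas_token: str) -> dict:
    """Append the SAS token to the blob URL and wrap it as a skill input."""
    parts = urlsplit(blob_url)
    container, _, blob_name = parts.path.lstrip("/").partition("/")
    return {
        "container": container,
        "blobName": blob_name,
        # The query string carries the SAS token granting time-limited read access
        "url": f"{blob_url}?{sas_token.lstrip('?')}",
    }

ref = make_file_reference(
    "https://myaccount.blob.core.windows.net/docs/report.pdf",
    "sv=2022-11-02&sr=b&sig=abc",
)
print(ref["url"])
```

Returning a URL with the SAS token embedded lets the skillset download the blob without the search service needing direct credentials for the storage account.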
Azure Cognitive Search assets:
The project provides two Azure Functions that copy blob metadata from Azure Blob Storage to Azure Table Storage, one in batch mode and one event-based:
- BlobToTable - Function with an Event Grid trigger that stores the blob name and container name in Azure Table Storage, following an event-based pattern
- ContainerToTableHttp - HTTP Function that copies the metadata of all blobs in an Azure Blob Storage container to Azure Table Storage

Use the batch-mode function (ContainerToTableHttp) for the initial ingestion, and the event-based function (BlobToTable) to keep the rows in Azure Table Storage consistent with blobs as they are updated.
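As a rough illustration of the batch path, the sketch below turns a container listing into table entities. This is an assumption about the shape of the data, not the repo's exact schema; in the real ContainerToTableHttp function the listing would come from `ContainerClient.list_blobs` in `azure-storage-blob` and the writes would go through `azure-data-tables`, while here both sides are plain data.

```python
# Hypothetical sketch of the batch path (ContainerToTableHttp): map every
# blob in a container listing to a Table Storage entity.
def entities_for_container(container: str, blob_names: list[str]) -> list[dict]:
    return [
        {
            "PartitionKey": container,          # one partition per container
            "RowKey": name.replace("/", "|"),   # RowKey must not contain '/'
        }
        for name in blob_names
    ]

rows = entities_for_container("docs", ["a.pdf", "sub/b.docx"])
print(rows)
```

Keying the partition by container name keeps each container's rows together, which is convenient when the indexer query filters by container.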
Related documentation:
- Reacting to Blob storage events
- Use Azure Event Grid to route Blob storage events to web endpoint (Azure portal)
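The event-based path can be sketched as follows. A `Microsoft.Storage.BlobCreated` event's `subject` has the form `/blobServices/default/containers/{container}/blobs/{blob}`, which is enough to derive the keys BlobToTable stores; the entity shape below is an assumption for illustration, not the repo's exact schema.

```python
# Hypothetical sketch of what BlobToTable stores: parse the subject of a
# BlobCreated Event Grid event into a Table Storage entity keyed by
# container (PartitionKey) and blob name (RowKey).
import re

SUBJECT_RE = re.compile(r"/containers/(?P<container>[^/]+)/blobs/(?P<blob>.+)$")

def entity_from_event(event: dict) -> dict:
    """Map a Microsoft.Storage.BlobCreated event to a table entity."""
    match = SUBJECT_RE.search(event["subject"])
    if match is None:
        raise ValueError(f"unexpected subject: {event['subject']}")
    return {
        "PartitionKey": match.group("container"),
        # RowKey cannot contain '/', so nested blob paths need escaping
        "RowKey": match.group("blob").replace("/", "|"),
        "Url": event["data"]["url"],
    }

event = {
    "eventType": "Microsoft.Storage.BlobCreated",
    "subject": "/blobServices/default/containers/docs/blobs/report.pdf",
    "data": {"url": "https://myaccount.blob.core.windows.net/docs/report.pdf"},
}
print(entity_from_event(event))
```

In the deployed function the resulting entity would be upserted with `azure-data-tables`, so re-delivered events simply overwrite the same row instead of creating duplicates.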
Application settings:
- `AzureWebJobsStorage`: storage account connection string used by the Azure Functions runtime
- `FUNCTIONS_WORKER_RUNTIME`: `python`
- `AzureBlobStorageConnectionString`: connection string of the storage account used for blob metadata reading and SAS token generation
- `TableName`: name of the Azure Table Storage table where blob metadata is dropped (e.g. `droptable`)
- `CopyMetadata`: set to `"1"` to also copy blob metadata in the event-based function (BlobToTable)
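For local development, settings like these would typically sit in the `Values` section of a `local.settings.json` file, along the lines of the sketch below (the placeholder values are assumptions to be replaced with your own connection strings):

```json
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "<functions storage connection string>",
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "AzureBlobStorageConnectionString": "<blob storage connection string>",
    "TableName": "droptable",
    "CopyMetadata": "1"
  }
}
```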