Sample code to ingest a Confluence knowledge base into Azure Language Studio Custom Question Answering (QnA Maker).

Primary language: Python

Confluence - Azure Language Studio Sample

This is a demonstration of extracting content from Confluence Server and sending it to Azure Language Studio to layer AI on top of it with Custom Question Answering.

Content

This repository contains two folders. As the names suggest, One-Time Ingestion is for the initial ingestion, whereas the second folder, Azure Functions, is a sample that uses Azure Functions to trigger ad hoc updates.

One Time Ingestion

The logic is simple. First, we use the Confluence Server API to get all the content IDs needed, then we retrieve the content for each ID. Note that the body is returned as HTML markup with custom tags (for example, for images), so we need to replace those tags with the equivalent standard HTML tags.
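The two steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code. It assumes the Confluence Server REST endpoint `/rest/api/content` with `start`/`limit` paging and the `body.storage` expansion, and it assumes image macros look like `<ac:image><ri:attachment ri:filename="..."/></ac:image>`. The names `fetch_json` (a caller-supplied function that performs the authenticated GET and returns parsed JSON) and `url_for` (maps an attachment name to a public URL) are hypothetical.

```python
import re
from typing import Callable, Iterator

# Assumed shape of a Confluence storage-format image macro:
#   <ac:image ...><ri:attachment ri:filename="diagram.png" /></ac:image>
IMAGE_MACRO = re.compile(
    r'<ac:image[^>]*>\s*<ri:attachment\s+ri:filename="([^"]+)"[^>]*/>\s*</ac:image>',
    re.DOTALL,
)


def iter_content_ids(fetch_json: Callable[[str], dict], base_url: str,
                     limit: int = 25) -> Iterator[str]:
    """Yield every page ID, paging with start/limit until a short page is seen."""
    start = 0
    while True:
        url = f"{base_url}/rest/api/content?type=page&start={start}&limit={limit}"
        results = fetch_json(url).get("results", [])
        for item in results:
            yield item["id"]
        if len(results) < limit:
            break
        start += limit


def content_url(base_url: str, content_id: str) -> str:
    """URL that returns a single page with its storage-format body expanded."""
    return f"{base_url}/rest/api/content/{content_id}?expand=body.storage"


def replace_image_macros(storage_html: str, url_for: Callable[[str], str]) -> str:
    """Swap each Confluence image macro for a plain HTML <img> tag."""
    return IMAGE_MACRO.sub(
        lambda m: f'<img src="{url_for(m.group(1))}" alt="{m.group(1)}" />',
        storage_html,
    )
```

Passing the HTTP call in as `fetch_json` keeps authentication (basic auth, personal access token, and so on) out of the sketch and makes the paging logic easy to test offline.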

Attachments such as images are returned as well, but only by attachment name rather than with their full content. We therefore use the Confluence API again to retrieve the attachment data, store it in Azure Storage, generate a SAS URL for access, and use that URL as the src of the HTML img tag.
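The storage and SAS step might look something like the sketch below. It assumes the `azure-storage-blob` (v12) package and account-key authentication. The function names, the `content_id/filename` blob naming scheme, and the 30-day expiry are illustrative assumptions, not taken from the repository.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote


def blob_name_for(content_id: str, filename: str) -> str:
    """Hypothetical naming scheme: group attachments under their page ID."""
    return f"{content_id}/{filename}"


def blob_url(account_name: str, container: str, blob_name: str, sas_token: str) -> str:
    """Build the public URL of a blob, with the SAS token as its query string."""
    return (f"https://{account_name}.blob.core.windows.net/"
            f"{container}/{quote(blob_name)}?{sas_token}")


def upload_and_sign(data: bytes, account_name: str, account_key: str,
                    container: str, blob_name: str, valid_days: int = 30) -> str:
    """Upload attachment bytes and return a read-only SAS URL for them."""
    # Imported lazily so the pure helpers above work without the SDK installed.
    from azure.storage.blob import (BlobSasPermissions, BlobServiceClient,
                                    generate_blob_sas)

    service = BlobServiceClient(
        account_url=f"https://{account_name}.blob.core.windows.net",
        credential=account_key,
    )
    service.get_blob_client(container=container, blob=blob_name).upload_blob(
        data, overwrite=True
    )
    token = generate_blob_sas(
        account_name=account_name,
        container_name=container,
        blob_name=blob_name,
        account_key=account_key,
        permission=BlobSasPermissions(read=True),  # read-only access
        expiry=datetime.now(timezone.utc) + timedelta(days=valid_days),
    )
    return blob_url(account_name, container, blob_name, token)
```

The returned URL is what would be placed in the `src` attribute of the `img` tag. Because a SAS URL expires, the expiry chosen here determines how long images in the answers remain viewable before a refresh is needed.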

Custom Question Answering supports markdown in responses, so we then convert the HTML into markdown. Lastly, the title is used as the question and the body content as the answer, and we use the Azure Language Studio API to perform the update.
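As a rough illustration of the last step, the snippet below maps a page title and its markdown body onto a question-and-answer record. The field names (`op`, `value`, `questions`, `answer`, `source`) are modelled on the update operation of the Question Answering authoring API and should be checked against the API reference for the version in use. The function names are hypothetical, and the HTML-to-markdown conversion is assumed to have happened already.

```python
import json
from typing import Dict, List


def build_qna_operation(title: str, markdown_body: str, source_url: str) -> Dict:
    """Map one Confluence page onto a single 'add' operation (assumed shape)."""
    return {
        "op": "add",
        "value": {
            "questions": [title],            # page title becomes the question
            "answer": markdown_body.strip(),  # converted body becomes the answer
            "source": source_url,            # where the content came from
        },
    }


def build_update_body(pages: List[Dict[str, str]]) -> str:
    """Serialize one operation per page into a JSON request body."""
    operations = [
        build_qna_operation(p["title"], p["markdown_body"], p["source_url"])
        for p in pages
    ]
    return json.dumps(operations)
```

Keeping payload construction separate from the HTTP call makes it straightforward to inspect exactly what would be sent before touching a real project.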

Azure Functions

The logic is similar, but instead of crawling Confluence Server, the Azure Function takes a request body:

{
    "id": "123456",
    "title" : "Random text"
}
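A function receiving this body would typically validate it before doing any work. The sketch below shows one way to do that. The name `parse_update_request` and the error messages are illustrative assumptions, not the repository's code.

```python
import json
from typing import Tuple


def parse_update_request(raw_body: bytes) -> Tuple[str, str]:
    """Return (content_id, title) from the request body, or raise ValueError."""
    try:
        payload = json.loads(raw_body)
    except (ValueError, TypeError) as exc:
        raise ValueError("request body is not valid JSON") from exc
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")

    content_id = str(payload.get("id", "")).strip()
    title = str(payload.get("title", "")).strip()
    if not content_id or not title:
        raise ValueError('both "id" and "title" are required')
    return content_id, title
```

In an HTTP-triggered Azure Function, the raw bytes would usually come from the request object's `get_body()` method, and a `ValueError` would be translated into a 400 response.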

The input can be generated either by a custom script or by querying the Confluence Server database (NOT RECOMMENDED). A sample query script is available here.

The automation can be triggered on a timer using Azure Logic Apps. Here's a sample flow that uses the SQL query above.

(Image: sample Azure Logic Apps flow)

Considerations

  1. The sample above is for demonstration purposes and is not meant for production use yet.
  2. We strongly encourage using the Confluence-provided API instead of querying the database directly.
  3. This sample is for Confluence Server. A similar concept can be applied to Confluence Cloud.
  4. Take complex responses into consideration, as Custom Question Answering supports markdown only. Content such as videos should be referenced by an external URL.
  5. The sample is based on a simple content update routine. Create your own workflow to account for the content creation process, such as versioning and updates.