/analytics

GoodToCode Analytics supports file infrastructure (Excel), AI services (Azure Cognitive Services, Text Analytics) and persistence (Azure Storage Tables, CosmosDb) for Data Lake analytics workflows.

Primary language: C#. License: MIT.

GoodToCode Analytics Library for Azure Cognitive Services

This is a simple, low-dependency library for working with Azure Cognitive Services and Text Analytics and for persisting the results to Azure Storage Tables and Cosmos DB. These services rely on the machine learning and artificial intelligence models in the Azure Cognitive Services suite. The library currently supports the Text Analytics and Cognitive Services APIs, and is expanding to others such as computer vision, facial recognition, and video indexing.

/src Contents

Path | Item | Contents
--- | --- | ---
src | - | Contains the C# solution, project files, and source code.
src | Analytics.Activities | Workflow activities that serve as the steps of a Durable Functions orchestration.
src | Analytics.Domain | Domain entities for this solution's services.
src | Analytics.Tests | Tests against fakes and real services for Cognitive Services and Text Analytics.

/infrastructure ARM Templates

Path | Item | Contents
--- | --- | ---
infrastructure | - | Contains Azure DevOps YML files, Windows PowerShell scripts, and variables to support Azure DevOps YML Pipelines.
infrastructure | *.json | ARM template for that Azure resource.
infrastructure | *.parameters.json | Parameter definition for the ARM template for that Azure resource.

/pipeline YML Files

Path | Item | Contents
--- | --- | ---
pipelines | - | Contains Azure DevOps YML files, Windows PowerShell scripts, and variables to support Azure DevOps YML Pipelines.
pipelines | gtc-rg-analytics-src.yml | Azure DevOps Pipeline main file.
pipelines | scripts | Command-line files (.cmd) for Windows/bash commands, and Windows PowerShell scripts (Set-Version.ps1).
pipelines | steps | Azure DevOps Pipeline step templates.
pipelines | variables | Variables (non-secret only) for the Azure landing zone, Azure infrastructure, and NuGet packages.

Azure Cognitive Services

Cognitive Service | Purpose
--- | ---
Computer Vision | Inspects each image associated with an incoming article to (1) extract written words from the image and (2) determine what types of objects are present in the image.
Face API | Inspects each image associated with an incoming article to find faces, determine whether each face is male or female, and estimate an age for each face.
Text Analytics | Finds key phrases and entities in the title and body text after they have been translated.
Translation API | Detects the language of the incoming title and body, when present, then translates them to English. The target language is just another input and can be changed from English to any supported language.
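
The table above implies an ordering: the title and body are translated first, and Text Analytics then runs on the translated text. A minimal sketch of that ordering follows, using dependency inversion with fakes in the spirit of the test project. Every type name here (`ITranslator`, `IKeyPhraseExtractor`, `FakeTranslator`, `FakeKeyPhraseExtractor`, `ArticlePipeline`) is hypothetical and is not taken from this library; the fakes stand in for the real Azure services.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical abstractions; the library's real interfaces may differ.
public interface ITranslator
{
    Task<string> TranslateAsync(string text, string targetLanguage);
}

public interface IKeyPhraseExtractor
{
    Task<IReadOnlyList<string>> ExtractAsync(string text);
}

// Fake translator: tags the text with the target language instead of calling the Translation API.
public sealed class FakeTranslator : ITranslator
{
    public Task<string> TranslateAsync(string text, string targetLanguage) =>
        Task.FromResult($"[{targetLanguage}] {text}");
}

// Fake extractor: treats every word longer than four characters as a "key phrase".
public sealed class FakeKeyPhraseExtractor : IKeyPhraseExtractor
{
    public Task<IReadOnlyList<string>> ExtractAsync(string text)
    {
        IReadOnlyList<string> phrases = text
            .Split(' ', StringSplitOptions.RemoveEmptyEntries)
            .Where(word => word.Length > 4)
            .ToList();
        return Task.FromResult(phrases);
    }
}

public sealed class ArticlePipeline
{
    private readonly ITranslator _translator;
    private readonly IKeyPhraseExtractor _extractor;

    public ArticlePipeline(ITranslator translator, IKeyPhraseExtractor extractor)
    {
        _translator = translator;
        _extractor = extractor;
    }

    // Translate first, then analyze the translated text.
    public async Task<IReadOnlyList<string>> AnalyzeAsync(string body, string targetLanguage = "en")
    {
        string translated = await _translator.TranslateAsync(body, targetLanguage);
        return await _extractor.ExtractAsync(translated);
    }
}
```

Because `ArticlePipeline` depends only on the two interfaces, the fakes can be swapped for implementations that call the real services without changing the pipeline itself.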

Azure Services used in GoodToCode repositories

Azure Service | Purpose
--- | ---
Azure Cosmos DB | NoSQL database where original content as well as processing results are stored.
Azure Functions | Code blocks that analyze the documents stored in the Azure Cosmos DB.
Azure Service Bus | Service bus queues are used as triggers for durable Azure Functions.
Azure Storage | Holds images from articles and hosts the code for the Azure Functions.

Note: This design uses the service collection extensions, dependency inversion, queue notification, and serverless patterns for simplicity. While these are useful patterns, they are not the only ones that can be used to accomplish this data flow.

Azure Service Bus Topics could be used instead, which would allow different parts of the article to be processed in parallel rather than serially as in this example. Topics would be useful if article inspection processing time is critical. A comparison between Azure Service Bus Queues and Azure Service Bus Topics can be found here.
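
The serial-versus-parallel trade-off can be sketched in-process. In the example below, plain tasks stand in for queue-triggered and topic-subscribed functions; no Service Bus calls are made, and the names `ArticleInspection` and `SimulatedInspector` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ArticleInspection
{
    // Queue-style: each inspection starts only after the previous one finishes.
    public static async Task<List<string>> RunSeriallyAsync(
        string article, IEnumerable<Func<string, Task<string>>> inspectors)
    {
        var results = new List<string>();
        foreach (var inspect in inspectors)
        {
            results.Add(await inspect(article));
        }
        return results;
    }

    // Topic-style: every "subscriber" receives the same article and runs concurrently.
    public static async Task<List<string>> RunInParallelAsync(
        string article, IEnumerable<Func<string, Task<string>>> inspectors)
    {
        string[] results = await Task.WhenAll(inspectors.Select(inspect => inspect(article)));
        return results.ToList();
    }

    // Hypothetical inspector that simulates a slow cognitive-service call.
    public static Func<string, Task<string>> SimulatedInspector(string name, int delayMs) =>
        async article =>
        {
            await Task.Delay(delayMs);
            return $"{name}:{article.Length}";
        };
}
```

Both methods return the same results in the same order. The serial version takes roughly the sum of the inspector delays, while the parallel version takes roughly the longest single delay.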

The Azure Functions could also be implemented as an Azure Logic App. However, with parallel processing the user would have to implement record-level locking, such as Redlock, until Cosmos DB supports partial document updates.
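
To illustrate why record-level locking matters when whole documents are replaced, here is a minimal single-process sketch. It is not Redlock, which is a distributed locking algorithm; a per-record `SemaphoreSlim` is swapped in as a stand-in, and an in-memory dictionary plays the role of the document store. The `LockedDocumentStore` type is hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// In-memory stand-in for a store that only supports whole-document replacement.
public sealed class LockedDocumentStore
{
    private readonly ConcurrentDictionary<string, Dictionary<string, string>> _documents = new();
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _locks = new();

    // Read-modify-write one field while holding the lock for that record only.
    public async Task UpdateFieldAsync(string id, string field, string value)
    {
        SemaphoreSlim recordLock = _locks.GetOrAdd(id, _ => new SemaphoreSlim(1, 1));
        await recordLock.WaitAsync();
        try
        {
            // Read the whole document, or start a new one if none exists yet.
            var current = _documents.TryGetValue(id, out var existing)
                ? new Dictionary<string, string>(existing)
                : new Dictionary<string, string>();

            await Task.Delay(10); // simulate latency between the read and the write

            current[field] = value;
            _documents[id] = current; // write the whole document back
        }
        finally
        {
            recordLock.Release();
        }
    }

    public IReadOnlyDictionary<string, string> Get(string id) => _documents[id];
}
```

Without the lock, two concurrent updates to the same record could each read the old document and the later write would overwrite the earlier one. With it, both fields survive.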

A comparison between Durable Functions and Logic Apps can be found here.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.