Azure Cognitive Search - Video Knowledge Mining Extension

page_type: sample
languages: python
products: azure
description: Video Knowledge Mining Solution
urlFragment: azure-search-video-knowledge-mining

Demo site

Video Knowledge Mining Demo

Extend Azure Cognitive Search

This extension adds search over video transcripts and insights to Azure Cognitive Search through an integration with Azure Video Analyzer for Media (formerly Azure Video Indexer).

Repo Architecture

This repo is a collection of two Azure Functions:

  • start-video-indexing: triggers video indexing when a video is uploaded to Azure Blob Storage.

  • video-indexer-callback: handles the callback from Azure Blob Storage and pushes the data to Azure Cognitive Search and Azure Blob Storage.

It also includes the infrastructure required to set up the full end-to-end solution.
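As a rough illustration of what starting an indexing job involves, the sketch below builds the kind of upload request that asks the service to index a video from a URL and to notify a callback endpoint when it finishes. It is not taken from this repo's functions: the helper name `build_upload_url`, the example values, and the choice of query parameters (`name`, `videoUrl`, `callbackUrl`, `accessToken`) are assumptions based on the public Video Indexer REST API.

```python
from urllib.parse import quote, urlencode

# Public Video Indexer API host (assumed; not read from this repo's settings).
API_ROOT = "https://api.videoindexer.ai"


def build_upload_url(location, account_id, access_token,
                     video_name, video_url, callback_url):
    """Return the URL to POST to in order to index a video from `video_url`.

    `callback_url` is the endpoint the service should call once indexing
    completes, for example the video-indexer-callback function.
    """
    query = urlencode({
        "name": video_name,
        "videoUrl": video_url,        # e.g. a blob URL with a SAS token
        "callbackUrl": callback_url,
        "accessToken": access_token,
    })
    return f"{API_ROOT}/{quote(location)}/Accounts/{quote(account_id)}/Videos?{query}"


if __name__ == "__main__":
    # All values below are placeholders.
    print(build_upload_url(
        location="trial",
        account_id="00000000-0000-0000-0000-000000000000",
        access_token="<access-token>",
        video_name="demo.mp4",
        video_url="https://example.blob.core.windows.net/video-knowledge-mining-drop/demo.mp4",
        callback_url="https://example.azurewebsites.net/api/video-indexer-callback",
    ))
```

Passing the parameters through `urlencode` keeps a SAS-signed blob URL intact, since its own `?` and `&` characters are percent-encoded rather than merged into the outer query string.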

One click Azure Deployment

To deploy the full solution and the Web Application, select the following button:

Deploy to Azure

The Azure portal displays a pane that allows you to easily provide parameter values. The parameters are pre-filled with the default values from the template.

Once the deployment is complete, start processing your videos by dropping them in the "video-knowledge-mining-drop" container of the storage account in your resource group. To open the web UI, go to the App Service resource in the same resource group.

Available insights in Azure Video Analyzer for Media

Video insights

  • Face detection: Detects and groups faces appearing in the video.
  • Celebrity identification: Video Indexer automatically identifies over 1 million celebrities—like world leaders, actors, actresses, athletes, researchers, business, and tech leaders across the globe. The data about these celebrities can also be found on various websites (IMDB, Wikipedia, and so on).
  • Account-based face identification: Video Indexer trains a model for a specific account. It then recognizes faces in the video based on the trained model. For more information, see Customize a Person model from the Video Indexer website and Customize a Person model with the Video Indexer API.
  • Thumbnail extraction for faces ("best face"): Automatically identifies the best captured face in each group of faces (based on quality, size, and frontal position) and extracts it as an image asset.
  • Visual text recognition (OCR): Extracts text that's visually displayed in the video.
  • Visual content moderation: Detects adult and/or racy visuals.
  • Labels identification: Identifies visual objects and actions displayed.
  • Keyframe extraction: Detects stable keyframes in a video.

Audio insights

  • Automatic language detection: Automatically identifies the dominant spoken language. Supported languages include English, Spanish, French, German, Italian, Chinese (Simplified), Japanese, Russian, and Brazilian Portuguese. If the language can't be identified with confidence, Video Indexer assumes the spoken language is English. For more information, see Language identification model.
  • Multi-language speech identification and transcription (preview): Automatically identifies the spoken language in different segments from audio. It sends each segment of the media file to be transcribed and then combines the transcription back to one unified transcription. For more information, see Automatically identify and transcribe multi-language content.
  • Audio transcription: Converts speech to text in 12 languages and allows extensions. Supported languages include English, Spanish, French, German, Italian, Chinese (Simplified), Japanese, Arabic, Russian, Brazilian Portuguese, Hindi, and Korean.
  • Closed captioning: Creates closed captioning in three formats: VTT, TTML, SRT.
  • Two channel processing: Automatically detects separate transcripts and merges them into a single timeline.
  • Noise reduction: Clears up telephony audio or noisy recordings (based on Skype filters).
  • Transcript customization (CRIS): Trains custom speech to text models to create industry-specific transcripts. For more information, see Customize a Language model from the Video Indexer website and Customize a Language model with the Video Indexer APIs.
  • Speaker enumeration: Maps and understands which speaker spoke which words and when.
  • Speaker statistics: Provides statistics for speakers' speech ratios.
  • Textual content moderation: Detects explicit text in the audio transcript.
  • Audio effects: Identifies audio effects like hand claps, speech, and silence.
  • Emotion detection: Identifies emotions based on speech (what's being said) and voice tonality (how it's being said). The emotion could be joy, sadness, anger, or fear.
  • Translation: Creates translations of the audio transcript to 54 different languages.

Audio and video insights (multi-channels)

When indexing by one channel only, partial results for these models are available.

  • Keywords extraction: Extracts keywords from speech and visual text.
  • Named entities extraction: Extracts brands, locations, and people from speech and visual text via natural language processing (NLP).
  • Topic inference: Infers the main topics from transcripts. The 2nd-level IPTC taxonomy is included.
  • Sentiment analysis: Identifies positive, negative, and neutral sentiments from speech and visual text.

OPTIONAL - Web App

NOTE: Web App deployment is automated by the One click Azure Deployment. Use the following instructions if you want to deploy just the UI and not the full solution.

To deploy a Video Indexer enabled Knowledge Mining Solution Accelerator Web App, you can pull and run a pre-built Docker image, providing a .env configuration file:

    docker run -d --env-file .env -p 80:80 videokm.azurecr.io/ui:latest

If you want to personalize the UI, refer to the Knowledge Mining Solution Accelerator with Video Indexer.

How to create a .env file

Modify the .env file with your application settings:

Required fields

    SearchServiceName=
    SearchApiKey=
    SearchIndexName=
    StorageAccountName=
    StorageAccountKey=
    StorageContainerAddress=https://{storage-account-name}.blob.core.windows.net/{container-name}
    KeyField=metadata_storage_path
    IsPathBase64Encoded=true
    SearchIndexNameVideoIndexerTimeRef=videoinsights-time-references
    AVAM_Account_Id=
    AVAM_Api_Key=
    AVAM_Account_Location=

  • SearchServiceName - The name of your Azure Cognitive Search service
  • SearchApiKey - The API Key for your Azure Cognitive Search service
  • SearchIndexName - The name of your Azure Cognitive Search index
  • SearchIndexerName - The name of your Azure Cognitive Search indexer
  • StorageAccountName - The name of your Azure Blob Storage Account
  • StorageAccountKey - The key for your Azure Blob Storage Account
  • StorageContainerAddress - The URL of the storage container where your documents are stored, in the following format: https://storageaccountname.blob.core.windows.net/containername
  • KeyField - The key field for your search index. Set this to the field specified as the document key in the index. By default this is metadata_storage_path.
  • IsPathBase64Encoded - By default, metadata_storage_path is the key and it is base64 encoded, so this is set to true. If your key is not encoded, set this to false.
  • SearchIndexNameVideoIndexerTimeRef - The name of your Azure Cognitive Search time entries index. Leave the default value if you did not change it in the infrastructure creation scripts.
  • AVAM_Account_Id - The ID of your Azure Video Analyzer for Media account
  • AVAM_Api_Key - The API key of your Azure Video Analyzer for Media account
  • AVAM_Account_Location - The location of your Azure Video Analyzer for Media account
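
To make the IsPathBase64Encoded setting more concrete, here is a minimal sketch of how an encoded metadata_storage_path key could be turned back into a blob URL. It assumes the index was populated with the default base64Encode field mapping of Azure Cognitive Search, which produces URL-safe Base64 with the trailing padding replaced by a digit giving the number of padding characters. The helper names `encode_storage_path` and `decode_storage_path` and the sample URL are hypothetical; if your indexer uses a different encoding option, the decoding has to change accordingly.

```python
import base64


def encode_storage_path(path: str) -> str:
    """Encode a path the way the default base64Encode mapping is assumed to."""
    raw = base64.urlsafe_b64encode(path.encode("utf-8")).decode("ascii")
    stripped = raw.rstrip("=")
    # Append the count of removed '=' padding characters as a single digit.
    return stripped + str(len(raw) - len(stripped))


def decode_storage_path(key: str) -> str:
    """Reverse encode_storage_path: restore the padding, then decode."""
    padding = int(key[-1])
    return base64.urlsafe_b64decode(key[:-1] + "=" * padding).decode("utf-8")


if __name__ == "__main__":
    # Placeholder blob URL, for illustration only.
    url = "https://example.blob.core.windows.net/video-knowledge-mining-drop/demo.mp4"
    key = encode_storage_path(url)
    print(key)
    print(decode_storage_path(key))
```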

Optional Fields

While some fields are optional, we recommend not removing them from .env to avoid any possible errors.

    InstrumentationKey=
    StorageContainerAddress2=https://{storage-account-name}.blob.core.windows.net/{container-name}
    StorageContainerAddress3=https://{storage-account-name}.blob.core.windows.net/{container-name}
    AzureMapsSubscriptionKey=
    GraphFacet=keyPhrases, locations
    SearchIndexNameVideoIndexerTimeRef=videoinsights-time-references
    Customizable=true
    OrganizationName=Microsoft
    OrganizationLogo=~/images/logo.png
    OrganizationWebSiteUrl=https://www.microsoft.com

  • InstrumentationKey - Optional instrumentation key for Application Insights. The instrumentation key connects the web app to Application Insights in order to populate the Power BI reports.
  • StorageContainerAddress2 & StorageContainerAddress3 - Optional container addresses if using more than one indexer
  • AzureMapsSubscriptionKey - Optionally, provide an Azure Maps subscription key to display a geographic point on a map in the document details. The code expects a field called geolocation of type Edm.GeographyPoint.
  • GraphFacet - The facets used to generate the relationship graph. This can now be edited in the UI.
  • Customizable - Determines whether users are allowed to customize the web app. Customizations include uploading documents and changing the colors/logo of the web app. OrganizationName, OrganizationLogo, and OrganizationWebSiteUrl are additional fields that also allow light customization.
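
Since a missing or empty setting is easy to overlook, a small pre-flight check can confirm that a .env file defines every required key before the container is started. The following is a sketch only and not part of the solution: the function names `parse_env` and `missing_required` are hypothetical, the key list is copied from the Required fields block, and the parser handles only plain KEY=value lines.

```python
# Keys listed under "Required fields" in this README.
REQUIRED_KEYS = [
    "SearchServiceName", "SearchApiKey", "SearchIndexName",
    "StorageAccountName", "StorageAccountKey", "StorageContainerAddress",
    "KeyField", "IsPathBase64Encoded", "SearchIndexNameVideoIndexerTimeRef",
    "AVAM_Account_Id", "AVAM_Api_Key", "AVAM_Account_Location",
]


def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values such as URLs stay intact.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def missing_required(settings):
    """Return the required keys that are absent or have an empty value."""
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


if __name__ == "__main__":
    # Placeholder content; in practice, read the text of your .env file.
    sample = "SearchServiceName=my-search\nSearchApiKey=\n"
    print("Missing settings:", missing_required(parse_env(sample)))
```

Keys with an empty value are reported as missing on purpose, because the template ships most required fields as empty placeholders.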