Using Azure Cognitive and AI Services to Dub Video Translations

Deployment Steps:

Local Setup:

  1. Clone this repository to your local machine for local development.

  2. Follow TODO: this guide to set up your local machine to develop Logic Apps locally.

  3. Follow TODO: this guide to set up your local machine to develop Azure Functions with Python.

Azure Resources:

  4. Create an Azure app registration (a.k.a. service principal). This will be used to authenticate to various services.

    • The service principal needs access to these resources: TODO
  5. Create a new resource group in Azure

  6. Deploy an Azure Storage Account with hierarchical namespace enabled (a.k.a. Azure Data Lake Storage Gen2 or ADLS Gen2)

    • Redundancy: locally redundant storage (LRS) is sufficient for pilot/POC work
    • Hierarchical namespace (HNS): enabled
    • After deployment, create a file system called "videodubbing"
  7. Deploy an Azure Logic App

    • Publish: Workflow
    • Plan type: Standard
    • App Service Plan: Create New
    • Plan Sku and Size: WS1
    • Create new storage account for workflow state and run history.
    • Enable application insights (create a new instance)
  8. Deploy Azure Video Analyzer for Media (AVAM)

    • Create a media services account
    • Create a media services storage account
    • Create a user-assigned managed identity
    • Check "I have all the rights to use the content/file, and agree that it will be handled per the Online Services Terms and the Microsoft Privacy Statement."
    • After deployment, go to, sign in, and then click on Authorization. Create a new subscription name.
  9. Deploy Speech API

    • Defaults are okay.
  10. Deploy an Azure Key Vault

    • Pricing tier: Standard
    • Add an access policy with all key permissions using the app registration you created previously.
    • After deployment, create the following secrets and apply their values:

    Secret Name | Description | Example Value
    TENANT-ID | The Azure Active Directory tenant ID where you created the app registration and deployed your resources. | "00000000-0000-0000-0000-000000000000"
    CLIENT-APP-ID | Application ID of the service principal you created | "00000000-0000-0000-0000-000000000000"
    CLIENT-APP-SECRET | Secret key of the service principal |
    AVAM-ACCOUNT-ID | GUID you can find in the Azure portal or at | "00000000-0000-0000-0000-000000000000"
    AVAM-ACCOUNT-REGION | Azure region where AVAM was deployed | e.g. eastus2, westus2, etc.
    AVAM-RESOURCE-ID | Found in the Azure portal in the AVAM resource, under the "Properties" tab | Ex. "/subscriptions/{your_subscription_id_guid}/
    AVAM-API-KEY | Created at and accessed at |
    SPEECH-LANGUAGES-CONFIG | A JSON array of the languages that you want to translate and dub the video into. You can find supported languages for AVAM here and supported voices for Azure Speech API | See below
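
The secrets listed above can be read back from code once the vault is populated. The following is a minimal sketch, not the repository's implementation: it assumes the `azure-identity` and `azure-keyvault-secrets` packages are installed, that the calling identity holds permission to read secrets in the vault, and that the vault URL is supplied by you. The helper names `make_client` and `load_secrets` are hypothetical.

```python
from typing import Dict, Iterable

# Secret names copied from the list of Key Vault secrets in this guide.
SECRET_NAMES = (
    "TENANT-ID",
    "CLIENT-APP-ID",
    "CLIENT-APP-SECRET",
    "AVAM-ACCOUNT-ID",
    "AVAM-ACCOUNT-REGION",
    "AVAM-RESOURCE-ID",
    "AVAM-API-KEY",
    "SPEECH-LANGUAGES-CONFIG",
)


def make_client(vault_url: str):
    """Build a Key Vault secrets client (hypothetical helper).

    The Azure SDK imports are deferred so that the rest of this module
    can be loaded and tested without the SDK installed.
    """
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    return SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())


def load_secrets(client, names: Iterable[str] = SECRET_NAMES) -> Dict[str, str]:
    """Fetch each named secret and return a name -> value mapping.

    `client` is any object exposing get_secret(name) whose result has a
    `.value` attribute, which is the shape SecretClient provides.
    Secrets that come back empty are collected and reported together.
    """
    values: Dict[str, str] = {}
    empty = []
    for name in names:
        value = client.get_secret(name).value
        if value:
            values[name] = value
        else:
            empty.append(name)
    if empty:
        raise ValueError("secrets with no value: " + ", ".join(empty))
    return values
```

A typical call would be `load_secrets(make_client("https://<your-vault-name>.vault.azure.net"))`, where the vault name is a placeholder for your own.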

Sample speech languages config:

    [
      {
        "language-text-code": "zh-Hans",
        "language-voice-code": "zh-CN",
        "language-voice-name": "zh-CN-XiaomoNeural",
        "language-display-name": "Chinese"
      },
      {
        "language-text-code": "es-MX",
        "language-voice-code": "es-MX",
        "language-voice-name": "es-MX-JorgeNeural",
        "language-display-name": "Spanish (Mexico)"
      }
    ]

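Because SPEECH-LANGUAGES-CONFIG is stored as a string, it can help to validate it before a workflow consumes it. The sketch below assumes the secret holds a JSON array of objects carrying the four keys shown in the sample; the `SpeechLanguage` class and `parse_speech_languages` function are hypothetical names introduced for illustration.

```python
import json
from dataclasses import dataclass
from typing import List

# Keys every entry is assumed to carry, based on the sample config.
REQUIRED_KEYS = (
    "language-text-code",
    "language-voice-code",
    "language-voice-name",
    "language-display-name",
)


@dataclass(frozen=True)
class SpeechLanguage:
    text_code: str      # language code used for text translation
    voice_code: str     # locale used for speech synthesis
    voice_name: str     # neural voice name
    display_name: str   # human-readable label


def parse_speech_languages(raw: str) -> List[SpeechLanguage]:
    """Parse the config string and fail early on structural problems."""
    entries = json.loads(raw)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array of language entries")
    languages = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {index} is not a JSON object")
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"entry {index} is missing keys: {missing}")
        languages.append(
            SpeechLanguage(
                text_code=entry["language-text-code"],
                voice_code=entry["language-voice-code"],
                voice_name=entry["language-voice-name"],
                display_name=entry["language-display-name"],
            )
        )
    return languages


SAMPLE = """
[
  {"language-text-code": "zh-Hans", "language-voice-code": "zh-CN",
   "language-voice-name": "zh-CN-XiaomoNeural", "language-display-name": "Chinese"},
  {"language-text-code": "es-MX", "language-voice-code": "es-MX",
   "language-voice-name": "es-MX-JorgeNeural", "language-display-name": "Spanish (Mexico)"}
]
"""

if __name__ == "__main__":
    for language in parse_speech_languages(SAMPLE):
        print(language.display_name, language.voice_name)
```

Checking the structure up front means a malformed secret fails with a message naming the offending entry, rather than failing later inside a dubbing run.
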
Logic App Flows:

  1. GetAVAMAccessToken: used by the other workflows. Gets an Azure AD management access token that is used to interact with Azure Video Analyzer for Media.
  2. UploadVideoToAVAM: triggered by video upload to Azure Storage. Generates a SAS URI and then passes it in the request to AVAM to begin processing the video.
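
For readers who want to see what the token step in GetAVAMAccessToken might involve, here is a rough Python sketch. It is an assumption, not a description of the actual workflow: it supposes the token is obtained with the OAuth 2.0 client-credentials flow using the service principal stored in TENANT-ID, CLIENT-APP-ID and CLIENT-APP-SECRET, and that the requested scope is the Azure Resource Manager default scope. The function names are hypothetical, and only the standard library is used.

```python
import json
import urllib.parse
import urllib.request

# Azure AD v2.0 token endpoint and the Azure Resource Manager scope.
TOKEN_URL_TEMPLATE = "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
ARM_SCOPE = "https://management.azure.com/.default"


def build_token_request(
    tenant_id: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request."""
    body = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": ARM_SCOPE,
        }
    ).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL_TEMPLATE.format(tenant_id=tenant_id),
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_management_token(tenant_id: str, client_id: str, client_secret: str) -> str:
    """Send the request and return the access token from the JSON response."""
    request = build_token_request(tenant_id, client_id, client_secret)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return payload["access_token"]
```

Separating request construction from the network call keeps the first function easy to inspect and test without real credentials.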