Process Automation: Speech to Text and Summarization with ACA

This sample creates a web-based app that allows workers at a company called Contoso Manufacturing to report issues via text or speech. Audio input is transcribed to text and then summarized to highlight important information and specify the department the report should be sent to.

Features
Azure account requirements
Opening the project
Deployment
Local Development
- Explore the prompty file
- Testing the sample
Costs
Security Guidelines
Resources
Code of Conduct

Features

This project template provides the following features:

Azure AI Speech Service to transcribe the user's speech into text.
Azure OpenAI to summarize the text.
Prompty and Prompt Flow to create, manage, and evaluate the prompt used in our code.

Azure account requirements

In order to deploy and run this example, you'll need:

Azure account. If you're new to Azure, get an Azure account for free and you'll get some free Azure credits to get started. See guide to deploying with the free trial.
Azure subscription with access enabled for the Azure OpenAI service. You can request access with this form. If your access request to the Azure OpenAI service doesn't match the acceptance criteria, you can use the OpenAI public API instead.
- Ability to deploy gpt-35-turbo
- We recommend using Sweden Central or East US 2
Azure subscription with access enabled for Azure AI Speech Service

Opening the project

You have a few options for setting up this project. The easiest way to get started is GitHub Codespaces, since it will set up all the tools for you, but you can also set it up locally.

GitHub Codespaces

You can run this template virtually by using GitHub Codespaces. The button will open a web-based VS Code instance in your browser:
Open a terminal window.
Sign in to your Azure account:
```
azd auth login
```
Provision the resources and deploy the code:
```
azd up
```
This project uses gpt-3.5-turbo which may not be available in all Azure regions. Check for up-to-date region availability and select a region during deployment accordingly.

VS Code Dev Containers

A related option is VS Code Dev Containers, which will open the project in your local VS Code using the Dev Containers extension:

Start Docker Desktop (install it if not already installed)
Open the project:
In the VS Code window that opens, once the project files show up (this may take several minutes), open a terminal window.

Local environment

Prerequisites

Initializing the project

Create a new folder and switch to it in the terminal, then run this command to download the project code:
```
azd init -t summarization-openai-python-promptflow
```
Note that this command will initialize a git repository, so you do not need to clone this repository.
Install required packages:

```
cd src/summarizationapp
pip install -r requirements.txt
```

Deployment

Once you've opened the project in Codespaces, Dev Containers, or locally, you can deploy it to Azure.

Sign in to your Azure account:
```
azd auth login
```
If you have any issues with that command, you may also want to try azd auth login --use-device-code.
Create a new azd environment:
```
azd env new
```
This will create a folder under .azure/ in your project to store the configuration for this deployment. You may have multiple azd environments if desired.
Provision the resources and deploy the code:
```
azd up
```
This project uses gpt-3.5-turbo which may not be available in all Azure regions. Check for up-to-date region availability and select a region during deployment accordingly.
A .env file should have been created in the src folder. Move this file into the summarizationapp folder. This will contain all the environment variables you need.
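To check which settings the generated file contains after moving it, a small reader like the following can help. This is a minimal sketch using only the standard library; the variable names in the sample text are hypothetical placeholders, not the names the template actually writes.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str) -> dict[str, str]:
    """Read a .env file from disk and parse it."""
    return parse_env(Path(path).read_text(encoding="utf-8"))


# Hypothetical variable names, for illustration only.
sample = (
    'EXAMPLE_ENDPOINT="https://example.invalid/"\n'
    "# a comment line\n"
    "EXAMPLE_DEPLOYMENT=my-deployment\n"
)
settings = parse_env(sample)
print(sorted(settings))
```

Calling `load_env_file("src/summarizationapp/.env")` after the move would list the real settings, assuming the file uses plain `KEY=VALUE` lines.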

Local Development

Explore the prompty file

This sample repository contains a summarize prompty file you can explore. In this sample we are telling the model to summarize the reports given by a worker in a specific format.

The prompty file contains the following:

The name, description and authors of the prompt
configuration: Details about the LLM model including:
- api type: chat or completion
- configuration: connection type (azure_openai or openai) and environment variables
- model parameters: max_tokens, temperature and response_format (text or json_object)
inputs: the content input from the user, where each input should have a type and can also have a default value
outputs: where the output should have a type like string
Sample Section: a sample of the inputs to be provided
The prompt: in this sample we add a system message as the prompt with context and details about the format. We also add a user message at the bottom of the file, which consists of the reported issue in text format from our user.

If you ran the provisioning step above correctly, all of the variables should already be set for you. You can edit the prompt to see what changes this makes to the summary created.
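To make the layout described above concrete, here is a minimal sketch that separates a prompty-style file into its metadata and its prompt. It assumes the metadata sits in a front-matter block delimited by `---` lines; the example text, its field values, and the helper `split_prompty` are hypothetical and are not copied from this repository.

```python
def split_prompty(text: str) -> tuple[str, str]:
    """Return (front_matter, prompt_body) for text delimited by '---' lines."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("expected the file to start with a '---' line")
    # Find the line that closes the front matter.
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            front = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:]).strip()
            return front, body
    raise ValueError("front matter is not closed by a second '---' line")


# Hypothetical file content, for illustration only.
EXAMPLE = """\
---
name: Example summarizer
description: Illustrative only
model:
  api: chat
  parameters:
    max_tokens: 128
    temperature: 0.2
inputs:
  problem:
    type: string
---
system:
Summarize the reported issue.

user:
{{problem}}
"""

front_matter, prompt_body = split_prompty(EXAMPLE)
print(front_matter)
print(prompt_body)
```

The metadata half holds the settings you would tune (for example `temperature`), while the prompt half is where wording changes affect the summary.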

Testing the sample

This repository contains sample data to be able to test the project end to end. To run this project you'll need to pass in as input a reported issue to be summarized. You can pass this input as either a .wav file or a string of text. The data/audio-data/ folder contains sample audio files for you to use or you can use the example string shown below. Below are the commands you can use in your terminal to run the project locally with promptflow.

Testing with sample audio data:

```
pf flow test --flow ./src/summarizationapp --inputs problem="data/audio-data/issue0.wav"
```

Testing with sample text data:

```
pf flow test --flow ./src/summarizationapp --inputs problem="I need to open a problem report for part number ABC123. The brake rotor is overheating causing glazing on the pads. We track temperature above 24 degrees Celsius and we are seeing this after three to four laps during runs when the driver is braking late and aggressively into corners. The issue severity is to be prioritized as a 2. This is impacting the front brake assembly EFG234"
```

To understand how the code works, look through the speech_to_text.py file.
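Because the same `problem` input can be either a .wav path or plain text, some step has to tell the two apart. The sketch below shows one simple way to do that by checking the file extension. It is an assumption for illustration, not the logic in speech_to_text.py, and the names `looks_like_audio` and `describe_input` are hypothetical.

```python
def looks_like_audio(problem: str) -> bool:
    """Treat the input as audio when it ends with a .wav extension."""
    return problem.strip().lower().endswith(".wav")


def describe_input(problem: str) -> str:
    """Return 'audio' for .wav paths and 'text' for everything else."""
    return "audio" if looks_like_audio(problem) else "text"


print(describe_input("data/audio-data/issue0.wav"))
print(describe_input("The brake rotor is overheating"))
```

An audio input would then be transcribed before summarization, while a text input could go straight to the prompt.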

Run the local server:

```
python -m flask --debug --app src/app:app run --port 5000
```

Click 'http://127.0.0.1:5000' in the terminal, which should open a new tab in the browser.
Try the API at '/get_response' and try passing in a parameter at the end of the URL, like '/get_response?problem="string"'.
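Reports often contain spaces and punctuation, so the `problem` parameter should be URL-encoded before it is appended. The following sketch builds such a URL with the standard library and, optionally, sends the request. It assumes the local server from the previous step is running on port 5000 and makes no assumption about the response format beyond reading it as text; the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:5000"


def build_request_url(problem: str, base_url: str = BASE_URL) -> str:
    """Return the /get_response URL with the problem text URL-encoded."""
    return f"{base_url}/get_response?{urlencode({'problem': problem})}"


def fetch_summary(problem: str) -> str:
    """Send the request to the local server and return the raw response text."""
    with urlopen(build_request_url(problem)) as response:
        return response.read().decode("utf-8")


url = build_request_url("The brake rotor is overheating")
print(url)
```

With the server running, `fetch_summary("The brake rotor is overheating")` would return whatever the endpoint sends back.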

Costs

Pricing may vary per region and usage. Exact costs cannot be estimated. You may try the Azure pricing calculator for the resources below:

Azure Container Apps: Pay-as-you-go tier. Costs based on vCPU and memory used. Pricing
Azure OpenAI: Standard tier, GPT and Ada models. Pricing per 1K tokens used, and at least 1K tokens are used per question. Pricing
Azure Monitor: Pay-as-you-go tier. Costs based on data ingested. Pricing
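For a rough sense of how per-token pricing scales, the arithmetic can be sketched as below. The price used here is a made-up placeholder, not an actual Azure OpenAI rate; substitute the current figure from the pricing page for your region and model.

```python
def estimate_token_cost(total_tokens: int, price_per_1k_tokens: float) -> float:
    """Return the cost of total_tokens at a given price per 1,000 tokens."""
    if total_tokens < 0 or price_per_1k_tokens < 0:
        raise ValueError("tokens and price must be non-negative")
    return total_tokens / 1000 * price_per_1k_tokens


# Placeholder values for illustration only; not real prices or usage data.
PLACEHOLDER_PRICE_PER_1K = 0.002
questions = 500
tokens_per_question = 1000

estimated = estimate_token_cost(questions * tokens_per_question, PLACEHOLDER_PRICE_PER_1K)
print(f"Estimated cost for {questions} questions: {estimated:.2f}")
```

This covers only the token-based part of the bill; Container Apps and Azure Monitor charges depend on compute and data ingested and are not included.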

Security Guidelines

This template uses Managed Identity for authenticating to the Azure services used (such as Azure OpenAI).

Additionally, we have added a GitHub Action that scans the infrastructure-as-code files and generates a report containing any detected issues. To ensure continued best practices in your own repository, we recommend that anyone creating solutions based on our templates ensure that the GitHub secret scanning setting is enabled.

Resources

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct.

Resources:

Microsoft Open Source Code of Conduct
Microsoft Code of Conduct FAQ
Contact opencode@microsoft.com with questions or concerns

For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
