---
title: "AI-102: Azure AI Engineer Associate"
markmap:
---
- AI-102: Azure AI Engineer Associate
- Course: Designing and Implementing a Microsoft Azure AI Solution
- Study guide
- Practice assessment
- Exam prep videos
- Azure AI Hub
- Exam Sandbox: Experience the look and feel of the exam interface before taking it.
- Processes images and videos to understand their content
- Detects and recognizes human faces
- Builds and deploys custom image classification models
- Extracts text, key-value pairs, and tables from documents
- Extracts insights from videos and live streams
- Custom text classification
- Custom named entity recognition
- Conversational Language Understanding
- Entity Linking
- Key Phrase Extraction
- Language Detection
- Named Entity Recognition (NER)
- Orchestration workflow
- Personally identifiable information (PII) detection
- Question Answering
- Sentiment Analysis
- Summarization
- Text Analytics for Health
- Supports intermediate results, end-of-speech detection, automatic text formatting, profanity masking, and includes real-time speech-to-text and batch transcription
- Identifies the spoken language in a given audio stream
- Converts text to natural-sounding speech
- Identifies and verifies the people speaking based on audio
- Evaluates the pronunciation and provides feedback on the accuracy and fluency of the speech
- Translates streaming audio in real-time and provides result as text/synthesized speech
- Derives user intents from transcribed speech and acts on voice commands
- ==TODO==
- Document analysis model
- Prebuilt model
- Custom model
- Azure AI Search
- Fairness: AI systems should treat all people fairly.
- Fairlearn: An open-source toolkit for assessing and improving the fairness of machine learning models.
- Reliability and safety: AI systems should perform reliably and safely.
- Test the model
- Risk- and harm-related information should be accessible to model users
- Privacy and security: AI systems should respect privacy and maintain security.
- Personally identifiable information (PII) should be protected
- Inclusiveness: AI systems should empower everyone and engage people.
- Transparency: AI systems should be transparent and understandable.
- Interpretability/Intelligibility: The ability to explain the results of a model in a way that is understandable to humans.
- Accountability: AI systems should be accountable to people.
- Model governance: The process of managing the entire lifecycle of a model, including model creation, deployment, and monitoring.
- Organizational principles: Define the roles and responsibilities of the people involved in the model lifecycle.
- From the Azure portal
- Using Azure CLI
- Using client libraries (SDKs)
- Using ARM templates, Bicep, or Terraform
- Multi-service resource
- Multiple Azure AI resources with a single key and endpoint
- Consolidate billing for the services you use
- Single-service resource
- Single Azure AI resource with a single key and endpoint
- Use free tier for testing and development: only supported in single-service resources
- Endpoint URI is one of the three primary parameters for Azure AI
- Two access keys are provided for each Azure AI resource by default
- Protect the keys by using Azure Key Vault
- Authenticate with:
- Single or multi-service key
- Token (REST API)
- Entra ID identity
- ==Azure Container Instances== (ACI): on-demand standalone containers with minimal setup in a serverless environment.
- ==Azure Kubernetes Service== (AKS): Managed Kubernetes service for deploying, managing, and scaling containerized applications using Kubernetes.
- Enable diagnostic logging for an Azure AI resource:
- ==Log Analytics Workspace== to analyze logs and metrics (Azure Monitor)
- ==Event Hub== for streaming logs to other services
- ==Storage Account== for archiving logs with less expensive storage
- Metrics: capture regular data points about the behavior of the resource in time-series database
- Alerts: notify you when a metric breaches a threshold
- Diagnostics settings: configure the resource to send logs and metrics to a destination
- Activity logs: records operations made on the resource
- Azure Pricing Calculator
- Estimate the cost of Azure services
- Azure Cost Management and Billing
- Monitor and analyze costs
- Create budgets and alerts
- Optimize costs
- Billing administrative tasks
- ==TODO==
- ==TODO==
- ==TODO==
- ==TODO==
- Detect and filter harmful or inappropriate text content in applications
- Get an API endpoint + subscription key
- Send a request to the endpoint with the subscription key and the text to analyze
- Get a response with the classification of the text as JSON
- Harm categories (e.g. hate and fairness, sexual, violence, self-harm)
- Severity level from 0 to 7 (e.g. safe, low, medium, high)
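- A minimal sketch of this flow with Python's `requests` package (the endpoint, key, and API version are placeholder assumptions, not values from these notes):

  ```python
  import requests

  # Hypothetical Content Safety resource values -- substitute your own.
  endpoint = "https://<your-content-safety-resource>.cognitiveservices.azure.com"
  key = "<your-subscription-key>"

  response = requests.post(
      f"{endpoint}/contentsafety/text:analyze?api-version=2023-10-01",
      headers={"Ocp-Apim-Subscription-Key": key},
      json={"text": "Text to analyze"},
  )
  # The JSON response carries a severity score per harm category.
  for item in response.json()["categoriesAnalysis"]:
      print(item["category"], item["severity"])
  ```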
- Detect and filter harmful or inappropriate images in applications
- Get an API endpoint + subscription key
- Send a request to the endpoint with the subscription key and the image to analyze
- Get a response with the classification of the image as JSON
- Harm categories (e.g. hate and fairness, sexual, violence, self-harm)
- Severity level from 0 to 7 (e.g. safe, low, medium, high)
- Create Azure AI custom vision training and prediction resources.
- ==TODO==
- ==TODO==
- ==TODO==
- Azure AI Vision can extract printed and handwritten text from images
- ==OCR for images (version 4.0)==
- Inputs: Images: General, in-the-wild images
- Examples: labels, street signs, and posters
- Optimized for general, non-document images with a performance-enhanced synchronous API that makes it easier to embed OCR in your user experience scenarios.
- ==Document Intelligence read model==
- Inputs: Documents: Digital and scanned, including images
- Examples: books, articles, and reports
- Optimized for text-heavy scanned and digital documents with an asynchronous API to help automate intelligent document processing at scale.
- ==TODO==
- Image classification: Classify or assign a label to an image
- Object detection: Identify and locate objects in an image
- You can upload and tag your images to train the classifier or detector model.
- For both image classification and object detection, you need to:
- Create a new project
- Name and describe it
- Select a project type: Classification or Object Detection
- Select an available domain (General, Food, Landmarks, Retail, Logo etc.)
- Train and test the model
- Publish and consume the model
- For image classification you need to select either:
- Multilabel classification: Assign multiple labels to an image
- Multiclass classification: Assign a single label to an image
- Select the Train button to start training the model
- The training process can take a few minutes to a few hours
- Monitor the training process and check the metrics via the performance tab
- Delete obsolete iterations
- Available metrics:
- Precision
- A percentage value that indicates the proportion of true positive predictions in the total number of positive predictions.
- Recall
- A percentage value that indicates the proportion of true positive predictions in the total number of actual positive instances.
- mAP (mean Average Precision) - Object Detection only
- A metric that evaluates the precision-recall curve for object detection models.
- Additional metrics:
- Probability threshold: The level of confidence that a prediction needs to have in order to be considered correct (for the purposes of calculating precision and recall)
- Overlapping threshold: Sets the minimum allowed overlap between the predicted object's bounding box and the actual user-entered bounding box. If the bounding boxes don't overlap to this degree, the prediction won't be considered correct.
- Make your model available for consumption by others by publishing it.
- Select the Publish ✓ button
- Provide the model name and prediction resource
- Select the Publish button
- ==TODO==
- Analyze video content to extract topics, labels, named-entities, emotions, and scenes.
- A timeline is provided to navigate through the video content along with the dialogue and speaker identification.
- People counting
- Entrance and exit counting
- Social distancing and face/mask detection
- Identify the main points in a text
- Create an Azure AI language resource
- Get the endpoint and subscription key
- Send a request to the endpoint with the subscription key and the raw text to analyze
- Get a response with the key phrases as JSON: stream or store locally.
- 3 consumption ways:
- Language Studio
- REST API
- Docker container
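- A minimal sketch of the REST call with Python's `requests` package (the endpoint and key values are placeholders, not values from these notes):

  ```python
  import requests

  # Hypothetical Language resource values -- substitute your own.
  endpoint = "https://<your-language-resource>.cognitiveservices.azure.com"
  key = "<your-subscription-key>"

  body = {
      "kind": "KeyPhraseExtraction",
      "analysisInput": {
          "documents": [
              {"id": "1", "language": "en",
               "text": "Azure AI Language extracts the main talking points from raw text."}
          ]
      },
  }
  response = requests.post(
      f"{endpoint}/language/:analyze-text?api-version=2023-04-01",
      headers={"Ocp-Apim-Subscription-Key": key},
      json=body,
  )
  print(response.json()["results"]["documents"][0]["keyPhrases"])
  ```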
- Entity linking: identify and disambiguate entities in text.
- Different endpoint for entity linking.
- Named entity recognition: identify and classify named entities in text.
- Ex: person, location, organization, date, etc.
- Evaluates text and returns sentiment scores and labels for each sentence
- Sentiment analysis: Provides sentiment labels (such as "negative", "neutral", and "positive") based on the highest confidence score found by the service at the sentence and document level.
- This feature also returns confidence scores between 0 and 1, for each document and the sentences within it, for positive, neutral, and negative sentiment.
- Opinion mining: Also known as aspect-based sentiment analysis in Natural Language Processing (NLP).
- This feature provides more granular information about opinions related to words (such as the attributes of products or services) in text.
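- The same `:analyze-text` call pattern as the key-phrase sketch above applies; a hedged request sketch with `opinionMining` enabled:

  ```python
  import requests

  # Hypothetical Language resource values -- substitute your own.
  endpoint = "https://<your-language-resource>.cognitiveservices.azure.com"
  key = "<your-subscription-key>"

  body = {
      "kind": "SentimentAnalysis",
      "parameters": {"opinionMining": True},  # aspect-based sentiment
      "analysisInput": {
          "documents": [
              {"id": "1", "language": "en",
               "text": "The staff were great, but the room was dirty."}
          ]
      },
  }
  response = requests.post(
      f"{endpoint}/language/:analyze-text?api-version=2023-04-01",
      headers={"Ocp-Apim-Subscription-Key": key},
      json=body,
  )
  for doc in response.json()["results"]["documents"]:
      # Document-level label plus positive/neutral/negative confidence scores.
      print(doc["sentiment"], doc["confidenceScores"])
  ```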
- Evaluates a text and returns scored language identifiers.
- A wide range of languages is supported, including regional dialects.
- In case of mixed languages, the service will return the most used language with a low confidence score
- Identify, categorize and redact sensitive information in unstructured text.
- Create an Azure AI language resource
- Get the endpoint and subscription key
- Send a request to the endpoint with the subscription key and the raw text to analyze
- Get a response with the redacted text and detected PII entities as JSON: stream or store locally.
- The API is stateless in synchronous mode; in asynchronous mode, results are stored for 24 hours.
- Life-like speech synthesis (fluid and natural-sounding)
- Customizable voices
- Fine-grained audio controls (rate, pitch, pause, pronunciation, etc.)
- Flexible deployment (cloud or containers)
- Real-time transcription of audio streams into written text.
- High quality transcription
- Flexible deployment
- Customizable models
- Production-ready
- SSML can be used to fine-tune text-to-speech models outputs.
- SSML is a markup language that allows developers to control various aspects of speech synthesis, such as pronunciation, volume, pitch, rate, and more.
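- A minimal SSML sketch wrapped in the Python Speech SDK (assumes the `azure-cognitiveservices-speech` package; the voice name and resource values are illustrative):

  ```python
  import azure.cognitiveservices.speech as speechsdk

  # Hypothetical Speech resource values -- substitute your own.
  speech_config = speechsdk.SpeechConfig(subscription="<your-key>", region="<your-region>")
  synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

  # SSML controls the voice, rate, pitch, and pauses of the synthesized output.
  ssml = """
  <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <voice name="en-US-JennyNeural">
      <prosody rate="-10%" pitch="+5%">Welcome!</prosody>
      <break time="500ms"/>
      How can I help you today?
    </voice>
  </speak>
  """
  result = synthesizer.speak_ssml_async(ssml).get()
  print(result.reason)  # e.g. ResultReason.SynthesizingAudioCompleted
  ```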
- Custom neural voice (CNV) models can be used to create custom voices for text-to-speech applications.
- CNV models are trained on a speaker's voice data to create a custom voice that can be used in text-to-speech applications.
- Test custom speech solutions for Word Error Rate (WER) with accuracy testing and custom acoustic models:
- Needs improvement: >30%
- Acceptable: ~20%
- Ready for production: <10%
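- WER is conventionally computed as $\mathrm{WER} = \frac{S + D + I}{N}$, where $S$, $D$, and $I$ count substituted, deleted, and inserted words and $N$ is the number of words in the reference transcript (a standard definition, added here for context).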
- Taking a written or spoken input and determining the intent behind it.
- 2 methods:
- Pattern matching: for offline solutions
- Create code and speech configuration
- Initialize the intent recognizer and declare entities as intent
- Enable recognition of intent
- Instruct code to stop on intent recognition
- Display results
- Publish
- CLU (Conversational Language Understanding): prediction of intents
- Create a new project by importing a JSON file
- Train model
- Choose training mode and data splitting
- Deploy model
- Use model to recognize intents from an audio stream
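- A minimal pattern-matching sketch with the Python Speech SDK (the intents and the `{floorName}` entity placeholder are hypothetical, and exact pattern support varies by SDK version):

  ```python
  import azure.cognitiveservices.speech as speechsdk
  from azure.cognitiveservices.speech.intent import IntentRecognizer

  # Hypothetical Speech resource values -- substitute your own.
  speech_config = speechsdk.SpeechConfig(subscription="<your-key>", region="<your-region>")
  recognizer = IntentRecognizer(speech_config=speech_config)

  # Declare phrase patterns as intents; {floorName} marks an extractable entity.
  recognizer.add_intent("Take me to floor {floorName}.", "ChangeFloors")
  recognizer.add_intent("Open the door.", "OpenDoor")

  result = recognizer.recognize_once()  # stops after the first recognized utterance
  if result.reason == speechsdk.ResultReason.RecognizedIntent:
      print(result.intent_id, result.text)
  ```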
- Detect a word or short phrase within an audio stream or content
- Create a new project in speech studio
- Create a custom keyword:
- Create new model
- Provide name/description and the keyword
- Validate
- Select a model type and Create
- Basic: rapid prototyping
- Advanced: improved accuracy characteristics for product integration
- Select Tune to download the model
- This model can now be used
- 2 distinct types of endpoints enable:
- Text translation: Translate text between languages (real-time)
- REST API cloud-based translator
- Docker container based translator
- Supported methods:
- Languages: Returns a list of languages supported by the Translate, Transliterate, and Dictionary Lookup operations. This request doesn't require authentication; just copy and paste the following GET request into your favorite REST API tool or browser: `https://api.cognitive.microsofttranslator.com/languages?api-version=3.0`
- Translate: Renders single source-language text to multiple target-language texts with a single request.
- Transliterate: Converts characters or letters of a source language to the corresponding characters or letters of a target language.
- Detect: Returns the source language code and a boolean indicating whether the detected language is supported for text translation and transliteration.
- Dictionary lookup: Returns equivalent words for the source term in the target language.
- Dictionary example: Returns grammatical structure and context examples for the source term and target term pair.
- Document translation: Translate documents between languages (asynchronous)
- REST API cloud-based translator
- Client library SDK
- Supported methods:
- Translate large files: Translate whole documents asynchronously.
- Translate numerous files: Translate multiple files across all supported languages and dialects while preserving document structure and data format.
- Preserve source file presentation: Translate files while preserving the original layout and format.
- Apply custom translation: Translate documents using general and custom translation models.
- Apply custom glossaries: Translate documents using custom glossaries.
- Automatically detect document language: Let the Document Translation service determine the language of the document.
- Translate documents with content in multiple languages: Use the autodetect feature to translate documents with content in multiple languages into your target language.
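- A hedged sketch of a text-translation call with `requests` (key, region, and target languages are placeholders):

  ```python
  import uuid

  import requests

  # Hypothetical Translator resource values -- substitute your own.
  key = "<your-translator-key>"
  region = "<your-resource-region>"

  response = requests.post(
      "https://api.cognitive.microsofttranslator.com/translate",
      params={"api-version": "3.0", "from": "en", "to": ["fr", "de"]},
      headers={
          "Ocp-Apim-Subscription-Key": key,
          "Ocp-Apim-Subscription-Region": region,
          "X-ClientTraceId": str(uuid.uuid4()),
      },
      json=[{"text": "Hello, world!"}],
  )
  for translation in response.json()[0]["translations"]:
      print(translation["to"], translation["text"])
  ```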
- Train a custom model:
- Select train model, enter sample data and select full training
- Select sample-source language, target language and review training costs
- Select Train now then Train to start training
- Once trained, select Model details to review the model
- Test and publish a custom model
- Select Test model, enter sample data
- Test (human evaluation) the translation
- Select Publish model to make the model available
- Select a region and validate.
- Speech-to-speech service can translate an audio stream/input to another language as an audio output.
- Works in real-time.
- 4 ways to use speech translation:
- ==Speech translator API==
- Typically used for real-time translation of spoken languages
- ==Speech CLI==
- Experiment with minimal code solution
- ==Speech SDK==
- Use in your own applications
- ==Speech Studio==
- Typically used to test and tune speech services
- Intent: action or goal expressed in a user's utterance
- Utterance: spoken or written phrases
- Entity: A word or phrase within utterances that can be identified and extracted
- Learned component: enables predictions based on context learned while labeling utterances
- List component: Fixed set of related words with their synonyms
- Prebuilt component: Built-in entities like date, time, number, etc.
- Regex component: Regular expression to match entities
- To create entities:
- Navigate to Entities pivot
- Select Add and type entity name
- Define composition settings
- Attach a Learned, Prebuilt or List component
- CLU can be used to build a custom natural language understanding model that predicts intent and extracts information from utterances.
- Creation process:
- Select data and define schema
- Label data
- Train model
- View model performance results
- Tune the model
- Deploy
- Predict intents and entities
- Ensure training data set is representative and sufficient
- Insufficient data can lead to overfitting and lower accuracy
- Adding more labeled data can improve the accuracy of the model
- Ensure all entities are covered in test data
- Absence of labeled instances can reduce the accuracy of model evaluation
- Fix unclear or ambiguous distinction between intents and entities
- Similar data for different intents can lead to confusion
- You can solve this by merging similar entities or adding more examples
- Azure AI language models can be consumed from a client application using the REST API or SDKs.
- This enables users to use natural language as input to interact with the application.
- User's intent and entities are extracted and processed by the model to provide the desired output.
- Application performs the necessary actions.
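- A minimal prediction-request sketch with `requests` (project and deployment names are hypothetical):

  ```python
  import requests

  # Hypothetical Language resource and CLU project values -- substitute your own.
  endpoint = "https://<your-language-resource>.cognitiveservices.azure.com"
  key = "<your-subscription-key>"

  body = {
      "kind": "Conversation",
      "analysisInput": {
          "conversationItem": {"id": "1", "participantId": "user",
                               "text": "Turn off the kitchen lights"}
      },
      "parameters": {"projectName": "<your-project>",
                     "deploymentName": "<your-deployment>"},
  }
  response = requests.post(
      f"{endpoint}/language/:analyze-conversations?api-version=2023-04-01",
      headers={"Ocp-Apim-Subscription-Key": key},
      json=body,
  )
  prediction = response.json()["result"]["prediction"]
  print(prediction["topIntent"], prediction["entities"])
  ```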
- Export replicas of language understanding models to back them up and recover them in case of data loss.
- Export
- Create a `POST` request with `Ocp-Apim-Subscription-Key` to create the export task
- Use a `GET` request to get the status of the export task
- Use a `GET` request to download the exported model
- Import
- Create a `POST` request with `Ocp-Apim-Subscription-Key` to create the import task
- The body should contain the exported model as JSON
- Use a `GET` request to get the status of the import task
- Wait for successful completion of the task
- Enable custom question answering
- Create a new project with a name and a language
- Add question-answer pairs from source URLs or manually
- In this case, you need to type the question and the answer manually.
- Use different sources to populate Azure Question Answering project:
- Structured documents (manuals, guidelines, etc.)
- Questions will be derived from the headings and subheadings of the document
- Answers will be derived from the subsequent text
- Unstructured documents (articles, blogs, etc.)
- Question-and-answer documents (FAQs, etc.)
- Supported file formats: `.docx`, `.pdf`, `.txt`, `.html`, `.tsv`, `.csv`, ...
- In the knowledge base, source documents are imported as Questions. You can amend the questions and answers as needed.
- Select Save and train, then Test
- A test version of the knowledge base is created and you can analyze it with the Inspect button
- You can Publish the knowledge base to make it available for consumption through REST endpoint
- Multi-turn conversations are dialogues between a user and a bot that require multiple steps to complete.
- To create:
- Select Add follow-up prompts in the knowledge base
- Fill details of the prompt
- Create link to new pair
- Save
- Multiple follow-up prompts can be added to a single question by repeating the same process.
- Add alternate questions with differences in the sentence structure or wording to improve the accuracy of the model.
- Chit-chat is a feature that allows the bot to engage in casual conversation with the user.
- Gives the bot the ability to answer questions in a way that fits your brand
- Set a personality for the bot
- Automatically add simple question-answer pairs to the knowledge base
- Exporting a knowledge base allows you to save a copy of the knowledge base for:
- Backup purpose
- CI/CD integration
- Deployment region mobility
- Steps:
- Open the custom question answering project
- Select Export
- Select the export format (`.xlsx` or `.tsv`) that will be exported in a `.zip` file
- Multi-language question answering solutions can be created by training the model with data in multiple languages.
- Steps:
- When creating the new custom question answering project:
- Select *I want to select the language when I create a project in this resource*
- Enter basic information and create the project
- Add sources to deploy the project
- Azure AI Search (formerly known as "Azure Cognitive Search") is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.
- On the search service itself, the two primary workloads are indexing and querying.
- Indexing engine
- Intake process that loads content into your search service and makes it searchable.
- Internally, inbound text is processed into tokens and stored in inverted indexes, and inbound vectors are stored in vector indexes.
- The document format that Azure AI Search can index is JSON. You can upload JSON documents that you've assembled, or use an indexer to retrieve and serialize your data into JSON.
- Applied AI through a skillset extends indexing with image and language models.
- If you have images or large unstructured text in source document, you can attach skills that perform OCR, describe images, infer structure, translate text and more.
- You can also attach skills that perform data chunking and vectorization.
- Query engine is used when your client app sends query requests to a search service and handles responses. All query execution is over a search index that you control.
- Semantic ranking is an extension of query execution. It adds secondary ranking, using language understanding to reevaluate a result set, promoting the most semantically relevant results to the top.
- Indexing engine
- Azure AI Search can index content from a variety of data sources:
- Azure Storage (Blobs, Tables)
- Azure Cosmos DB
- Azure SQL Database, managed instance or SQL server
- Both push and pull methods are supported.
- An index is a collection of JSON objects with unique keys and one or more fields.
- Index attributes can be:
- Searchable: Full-text search
- Filterable
- Facetable: Used for aggregations/categorization and hit count
- Sortable
- Retrievable: Enables the field to be returned in search results or hidden from them.
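- A sketch of an index definition created over REST, showing these attributes (the index name, fields, and API version are illustrative):

  ```python
  import requests

  # Hypothetical search service values -- substitute your own.
  endpoint = "https://<your-search-service>.search.windows.net"
  admin_key = "<your-admin-key>"

  index = {
      "name": "hotels",
      "fields": [
          {"name": "id", "type": "Edm.String", "key": True},
          {"name": "description", "type": "Edm.String",
           "searchable": True, "retrievable": True},
          {"name": "category", "type": "Edm.String",
           "filterable": True, "facetable": True, "sortable": True},
      ],
  }
  response = requests.put(
      f"{endpoint}/indexes/hotels?api-version=2023-11-01",
      headers={"api-key": admin_key},
      json=index,
  )
  print(response.status_code)  # 201 on create, 204 on update
  ```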
- A skillset is a reusable object in Azure AI Search that's attached to an indexer.
- Contains one or more skills that call built-in AI or external custom processing over documents retrieved from an external data source.
- Steps:
- Document Cracking
- Field mappings
- Skillset execution
- Output field mappings
- Push to index
- Up to 30 skills per skillset
- Can repeat skills
- Support chained operations, looping and branching
- An AI enrichment pipeline can include both built-in skills and custom skills that you personally create and publish.
- Your custom code executes externally from the search service (for example, as an Azure function), but accepts inputs and sends outputs to the skillset just like any other skill.
- The following data is required to set up a new custom skill in a skillset:
- `uri`
- `httpMethod` (PUT or POST)
- `httpHeaders`
- `timeout` (default 30s)
- `batchSize`: data records to send to the skill at once (1000 by default)
- `degreeOfParallelism`: maximum number of concurrent requests for this endpoint (between 1 and 10, default 5)
- For managed-identity connections: `resourceId` and `authResourceId`
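- A hedged sketch of one custom `WebApiSkill` entry inside a skillset definition (the function URL and input/output names are hypothetical):

  ```python
  # One entry for a skillset's "skills" array, expressed as a Python dict.
  custom_skill = {
      "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
      "description": "Calls an external Azure function for custom enrichment",
      "uri": "https://<your-function-app>.azurewebsites.net/api/enrich",
      "httpMethod": "POST",
      "timeout": "PT30S",        # ISO 8601 duration: 30 seconds
      "batchSize": 1000,         # records sent to the skill per request
      "degreeOfParallelism": 5,  # concurrent requests to the endpoint
      "context": "/document",
      "inputs": [{"name": "text", "source": "/document/content"}],
      "outputs": [{"name": "customLabel", "targetName": "customLabel"}],
  }
  ```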
- An indexer definition consists of properties that uniquely identify the indexer, specify which data source and index to use, and provide other configuration options that influence run time behaviors, including whether the indexer runs on demand or on a schedule.
- Extracts and serializes data from a data source, passing it to a search service for data ingestion.
- Full text search semantics based on Lucene query syntax over the index.
- Simple Lucene Query Parser
- Full Lucene Query Syntax: for specialized query forms: wildcard, fuzzy search, proximity search, regular expressions.
- Queries are processed in 4 stages:
- Query parsing
- Lexical analysis
- Document retrieval
- Scoring
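- A minimal query sketch over REST (index name, filter, and facet values are illustrative):

  ```python
  import requests

  # Hypothetical search service values -- substitute your own.
  endpoint = "https://<your-search-service>.search.windows.net"
  query_key = "<your-query-key>"

  response = requests.post(
      f"{endpoint}/indexes/hotels/docs/search?api-version=2023-11-01",
      headers={"api-key": query_key},
      json={
          "search": "quiet room near the beach",  # full-text query
          "filter": "category eq 'Resort'",       # OData filter expression
          "facets": ["category"],                 # buckets with hit counts
          "top": 5,
      },
  )
  for doc in response.json()["value"]:
      print(doc["@search.score"], doc.get("description"))
  ```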
- Projection is a way to define the shape of the enriched data that you want to store in the knowledge store.
- Enriched documents are stored in the knowledge store.
- Useful for knowledge mining scenarios.
- Projections can be read from 3 types of sources:
- Files
- Objects
- Tables
- Azure AI Document Intelligence is a cloud service that uses machine learning to extract information from documents.
- Prebuilt models are trained on a wide range of document types and can extract information from documents with minimal configuration:
- Receipts
- Invoices
- Business cards
- Identity documents
- Contracts
- Tax forms
- Vaccination cards
- and more...
- You can train custom models to classify and extract information from documents that are specific to your organization.
- Custom extraction models can be trained to extract information from documents that are specific to your organization.
- Custom classification models can be trained to classify documents based on their content.
- Train, test, and publish a custom document intelligence model:
- Create a new project in Document Intelligence Studio
- Label data
- Train the model
- Test the model
- ==TODO==
- ==TODO==
- Create an Azure OpenAI resource to access the OpenAI API and use it to generate content:
- Identify subscription, resource group, region, and pricing tier
- Configure network security
- Confirm configuration to deploy the resource
- By CLI:
- `az cognitiveservices account create -n <resource-name> -g <resource-group> --subscription <subscription-id> --location <location> --kind OpenAI --sku <sku>`
- Azure OpenAI provides access to a range of models that can be used to generate content:
- GPT-4: Newest model for natural language and code generation
- GPT-3.5: Natural language and code generation
- DALL-E: Image generation
- Embeddings: Similarity, text and code search etc.
- Deploy a model:
- Select subscription and OpenAI resource
- Create a new deployment:
- Select the model
- Add a deployment name
- Set advanced features like content filtering, token rate limits, etc.
- By CLI:
- `az cognitiveservices account deployment create -n <resource-name> -g <resource-group> --deployment-name <deployment-name> --model-name <model-name> --model-version <model-version> --model-format "OpenAI" --scale-settings-scale-type "Standard"`
- You can submit prompt for multiple purposes:
- Classifying content
- Generating new content
- Transformation and translation
- Summarization
- Continuation
- Question answering
- Chat
- and more...
- Use prompt engineering to define precisely the code you want to generate:
- Define the problem
- Define the input
- Define the output
- Define the constraints
- Define the evaluation metric
- Break down complex problems into smaller, more manageable parts
- DALL-E is a model that can generate images from textual descriptions:
- Uses a neural network-based model
- Uses Natural Language Processing (NLP) to understand the textual description
- Specify style and content to generate images with specific characteristics
- ==TODO==
- Use ==Chat Playground== to familiarize with model parameters to control the generative behavior, like:
- Deployments: Your deployment name that is associated with a specific model.
- Temperature: Controls randomness.
- Lowering the temperature means that the model produces more repetitive and deterministic responses.
- Increasing the temperature results in more unexpected or creative responses.
- Try adjusting temperature or Top P but not both.
- Max length (tokens): Set a limit on the number of tokens per model response.
- The API supports a maximum of 4096 tokens shared between the prompt (including system message, examples, message history, and user query) and the model's response. One token is roughly four characters for typical English text.
- Top probabilities (Top P): Similar to temperature, this controls randomness but uses a different method. Lowering Top P narrows the model's token selection to likelier tokens. Increasing Top P lets the model choose from tokens with both high and low likelihood.
- Try adjusting temperature or Top P but not both.
- Multi-turn conversations: Select the number of past messages to include in each new API request. This helps give the model context for new user queries. Setting this number to 10 results in five user queries and five system responses.
- Stop sequences: Make the model end its response at a desired point. The model response ends before the specified sequence, so it won't contain the stop-sequence text. For GPT-35-Turbo, using `<|im_end|>` ensures that the model response doesn't generate a follow-up user query. You can include as many as four stop sequences.
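- A hedged sketch of a chat completions call exercising these parameters (the deployment name and API version are assumptions):

  ```python
  import requests

  # Hypothetical Azure OpenAI resource values -- substitute your own.
  endpoint = "https://<your-openai-resource>.openai.azure.com"
  api_key = "<your-api-key>"
  deployment = "<your-deployment-name>"

  response = requests.post(
      f"{endpoint}/openai/deployments/{deployment}/chat/completions"
      "?api-version=2024-02-01",
      headers={"api-key": api_key},
      json={
          "messages": [
              {"role": "system", "content": "You are a concise assistant."},
              {"role": "user", "content": "Summarize what Azure AI Search does."},
          ],
          "temperature": 0.7,  # adjust temperature OR top_p, not both
          "max_tokens": 200,   # cap on tokens in the response
      },
  )
  print(response.json()["choices"][0]["message"]["content"])
  ```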
- To improve generative AI responses, prompt engineering techniques can be used:
- Provide clear instructions
- Primary, supporting, and grounding content
- Providing cues
- Requesting output composition: length, style, formatting, etc.
- Using system messages
- Conversation history and few-shot learning
- Chain of thought
- You can use your own data with Azure OpenAI models to generate content that is specific to your organization:
- Set up a data source, such as blob storage
- Configure studio to connect to the data-source
- Use Azure OpenAI model per usual to generate content
- You can configure the model with specific parameters to control the generative behavior:
- Strictness determines the system's aggressiveness in filtering search documents based on their similarity scores.
- Retrieved documents is an integer that can be set to 3, 5, 10, or 20, and controls the number of document chunks provided to the large language model for formulating the final response.
- Limit responses attempts to only rely on your documents for responses.
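- A hedged sketch of the extra request-body fields for grounding on your own data (the `data_sources` shape follows the `2024-02-01` chat completions API as an assumption; all values are placeholders):

  ```python
  # Added to a chat completions request body (see the sketch further above)
  # to ground responses in an Azure AI Search index.
  body_extension = {
      "data_sources": [
          {
              "type": "azure_search",
              "parameters": {
                  "endpoint": "https://<your-search-service>.search.windows.net",
                  "index_name": "<your-index>",
                  "authentication": {"type": "api_key", "key": "<your-search-key>"},
                  "strictness": 3,       # filtering aggressiveness
                  "top_n_documents": 5,  # retrieved document chunks
                  "in_scope": True,      # limit responses to your documents
              },
          }
      ]
  }
  ```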
- Fine-tuning an Azure OpenAI model allows you to customize the model to better suit your needs
- Fine-tuning is expensive and time-consuming, but reduces the need for many examples to achieve good performance