This repository contains a set of Text Analytics examples and challenges for practicing usage of Azure Cognitive Services and Azure Search.
- Speech-to-Text - Convert audio data (wav) into written text
- Index unstructured data - Make unstructured and semi-structured data searchable (PDFs, images, CSV, JSON, etc.)
- Convert images to text - Perform OCR and handwriting recognition on image files in order to extract text
- Text Analytics - Extract the language, sentiment, key phrases, and entities from text
- Language Understanding - Extract the intent and entities from written text
🚩 Goal: Convert wav files to written text
In the language of your choice (a Python solution is provided), write a small script that
- Converts speech into written text (German or English) - you can use this file
❓ Questions:
- What happens if you transcribe a long audio file with the Speech-to-Text API (>15s)? What does the provided solution do to sentences?
- What happens if you select the wrong language in the Speech-to-Text API? How could you solve this problem?
🙈 Hints
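One possible starting point, as a hedged sketch: the snippet below builds a request for the Speech service REST endpoint for short audio and posts a wav file to it. The endpoint path, the `format` parameter, the 16 kHz PCM content type, and the `region` and `key` values are assumptions to check against the current Speech documentation. The provided solution may use a different API or the SDK.

```python
import json
import urllib.parse
import urllib.request


def build_stt_request(region, key, language="en-US"):
    """Return (url, headers) for a short-audio recognition request.

    The URL layout is an assumption based on the Speech REST API for short audio.
    """
    query = urllib.parse.urlencode({"language": language, "format": "detailed"})
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        # Assumes a 16 kHz, 16-bit mono PCM wav file.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return url, headers


def transcribe(wav_path, region, key, language="en-US"):
    """Post the wav file and return the parsed JSON response."""
    url, headers = build_stt_request(region, key, language)
    with open(wav_path, "rb") as audio:
        body = audio.read()
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `transcribe("sample.wav", "westeurope", "<your-key>", language="de-DE")` would send a German recognition request. The file name and region here are placeholders.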
🚩 Goal: Deploy an Azure Search instance and index a PDF-based data set
- Deploy an Azure Search instance
- Index the unstructured PDF data set from here - which document contains the term "Content Moderator"?
❓ Questions:
- What is an Index? What is an Indexer? What is a Data Source? How do they relate to each other?
- How would you index json documents sitting in Azure Blob?
- Why would you want to use replicas? Why would you want more partitions?
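To make the relationship between the three objects concrete, here is a minimal sketch of the JSON payloads the Azure Search REST API expects for a data source, an index, and an indexer over JSON blobs. All names (`docs-ds`, `docs-index`, `docs-indexer`), the field list, and the assumption that every JSON document has an `id` property are illustrative, not taken from this repository.

```python
def blob_data_source(name, connection_string, container):
    """Data source: tells the service WHERE the raw data lives."""
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }


def simple_index(name):
    """Index: the searchable schema that queries run against.

    Field names are hypothetical; they must match the JSON properties.
    """
    return {
        "name": name,
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True},
            {"name": "content", "type": "Edm.String", "searchable": True},
        ],
    }


def json_blob_indexer(name, data_source, index):
    """Indexer: the crawler that copies data source content into the index."""
    return {
        "name": name,
        "dataSourceName": data_source["name"],
        "targetIndexName": index["name"],
        # "json" parsing mode treats each blob as one JSON document.
        "parameters": {"configuration": {"parsingMode": "json"}},
    }


data_source = blob_data_source("docs-ds", "<connection-string>", "documents")
index = simple_index("docs-index")
indexer = json_blob_indexer("docs-indexer", data_source, index)
```

Each payload would be sent to the matching collection of the service (`/datasources`, `/indexes`, `/indexers`) together with an `api-version` query parameter and an `api-key` header. The indexer is the only object that references the other two by name.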
🚩 Goal: Index an unstructured data set with Cognitive Search
- Add another index to the Azure Search instance, but this time enable Cognitive Search
- Index an existing data set coming from Azure Blob (data set can be downloaded here) - which document contains the term "Pin to Dashboard"?
❓ Questions:
- Let's assume we've built a Machine Learning model that can detect suspicious activities in text - how could we leverage this model directly in Azure Search for tagging our data?
🙈 Hints
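One way to approach the question above is a custom Web API skill: the indexer posts batches of records to an HTTP endpoint and merges the returned fields into the enriched document. The sketch below only shows the request and response handling for that contract. `flag_suspicious` is a hypothetical keyword-based stand-in for a real model, and the `text` input and `suspicious` output names are assumptions that would have to match the skillset definition.

```python
def flag_suspicious(text):
    """Hypothetical stand-in for a real ML model; returns True or False."""
    keywords = ("wire transfer", "password")
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def handle_skill_request(payload, predict=flag_suspicious):
    """Map a custom-skill request body to a custom-skill response body.

    Every input record must be answered with the same recordId.
    """
    results = []
    for record in payload.get("values", []):
        text = record.get("data", {}).get("text", "")
        results.append(
            {
                "recordId": record["recordId"],
                "data": {"suspicious": bool(predict(text))},
                "errors": [],
                "warnings": [],
            }
        )
    return {"values": results}


# Hand-written request body for illustration only.
example_request = {
    "values": [
        {"recordId": "1", "data": {"text": "Please send your password today."}},
        {"recordId": "2", "data": {"text": "Lunch at noon?"}},
    ]
}
example_response = handle_skill_request(example_request)
```

The function can be wrapped in any HTTP host (for example an Azure Function) and referenced from the skillset, so that the `suspicious` flag ends up as a filterable field in the index.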
🚩 Goal: Leverage OCR to make hand-written or printed text in images machine-readable
In the language of your choice (Python solution is provided), write two small scripts that
- Convert hand-written text from an image into text - Test data: 1, 2
- Convert printed text from an image into text - Test data: 1, 2
❓ Questions:
- How well does the OCR service work with German text? How well with English?
- What happens when the image is not oriented correctly?
🙈 Hints
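As a hedged sketch for the printed-text case: the snippet below builds a URL for the Computer Vision OCR operation and flattens its nested response (regions, lines, words) into plain text lines. The `v2.0` path segment and the `region` value are assumptions, and `sample_result` is a hand-written response for illustration, not real service output. Handwritten text typically goes through a separate asynchronous operation, which is not shown here.

```python
import urllib.parse


def build_ocr_url(region, language="unk", detect_orientation=True):
    """Return the OCR URL; language "unk" asks the service to auto-detect."""
    query = urllib.parse.urlencode(
        {
            "language": language,
            # Lets the service correct rotated images before recognition.
            "detectOrientation": str(detect_orientation).lower(),
        }
    )
    return f"https://{region}.api.cognitive.microsoft.com/vision/v2.0/ocr?{query}"


def ocr_lines(result):
    """Flatten an OCR response into a list of text lines."""
    lines = []
    for region in result.get("regions", []):
        for line in region.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            lines.append(" ".join(words))
    return lines


# Hand-written sample in the shape of an OCR response (illustrative only).
sample_result = {
    "language": "de",
    "orientation": "Up",
    "regions": [
        {
            "lines": [
                {"words": [{"text": "Hallo"}, {"text": "Welt"}]},
                {"words": [{"text": "Guten"}, {"text": "Tag"}]},
            ]
        }
    ],
}
text_lines = ocr_lines(sample_result)
```

Comparing the output with `detectOrientation` switched on and off for a rotated image is a quick way to explore the orientation question above.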
🚩 Goal: Leverage Text Analytics API for extracting language, sentiment, key phrases, and entities from text
In the language of your choice (a Python solution is provided), write a small script that
- Extracts sentiment, key phrases and entities from unstructured text using the Text Analytics API
❓ Questions:
- What happens if we do not pass in the language parameter while getting the sentiment?
🙈 Hints
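A minimal sketch of the request body shared by the sentiment, key phrase, and entity operations is shown below. Each document carries an `id`, the `text`, and optionally a `language`. The `v2.1` path segment in `build_url` and the operation names are assumptions to verify against the API version you deploy.

```python
def build_documents(texts, language=None):
    """Build the {"documents": [...]} body for a list of texts.

    When language is None the field is omitted, which makes it easy to
    compare results with and without an explicit language hint.
    """
    documents = []
    for number, text in enumerate(texts, start=1):
        document = {"id": str(number), "text": text}
        if language is not None:
            document["language"] = language
        documents.append(document)
    return {"documents": documents}


def build_url(region, operation):
    """Return the endpoint for one operation, e.g. "sentiment" or "keyPhrases"."""
    return (
        f"https://{region}.api.cognitive.microsoft.com"
        f"/text/analytics/v2.1/{operation}"
    )


body_with_language = build_documents(["Das Essen war hervorragend."], language="de")
body_without_language = build_documents(["Das Essen war hervorragend."])
```

The body is posted as JSON with an `Ocp-Apim-Subscription-Key` header. Sending both variants for the same German sentence shows the effect of the missing language parameter directly.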
🚩 Goal: Make your application understand the meaning of text
In the language of your choice (a Python solution is provided), write a small script or app that
- Detects the intent and entities of the text (German) - see examples below (using https://eu.luis.ai)
Let's use an example where we want to detect a pizza order from the user. We also want to detect whether the user wants to cancel an order.
LUIS example data:
2 Intents: "CreateOrder", "CancelOrder"
Utterances:
(CreateOrder) Ich moechte eine Pizza Salami bestellen
(CreateOrder) Vier Pizza Hawaii bitte
(CancelOrder) Bitte Bestellung 123 stornieren
(CancelOrder) Cancel bitte Bestellung 42
(CancelOrder) Ich will Order 933 nicht mehr
(None) Wieviel Uhr ist heute?
(None) Wie ist das Wetter in Berlin?
(None) Bitte Termin fuer Montag einstellen
❓ Questions:
- Why do we need to fill the None intent with examples?
- What is the "Review endpoint utterances" feature in LUIS?
🙈 Hints
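Once the app is trained and published, the prediction endpoint returns the top-scoring intent and the detected entities. The sketch below interprets such a response and falls back to `None` when the score is low. It assumes the v2-style response shape (`topScoringIntent`, `entities`); the 0.5 threshold and the sample response, including its score, are made up for illustration.

```python
def interpret(luis_response, min_score=0.5):
    """Return (intent, entities) from a LUIS prediction response.

    Falls back to the "None" intent when the top score is below min_score.
    """
    top = luis_response.get("topScoringIntent", {})
    intent = top.get("intent", "None")
    if top.get("score", 0.0) < min_score:
        intent = "None"
    entities = [
        (entity["type"], entity["entity"])
        for entity in luis_response.get("entities", [])
    ]
    return intent, entities


# Hand-written sample response for one of the utterances above.
sample_response = {
    "query": "Bitte Bestellung 123 stornieren",
    "topScoringIntent": {"intent": "CancelOrder", "score": 0.93},
    "entities": [
        {"entity": "123", "type": "builtin.number", "startIndex": 17, "endIndex": 19}
    ],
}
intent, entities = interpret(sample_response)
```

Here `intent` is `"CancelOrder"` and the order number arrives as a `builtin.number` entity, which assumes the prebuilt number entity was added to the app.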