a. Overview

This repository provides sample code that shows how to use AutoML image classification or object detection in the Azure ML (AML) environment.

Target users

You want to classify your photos, or find objects in them, with your own customized deep-learning models.
- Please think about using Custom Vision first for simpler development.
You don't want to customize the image-analysis algorithms very much.
- This repository aims at the second-best strategy for simplicity¹, and mainly uses the automated machine learning technology provided by Microsoft.
You want to obtain inference results from the deep-learning models in batch.
- Please see the contents listed under Reference if you're interested in real-time inference.

Disclaimer

This repository aims at a minimal system built on some references. Most of the content is quoted from them, so please check them if you're interested in more detail.
This repository was confirmed to work with some sample images as of July 2022. Please regard it as a guideline for developing your own application.

b. Prerequisites

An Azure subscription and an AML workspace in it
Image files to be classified
- This repository includes one image file for testing the inference pipeline.

c. How to use

This repository is divided into a training pipeline and an inference pipeline. From the AML perspective both run in the same environment, i.e. both pipelines use the same compute_target, environment, etc. in AML.

So, if you prefer, you can easily merge them by sorting out the implementation of their input and output.

c.1 Azure environment and AML Workspace

Prepare an Azure subscription and an AML workspace. You may find the steps here.

c.2 Annotate images and prepare datasets in AML

Decide which kind of image analysis meets your needs: image classification or object detection.²
- Image classification is divided into two tasks: multi-class and multi-label.
  - multi-class: Exactly one class is selected for each image. e.g. Morning, Noon, Evening, Night
  - multi-label: Several labels can be assigned to each image, and in some cases no label is assigned at all. e.g. with the labels dog, cat and whale, a picture that contains no animals gets none of them.
Start data labelling with your image files by following the instruction.
- Export the labelled dataset into a Dataset in AML. It will be used in training afterwards.
Prepare config.ini under the /common directory by following the instruction.
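
The actual layout of config.ini is given in the instruction mentioned above. As a minimal sketch of how such a file can be read, assuming a hypothetical `[AML]` section with `subscription_id`, `resource_group` and `workspace_name` keys (these names are illustrative and not taken from this repository), Python's standard `configparser` is enough:

```python
import configparser

# Hypothetical contents of common/config.ini. The section and key names
# are assumptions for illustration, not the repository's actual layout.
SAMPLE_CONFIG = """
[AML]
subscription_id = 00000000-0000-0000-0000-000000000000
resource_group = my-resource-group
workspace_name = my-aml-workspace
"""


def load_aml_settings(text):
    """Parse INI text and return the [AML] section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["AML"])


settings = load_aml_settings(SAMPLE_CONFIG)
print(settings["workspace_name"])
```

To read a real file instead of the in-memory sample, `parser.read("common/config.ini")` can replace `parser.read_string(text)`.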

c.3 Populate pipelines in AML

Once you have completed the preparation in c.2, populate the pipelines that train a deep-learning model with AutoML image classification and its supported model algorithms. You may find the steps here.
- In this repository, an AML pipeline is used for batch execution such as deep-learning training or inference. To do so, you need train.py or inference.py, which is embedded in the pipeline.
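
The idea of embedding train.py in a pipeline can be sketched as follows. This is an illustration under assumptions, not the repository's actual notebook code: it uses the Azure ML SDK v1 classes `PythonScriptStep`, `Pipeline` and `Experiment`, the function names, the `source_dir` default and the experiment name are hypothetical, and `ws`, `compute_target` and `run_config` are assumed to have been created beforehand.

```python
def build_training_pipeline(ws, compute_target, run_config, source_dir="."):
    """Wrap train.py in a one-step AML pipeline (Azure ML SDK v1).

    ws, compute_target and run_config are assumed to exist already;
    source_dir is a hypothetical folder that contains train.py.
    """
    # Imported lazily so this module loads even without the Azure ML SDK.
    from azureml.pipeline.core import Pipeline
    from azureml.pipeline.steps import PythonScriptStep

    train_step = PythonScriptStep(
        name="train",
        script_name="train.py",
        source_directory=source_dir,
        compute_target=compute_target,
        runconfig=run_config,
        allow_reuse=False,  # always rerun the training step
    )
    return Pipeline(workspace=ws, steps=[train_step])


def submit_pipeline(ws, pipeline, experiment_name="automl-image-training"):
    """Submit the pipeline as a batch run under a hypothetical experiment name."""
    from azureml.core import Experiment

    return Experiment(ws, experiment_name).submit(pipeline)
```

An inference pipeline would look the same with `script_name="inference.py"`, which is one reason the two pipelines can share the same compute target and environment.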

d. Technical tips for the steps

d.1 Authentication

To access the AML workspace, this repository uses two kinds of authentication:
- az cli³ in 00. provisioning. Please check the site if necessary.
  - You can use az login or az login --use-device-code, whichever you prefer.
- Managed identity in 10. AML-pipeline_train and 20. AML_pipeline_inference
  - As with authentication in general, you need three steps: populate a managed ID, give access rights to the populated ID, and retrieve the AML workspace with the ID.
  - Populate managed ID:
    - In the sample implementation, you pass the identity_type argument to the method AmlCompute.provisioning_configuration:
      compute_config = AmlCompute.provisioning_configuration(
          vm_size=vm_size,
          idle_seconds_before_scaledown=600,
          min_nodes=0,
          max_nodes=4,
          location=vm_location,
          identity_type=managed_id,  # Requires `SystemAssigned` for a system-assigned managed ID
      )
      With this setting, the batch pipelines can use the managed identity to retrieve the AML workspace when they run deep-learning training. Please see this page. You can then confirm the populated managed ID:
      [Screenshot: the populated managed ID, highlighted with a red rectangle]
  - Give access rights to the populated ID
    - After generating the identity, assign it the appropriate rights, such as READ or WRITE (IAM), in Azure AD, for example in the Enterprise Application settings. This site may help you understand the procedure.
  - Retrieve AML workspace with the ID
    - You can retrieve the AML workspace as follows in train.py and inference.py:
      from azureml.core import Workspace
      from azureml.core.authentication import MsiAuthentication
      # Authentication with managed identity
      msi_auth = MsiAuthentication()
      # Retrieve the Azure ML workspace
      ws = Workspace(
          subscription_id=subscription_id,
          resource_group=resource_group,
          workspace_name=workspace_name,
          auth=msi_auth,
      )

d.2 Selection of compute clusters

GPU instance in 10. AML-pipeline_train and 20. AML_pipeline_inference

Training a deep-learning model on a GPU instance requires a specific VM series, such as NC-6 rather than NV-6.⁴

from azureml.core.compute import AmlCompute

compute_config = AmlCompute.provisioning_configuration(
    vm_size=vm_size,      # Specify an `NC-` series size for the compute cluster here
    idle_seconds_before_scaledown=600,
    min_nodes=0,
    max_nodes=4,
    location=vm_location, # Make sure this location offers the chosen `vm_size`
    identity_type=managed_id,
)
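
Before provisioning, it can help to filter candidate sizes down to the NC series. The following is a minimal sketch, assuming Azure's usual size naming such as `Standard_NC6` and `Standard_NV6`; the helper names `is_nc_series` and `pick_nc_sizes` and the candidate list are hypothetical, and how the list of sizes offered in your region is obtained is left open.

```python
import re

# Azure VM size names look like "Standard_<series><size>...",
# e.g. "Standard_NC6" (NC series) or "Standard_NV6" (NV series).
_NC_PATTERN = re.compile(r"^Standard_NC\d", re.IGNORECASE)


def is_nc_series(vm_size):
    """Return True if the size name belongs to the NC series."""
    return bool(_NC_PATTERN.match(vm_size))


def pick_nc_sizes(available_sizes):
    """Keep only the NC-series names, preserving their order."""
    return [size for size in available_sizes if is_nc_series(size)]


# Hypothetical list of sizes offered in some region.
candidates = ["Standard_NV6", "Standard_NC6", "Standard_D2_v2", "Standard_NC12"]
nc_sizes = pick_nc_sizes(candidates)
print(nc_sizes)
```

A name picked this way would then be passed as `vm_size` to `AmlCompute.provisioning_configuration`.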

d.3 Populating the Python environment

You need to prepare a Python environment to execute the whole pipelines. The major functions to be developed are as follows:
- Ingest image files labelled by the AML labelling tool
- Train a deep-learning model with those files on a GPU cluster, and fine-tune it automatically
- Infer with given image files and the generated deep-learning models

To run all of these in a single environment with AutoML in AML, the following is one candidate for the Python environment setting⁵. You can extend it by adding more Python libraries as you prefer.⁶

from azureml.core.conda_dependencies import CondaDependencies

# aml_run_config is assumed to be an existing RunConfiguration
aml_run_config.environment.python.conda_dependencies = CondaDependencies.create(
    python_version='3.7',
    conda_packages=[
        'pandas',
        'scikit-learn',
        'numpy==1.20.1',
        'pycocotools==2.0.2',
    ],
    pip_packages=[
        'azureml-sdk',
        'azureml-automl-core',
        'azureml-automl-dnn-vision==1.43.0',
    ],
    pin_sdk_version=False,
)

Reference

Typical use cases for image classification with AutoML in Azure
- These use cases handle training and inference in similar ways. In particular, inference is implemented in real time.
- If you're interested in batch inference, please refer to this use case, which has no explicit method to "predict" with given image data. By contrast, this repository provides an explicit way to predict.
Introduction to AutoML for images
- Announcing Automated ML (AutoML) for Images

¹ If you're interested in more customized algorithms, please visit https://arxiv.org/list/cs.CV/recent
² This repository doesn't cover image segmentation.
³ command line interface
⁴ Please check the current availability here. NC-series VMs can be chosen only in specific regions.
⁵ as of July 2022
⁶ pandas and scikit-learn are not used in this repository, but they are included as basic libraries for developing more functions. Please add more if you need them.

kyoro1/image_analysis_with_automl_in_azure