image_analysis_with_automl_in_azure


Primary language: Jupyter Notebook. License: MIT.

a. Overview

This repository provides sample code that shows you how to use AutoML image classification and object detection in the Azure Machine Learning (AML) environment.

Target users

  • You want to classify your photos, or detect objects in them, with your own customized deep-learning models.
    • Please consider Custom Vision first, since it allows simpler development.
  • You don't need to customize the image-analysis algorithms much.[1]
  • You want to run inference with the deep-learning models in batch.
    • If you're interested in real-time inference, please see the references.

Disclaimer

  • This repository aims at a minimal implementation based on several references. Most of its content is quoted from them; please check them for more details.
  • This repository was confirmed to work with some sample images as of July 2022. Please treat it as a guideline when developing your application.

b. Prerequisites

  • Azure subscription, and its AML workspace
  • Image files to be classified, or in which objects are to be detected

c. How to use

This repository is divided into a training pipeline and an inference pipeline. From the AML perspective both use the same environment, i.e. the same compute_target, environment, etc.

So, if you prefer, you can easily merge them into a single pipeline by reconciling their inputs and outputs.

c.1 Azure environment and AML Workspace

c.2 Annotate images and prepare datasets in AML

  • Decide which image task meets your needs: image classification or object detection.[2]
    [Figure: image tasks]
    • Image classification is divided into two tasks: multi-class and multi-label.
      • multi-class: Exactly one class is selected for each image. e.g. Morning, Noon, Evening, Night
      • multi-label: Any number of labels can be selected for each image, including none. e.g. with the labels dog, cat and whale, a picture may show several of these animals, or none at all.
  • Label your image files with the AML data labelling tool, following the instructions.
  • Prepare config.ini in the /common directory, following the instructions.
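The difference between the two classification tasks shows up in how each image's labels can be encoded: a multi-class image maps to a one-hot vector, while a multi-label image maps to a multi-hot vector that may be all zeros. The following is a minimal, illustrative sketch; the class lists and the helpers encode_multiclass and encode_multilabel are hypothetical and are not part of this repository or of AML.

```python
from typing import List, Set

# Hypothetical class lists, taken from the examples above (illustration only).
TIMES_OF_DAY = ["Morning", "Noon", "Evening", "Night"]  # multi-class
ANIMALS = ["dog", "cat", "whale"]                       # multi-label


def encode_multiclass(label: str, classes: List[str]) -> List[int]:
    """One-hot vector: exactly one class is selected for each image."""
    if label not in classes:
        raise ValueError(f"unknown class: {label!r}")
    return [1 if c == label else 0 for c in classes]


def encode_multilabel(labels: Set[str], classes: List[str]) -> List[int]:
    """Multi-hot vector: any subset of the labels, including none."""
    unknown = labels - set(classes)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if c in labels else 0 for c in classes]


print(encode_multiclass("Noon", TIMES_OF_DAY))      # [0, 1, 0, 0]
print(encode_multilabel({"dog", "whale"}, ANIMALS))  # [1, 0, 1]
print(encode_multilabel(set(), ANIMALS))             # [0, 0, 0]
```

A multi-class vector always sums to 1, whereas a multi-label vector can sum to anything from 0 up to the number of labels.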

c.3 Populate pipelines in AML

  • Once you have completed the preparation in c.2, create pipelines that train a deep-learning model with AutoML image classification, using one of the supported model algorithms. You can find the steps here.
    • In this repository, AML pipelines run batch jobs such as deep-learning training or inference. To do so, you need train.py or inference.py, which are embedded in the pipelines.
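Because train.py runs as a step inside a pipeline, it typically receives its inputs as command-line arguments (for example through the arguments list of an AML PythonScriptStep). Below is a minimal sketch of such an entry point using the standard argparse module; the argument names --task, --dataset-name and --model-name are hypothetical and may not match the actual train.py in this repository.

```python
import argparse
from typing import List, Optional

# Hypothetical command-line interface for train.py; the real script may differ.
TASKS = ["multi-class", "multi-label", "object-detection"]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the arguments passed to train.py by the pipeline step."""
    parser = argparse.ArgumentParser(description="AutoML image training step")
    parser.add_argument("--task", choices=TASKS, required=True,
                        help="image task chosen in c.2")
    parser.add_argument("--dataset-name", required=True,
                        help="name of the labelled dataset registered in AML")
    parser.add_argument("--model-name", default=None,
                        help="one of the supported model algorithms (optional)")
    return parser.parse_args(argv)


# Example: an argument list a pipeline step might pass to train.py.
args = parse_args(["--task", "multi-label", "--dataset-name", "my-photos"])
print(args.task, args.dataset_name, args.model_name)  # multi-label my-photos None
```

Passing argv explicitly keeps the parser testable outside the pipeline; inside the pipeline, calling parse_args() with no argument reads sys.argv as usual.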

d. TIPS of the steps from technical point of view

d.1 Authentication

  • To access the AML workspace, you use two kinds of authentication:
    • az cli[3] in 00. provisioning. Please check the site, if necessary.
      • Sign in with az login or az login --use-device-code, whichever you prefer.
    • Managed identity in 10. AML-pipeline_train and 20. AML_pipeline_inference
      • As with authentication in general, this takes three steps: create the managed identity, grant access rights to it, and retrieve the AML workspace with it.

      • Create the managed identity:

        • In the sample implementation, set the identity_type argument of AmlCompute.provisioning_configuration:

          from azureml.core.compute import AmlCompute

          compute_config = AmlCompute.provisioning_configuration(
              vm_size=vm_size,
              idle_seconds_before_scaledown=600,
              min_nodes=0,
              max_nodes=4,
              location=vm_location,
              identity_type=managed_id,  # "SystemAssigned" for a system-assigned managed identity
          )

          With this setting, the compute cluster can use its managed identity to retrieve the AML workspace while the batch pipelines for deep-learning training run. Please see this page. You can check the created managed identity in the portal, highlighted by the red rectangle below:

          [Figure: System assigned identity]

      • Grant access rights to the created identity

        • After creating the identity, assign it the appropriate rights, such as read or write, through access control (IAM). In Azure AD, the identity appears in the Enterprise applications settings. This site can help your understanding.
      • Retrieve AML workspace with the ID

        • In train.py and inference.py, retrieve the AML workspace as follows:
          from azureml.core import Workspace
          from azureml.core.authentication import MsiAuthentication

          # Authenticate with the compute cluster's managed identity
          msi_auth = MsiAuthentication()

          # Retrieve the Azure ML workspace
          ws = Workspace(subscription_id=subscription_id,
                         resource_group=resource_group,
                         workspace_name=workspace_name,
                         auth=msi_auth)
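The snippet above assumes subscription_id, resource_group and workspace_name are already defined. One way to provide them is to read them from the config.ini prepared in c.2. The sketch below uses Python's standard configparser; the section name [aml], the key names and the sample values are assumptions, since the actual layout of config.ini is defined by this repository's instructions.

```python
import configparser
from typing import Dict

# Hypothetical layout of common/config.ini; the real section and key names may differ.
SAMPLE_INI = """
[aml]
subscription_id = 00000000-0000-0000-0000-000000000000
resource_group = my-resource-group
workspace_name = my-aml-workspace
"""

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_settings(parser: configparser.ConfigParser,
                            section: str = "aml") -> Dict[str, str]:
    """Return the values needed by Workspace(...), failing fast if any is missing."""
    if not parser.has_section(section):
        raise KeyError(f"missing section [{section}]")
    missing = [k for k in REQUIRED_KEYS if not parser.has_option(section, k)]
    if missing:
        raise KeyError(f"missing keys in [{section}]: {missing}")
    return {k: parser.get(section, k) for k in REQUIRED_KEYS}


parser = configparser.ConfigParser()
parser.read_string(SAMPLE_INI)  # in train.py, e.g. parser.read("common/config.ini")
settings = load_workspace_settings(parser)
print(settings["workspace_name"])  # my-aml-workspace
```

Because the keys match the parameter names of Workspace, the call above could then be written as Workspace(**settings, auth=msi_auth).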

d.2 Selection of compute clusters

  • GPU instances in 10. AML-pipeline_train and 20. AML_pipeline_inference
    • To train deep-learning models on GPU instances, you need a specific VM series, e.g. the NC-series (NC6) rather than the NV-series (NV6).[4]
      from azureml.core.compute import AmlCompute

      compute_config = AmlCompute.provisioning_configuration(
          vm_size=vm_size,      # NC-series GPU size for the compute cluster, e.g. "Standard_NC6"
          idle_seconds_before_scaledown=600,
          min_nodes=0,
          max_nodes=4,
          location=vm_location, # Make sure vm_size is available in this region
          identity_type=managed_id,
      )
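Since the cluster must use an NC-series size, a small guard before calling provisioning_configuration can catch a wrong vm_size early. The helper below is hypothetical and not part of the AML SDK; it only checks the size name and does not verify regional availability (for that, something like az vm list-skus --location <region> --size Standard_NC lists the NC sizes offered in a region).

```python
import re

# Hypothetical guard (not part of the AML SDK): accept NC-series sizes such as
# "Standard_NC6" or "Standard_NC6s_v3" and reject others such as "Standard_NV6".
_NC_SERIES = re.compile(r"^Standard_NC\d+", re.IGNORECASE)


def is_nc_series(vm_size: str) -> bool:
    """Return True if vm_size names an NC-series VM size."""
    return _NC_SERIES.match(vm_size) is not None


for size in ["Standard_NC6", "Standard_NC6s_v3", "Standard_NV6"]:
    print(size, is_nc_series(size))
# Standard_NC6 True
# Standard_NC6s_v3 True
# Standard_NV6 False
```

In the provisioning notebook, a check such as `if not is_nc_series(vm_size): raise ValueError(vm_size)` placed before building compute_config would stop a misconfigured cluster from being created.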

d.3 Setting up the Python environment

  • You need to prepare a Python environment for running the whole pipelines. The main functions to be developed are:

    • Ingest image files labelled with the AML labelling tool
    • Train a deep-learning model with those files on a GPU cluster, with automatic fine-tuning
    • Run inference on given image files with the generated deep-learning models
  • To run all of these in a unified environment with AutoML in AML, the following is one candidate Python environment setting.[5] You can extend it by adding more Python libraries as you prefer.[6]

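    from azureml.core.conda_dependencies import CondaDependencies
    from azureml.core.runconfig import RunConfiguration

    # Run configuration shared by the pipeline steps
    aml_run_config = RunConfiguration()
    aml_run_config.environment.python.conda_dependencies = CondaDependencies.create(
        python_version='3.7',
        conda_packages=['pandas',
                        'scikit-learn',
                        'numpy==1.20.1',
                        'pycocotools==2.0.2'],
        pip_packages=['azureml-sdk',
                      'azureml-automl-core',
                      'azureml-automl-dnn-vision==1.43.0'],
        pin_sdk_version=False)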

Reference

Footnotes

  1. If you're interested in more customized algorithms, please visit https://arxiv.org/list/cs.CV/recent

  2. This repository doesn't cover image segmentation.

  3. Command-line interface.

  4. Please check VM availability here. NC-series VMs can be chosen only in specific regions.

  5. As of July 2022.

  6. pandas and scikit-learn are not used in this repository, but they are included as basic libraries for developing further functions. Please add more if you need them.