Azure-Data-Platform

A short guide to setting up the ADF project :)

Azure Data Platform/Analytics Platform

This project provides an end-to-end data platform on Azure, with all of the required services deployed and fully connected. It is the empty shell of the solution, ready for you to "bring-your-own-data".

Prerequisites:

  • An Azure Resource Group with Contributor access (remember that Azure permissions are inherited). Owner access is required for the managed instance, which is optional but discussed in this guide.
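
If you have the Azure CLI installed, you can sanity-check these prerequisites before starting. This is an optional sketch, assuming an authenticated `az` session; `my-rg` is a placeholder for your resource group name.

```shell
# Check that the resource group exists (my-rg is a placeholder)
az group show --name my-rg --output table

# List your role assignments on the resource group to confirm
# Contributor (or Owner) access for the signed-in user
az role assignment list \
  --assignee "$(az ad signed-in-user show --query id -o tsv)" \
  --resource-group my-rg \
  --output table
```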

How to set up the project

Step 1: Click Deploy To Azure below to get started

Deploy To Azure

Fill in the corresponding fields: Deploy a custom ARM Template on Azure

Click Deploy and you should see the following:

Deployment is in progress

This should take approximately five minutes, most of which is the Virtual Machine (IaaS) deployment; you can continue with the next steps while the Virtual Machine finishes deploying.

If you navigate to your newly created resource group, or click Go To Deployment, you should see all of these resources (the names may differ based on the input parameters you supplied when deploying the template):

All the resources deployed
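
You can also confirm the deployment from the Azure CLI. A hedged sketch, assuming an authenticated session and a resource group called `my-rg`:

```shell
# Show the deployments that have run against the resource group
az deployment group list --resource-group my-rg \
  --query "[].{name:name, state:properties.provisioningState}" \
  --output table

# List every resource the template created
az resource list --resource-group my-rg --output table
```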

The final step looks like this:

image

Step 2: Configure the Azure Data Factory to use a Private Endpoint

  • Navigate to your Azure Data Factory that has been created in your resource group
  • Go to the networking tab on the left
  • Change the Networking Access from Public Endpoint to Private Endpoint

Private Endpoint Set Up for Azure Data Factory

  • Select Private endpoint connections at the top
  • Add a Private Endpoint by clicking +Private Endpoint

image

Add an endpoint:

Tab 1: image

Target your own resource in tab 2 image

Navigate to your Azure Data Factory and launch it image

Note: make sure this private endpoint is configured with the correct VNET; otherwise, the Data Factory hostname will resolve to the public endpoint.
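
One way to verify the wiring is to resolve the factory's hostname from a machine inside the VNET (for example, the deployed VM). With the private endpoint and private DNS zone configured correctly, the name should resolve to a private IP; from outside the VNET it resolves to a public one. The exact FQDN to test is shown on the private endpoint's DNS configuration blade; the name below is a placeholder:

```shell
# Run from inside the VNET (e.g. the deployed VM).
# A correctly configured private endpoint resolves to a private IP
# (10.x, 172.16-31.x, or 192.168.x); check the endpoint's DNS
# configuration blade for your factory's actual FQDN.
nslookup myfactory.datafactory.azure.net
```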

Step 3: Managed VNET

Creating a managed VNET allows us to access our PaaS resources over private endpoints.

  1. Click the Manage option on the left menu
  2. Click Integration Runtimes
  3. Click New image

Choose Azure, Self-Hosted, then continue

image

Then click Azure and continue again

image

  1. Give it a descriptive name; in this case I've named it managed-vnet
  2. Enable Virtual network configuration, then click Create

image

  1. Click Manage Private Endpoints
  2. New

image

Choose Azure Data Lake Gen 2 (ADLS2) image

Give it a name, choose your Azure Subscription and your Storage Account that you made during the initial deployment stage

image

Go to your storage account and approve the private endpoint

image
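
If you prefer, the pending connection can be approved from the CLI instead. A sketch with assumed names (`my-rg`, `mystorage`); the connection name comes from the first command's output:

```shell
# List private endpoint connections on the storage account and their status
az storage account show --name mystorage --resource-group my-rg \
  --query "privateEndpointConnections[].{name:name, status:privateLinkServiceConnectionState.status}" \
  --output table

# Approve a pending connection by name (taken from the output above)
az storage account private-endpoint-connection approve \
  --account-name mystorage --resource-group my-rg \
  --name <connection-name>
```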

Go back to Linked Services, and click new

image

Choose Azure Data Lake Storage Gen2 image

For Connect via integration runtime, select from the dropdown the managed VNET runtime that you just created in the previous step image

If successful this should show up and your private endpoint should be approved image

Test the connection

image

Note that this approach uses an account key. If you have Owner access on your subscription, you can instead grant access via managed identity by configuring Access Control (IAM) on the storage account and selecting Managed Identity in the steps above.

  1. Go to integration runtimes
  2. Click new

image

Choose Azure, Self-Hosted

image

Under Network Environment select Self-Hosted and then click continue

image

In this case I've named it self-hosted-vm

image

Follow either option 1 or option 2. I followed option 1: I logged onto the VM, installed the integration runtime application, and registered it with one of the keys.

image

You can now see that the status of the integration runtime on the self-hosted VM is Running

image
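
The runtime's status can also be checked from the Azure CLI via the `datafactory` extension. A sketch, assuming the names used above (`my-rg`, `my-adf`, `self-hosted-vm`):

```shell
# Query the self-hosted integration runtime's state
# (requires the 'datafactory' CLI extension; the CLI offers to
# install it on first use)
az datafactory integration-runtime get-status \
  --resource-group my-rg --factory-name my-adf \
  --name self-hosted-vm
```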

You will need your VM again in a few steps, so don't log off yet

Add another linked service by going to Linked services, then New

image

Search for file and choose File System

image

Under Connect via integration runtime, choose your self-hosted-vm

image

On your VM:

  1. Create a new folder on one of your drives
  2. Create a new file in that folder
  3. Copy the path of the folder

image

Under Host, paste the folder path

image

Enter your username and password that you set in the initial ARM resource template deployment stage

image

As a best practice, you can create a linked service to an Azure Key Vault

  1. Linked Services
  2. New

image

Select Key Vault

image

Here you would choose the Key Vault that you created during the deployment stage. It can then be used to store secrets, instead of entering your password for the self-hosted VM directly as in the previous stage

image
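
To use the Key Vault this way, store the VM password as a secret and reference that secret from the File System linked service. A sketch with assumed names (`my-kv`, secret `vm-password`); your user needs secret-set permissions on the vault:

```shell
# Store the VM password in Key Vault (names are placeholders)
az keyvault secret set --vault-name my-kv \
  --name vm-password --value '<your-vm-password>'

# Confirm the secret is readable
az keyvault secret show --vault-name my-kv --name vm-password \
  --query name --output tsv
```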

Go to your storage account and create a new container

image
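
The container can equally be created from the CLI. A sketch, assuming an authenticated session, a storage account `mystorage`, and a container named `data`:

```shell
# Create a container, authenticating with your Azure AD identity
az storage container create --name data \
  --account-name mystorage --auth-mode login
```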

Step 4: Creating connections for the pipeline

  1. Click on Pipelines in the side menu on the left
  2. Click the plus button
  3. Choose datasets

image

Choose Azure Data Lake Gen 2

image

For format choose DelimitedText/CSV

image

Choose your linked service for your Azure Storage Account

image

Next you can click on the browse button image

Choose your container that you previously created

image

For the import schema, click on None

image

Next do the same for your file server on the VM

image

Search for file and choose File System

image

Again choose DelimitedText/CSV

image

You can then choose your linked service for your file server

image

Browse for the file path

image

Select the file you created

image

Change the import schema to none

image

Step 5: Create a new pipeline

All of the stages before this were primarily setup. This is the most customisable step of the process; here we will simply copy a file from your virtual machine to your container in Azure Blob Storage.
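
For reference, the same pipeline can also be defined in JSON and created through the CLI's `datafactory` extension. This is a sketch only: the dataset names (`DelimitedText2` as source, `DelimitedText1` as sink) match the ones created earlier, while `my-rg`, `my-adf`, and the pipeline name are placeholders:

```shell
# Define a minimal Copy activity pipeline (sketch; names are assumptions)
cat > pipeline.json <<'EOF'
{
  "activities": [
    {
      "name": "CopyVmFileToAdls",
      "type": "Copy",
      "inputs":  [ { "referenceName": "DelimitedText2", "type": "DatasetReference" } ],
      "outputs": [ { "referenceName": "DelimitedText1", "type": "DatasetReference" } ],
      "typeProperties": {
        "source": { "type": "DelimitedTextSource" },
        "sink":   { "type": "DelimitedTextSink" }
      }
    }
  ]
}
EOF

# Create the pipeline in the factory (requires the 'datafactory' extension)
az datafactory pipeline create --resource-group my-rg \
  --factory-name my-adf --name CopyVmToBlob --pipeline @pipeline.json
```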

First, drag Copy Data onto your canvas

image

At the bottom of the page

  1. Click Source
  2. Choose DelimitedText2 (the Virtual Machine Dataset)

image

  1. Click Sink
  2. Choose DelimitedText1 (the Storage Account Dataset)

image

Click Debug

image

It should then show the status as Succeeded

image
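
Once the pipeline is published, you can trigger and monitor runs from the CLI as well (Debug runs are portal-only; `create-run` executes the published pipeline). A sketch with the assumed names used above:

```shell
# Kick off a run of the published pipeline and capture the run id
RUN_ID=$(az datafactory pipeline create-run \
  --resource-group my-rg --factory-name my-adf \
  --name CopyVmToBlob --query runId --output tsv)

# Check the run's status (e.g. InProgress, Succeeded, Failed)
az datafactory pipeline-run show \
  --resource-group my-rg --factory-name my-adf --run-id "$RUN_ID"
```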

Complete! I hope you all enjoyed.

Feel free to create an issue on this GitHub page if you find any problems setting it up, message me directly on LinkedIn, or email t-schish@microsoft.com