hashicorp/terraform-provider-azurerm

Cannot create azurerm_storage_container in azurerm_storage_account that uses network_rules

phil-bevan opened this issue · 105 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

Terraform v0.11.11

  • provider.azurerm v1.21.0

Affected Resource(s)

  • azurerm_storage_account
  • azurerm_storage_container

Terraform Configuration Files

resource "azurerm_storage_account" "test-storage-acct" {
  name                     = "${var.prefix}storacct"
  resource_group_name      = "${var.resgroup}"
  location                 = "${var.location}"
  account_tier             = "Standard"
  account_replication_type = "LRS"
  network_rules {
    ip_rules                   = ["aaa.bbb.ccc.ddd/ee"]
    virtual_network_subnet_ids = ["${var.subnetid}"]
  }
}
resource "azurerm_storage_container" "provisioning" {
  name                  = "${var.prefix}-provisioning"
  resource_group_name   = "${var.resgroup}"
  storage_account_name  = "${azurerm_storage_account.test-storage-acct.name}"
  container_access_type = "private"
}

Debug Output

  • azurerm_storage_container.provisioning: Error creating container "philtesting1-provisioning" in storage account "philtesting1storacct": storage: service returned error: StatusCode=403, ErrorCode=AuthorizationFailure, ErrorMessage=This request is not authorized to perform this operation.
    RequestId:a7f9d2e1-701e-00b3-4e74-cf3b34000000
    Time:2019-02-28T14:45:53.7885750Z, RequestInitiated=Thu, 28 Feb 2019 14:45:53 GMT, RequestId=a7f9d2e1-701e-00b3-4e74-cf3b34000000, API Version=, QueryParameterName=, QueryParameterValue=

Expected Behavior

Container can be created in a storage account that uses network rules

Actual Behavior

After applying a network_rules block to a storage account, I cannot provision a container into it. My public IP is included in the address range specified in the network rule. I can successfully create the container via the Azure portal.

Steps to Reproduce

  1. terraform apply

This also reproduces with v1.22.1.

sschu commented

The reason is most likely that listing existing storage containers in a storage account goes directly against the storage account's data-plane REST API. This fails if firewall rules are in place that do not include the IP of the host Terraform runs on, and works once that IP is added. However, finding that IP is a challenge when Terraform is run from Azure DevOps, as we do. This might not be easy to fix. Maybe storage account firewall rules should be their own resource that is added last in a deployment? Or creating a storage container resource could first disable the firewall on the storage account and re-enable it afterwards?
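A minimal sketch of that "firewall rules as their own resource, applied last" idea, using the separate azurerm_storage_account_network_rules resource available in newer provider versions (names and the IP range are illustrative; note that later state refreshes of the container can still be blocked once the rules are in place):

resource "azurerm_storage_container" "provisioning" {
  name                  = "provisioning"
  storage_account_name  = azurerm_storage_account.test-storage-acct.name
  container_access_type = "private"
}

resource "azurerm_storage_account_network_rules" "rules" {
  storage_account_id = azurerm_storage_account.test-storage-acct.id
  default_action     = "Deny"
  ip_rules           = ["aaa.bbb.ccc.ddd/ee"]

  # Apply the firewall only after the container exists, so the initial
  # data-plane call is not blocked by the rules.
  depends_on = [azurerm_storage_container.provisioning]
}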

We just ran into this ourselves. Nice to see someone else has already raised the issue with excellent documentation.

The workaround we are testing is to call out to an ARM template for creating the containers. This is not ideal for several reasons:

  1. It's not Terraform-native
  2. It's more moving parts and more complicated to manage
  3. ARM templates only apply once, so if the configuration drifts over time Terraform will not set it back

But it's what we've got. This could be a workaround for you if you need this.

I'm using two parts - a JSON file with the ARM, and a Terraform azurerm_template_deployment

storage-containers.json

{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "storageAccountName": {
            "type": "string"
        },
        "location": {
            "type": "string"
        }
    },
    "resources": [
        {
            "name": "[parameters('storageAccountName')]",
            "type": "Microsoft.Storage/storageAccounts",
            "apiVersion": "2018-07-01",
            "location": "[parameters('location')]",
            "resources": [
                {
                    "name": "default/images",
                    "type": "blobServices/containers",
                    "apiVersion": "2018-07-01",
                    "dependsOn": [
                        "[parameters('storageAccountName')]"
                    ]
                },
                {
                    "name": "default/backups",
                    "type": "blobServices/containers",
                    "apiVersion": "2018-07-01",
                    "dependsOn": [
                        "[parameters('storageAccountName')]"
                    ]
                }
            ]
        }
    ]
}

main.tf

resource "azurerm_storage_account" "standard-storage" {
  name                = "stdstorage"
  location            = "${var.location}"
  resource_group_name = "${var.resource_group_name}"

  account_tier              = "Standard"
  account_replication_type  = "${var.standard_replication_type}"
  enable_blob_encryption    = "${var.standard_enable_blob_encryption}"
  enable_https_traffic_only = true

  network_rules {
    ip_rules                   = "${var.firewall_allow_ips}"
    virtual_network_subnet_ids = ["${var.vm_subnet_id}"]
  }
}

resource "azurerm_template_deployment" "stdstorage-containers" {
  name                = "stdstorage-containers"
  resource_group_name = "${var.resource_group_name}"
  deployment_mode     = "Incremental"

  depends_on = [
    "azurerm_storage_account.standard-storage",
  ]

  parameters {
    location           = "${var.location}"
    storageAccountName = "${azurerm_storage_account.standard-storage.name}"
  }

  template_body = "${file("${path.module}/storage-containers.json")}"
}

Hi @tombuildsstuff,
has this issue been resolved? Is it related to PR #416?

@ranokarno not yet, this issue's open/tracking this bug

I'm hitting a very similar issue except that I'm trying to create a Storage Queue. Otherwise it's very similar from a technical standpoint (using ADO for deployment too).

I hit this bug using terraform 0.12.17 with AzureRM provider 1.37.0 and 1.38.0.

@sschu I am also deploying from Azure DevOps hosted machines. The workaround I created was:

  • Prior to the terraform tasks, run an Azure PowerShell task that gets the host's public IP and adds it to the network rules using 'Add-AzStorageAccountNetworkRule'
  • Run Terraform steps
  • Remove network rule using Remove-AzStorageAccountNetworkRule

It's workable, but still a pain

I believe I just encountered this with v2.3.0 of the Azure provider. Given that this is still open, I'm assuming it hasn't been fixed?

Hello,
It seems related to this azure-cli issue: Azure/azure-cli#10190

Currently, the creation of a storage container resource (blob, share) seems to use the storage data-plane API, which sits behind the firewall.
Instead, it should use the Resource Manager API. In the issue mentioned above, I just discovered that az cli has az storage share-rm create in addition to the existing az storage share create. I don't know if there is an equivalent for blob, or whether this exists in the Azure REST API or in Terraform :)

Experiencing the same error with Terraform v0.12.25 and azurerm v2.9.0

resource "azurerm_storage_account" "sa_balancesheet_upload" {
  name                      = var.name
  resource_group_name       = var.resource_group_name
  location                  = var.location
  account_kind              = "StorageV2"
  account_tier              = "Standard"
  account_replication_type  = var.account_replication_type
  enable_https_traffic_only = var.enable_https_traffic_only

  network_rules {
    default_action = "Deny"
    ip_rules       = var.ip_rules
  }

  tags = {
    environment = var.environment
  }
}

resource "azurerm_storage_container" "sc_balancesheets" {
  name                  = "balancesheets"
  storage_account_name  = azurerm_storage_account.sa_balancesheet_upload.name
  container_access_type = "private"
}

When using Azure DevOps hosted agents to deploy, I ended up writing this piece of PowerShell that invokes the Azure CLI to allow that specific agent's public IP address into the Storage Account that has IP restrictions enabled, like @jeffadavidson.

It's a script you can call as part of your deployments that will toggle the public IP of that agent either on or off (-mode switch).

As mentioned I use it for Azure DevOps pipeline deployments, but it could be used anywhere else by other deployment tools...

<#
.SYNOPSIS
Set (by mode: ON OFF) the Storage Account Firewall Rules by Public IP address. Used by Azure DevOps Build/Release agents
See here : https://github.com/terraform-providers/terraform-provider-azurerm/issues/2977
.DESCRIPTION
Using Azure CLI
.EXAMPLE
.\SetMode_PublicIPAddress_SA.ps1 -storageaccount sa12345random -resourcegroup RG-NDM-TEST -mode on
.NOTES
Written by Neil McAlister - March 2020
#>
param (
    [Parameter(Mandatory=$true)]
    [string]$storageaccount,
    [Parameter(Mandatory=$true)]
    [string]$resourcegroup,
    [Parameter(Mandatory=$true)]
    [string]$mode
)
#
$ip = Invoke-RestMethod http://ipinfo.io/json | Select -exp ip
write-host $ip
#
if ($mode -eq 'on') { 
az storage account network-rule add --resource-group $resourcegroup --account-name $storageaccount --ip-address $ip
} 
#
if ($mode -eq 'off') {
az storage account network-rule remove --resource-group $resourcegroup --account-name $storageaccount --ip-address $ip
}

I have this as a step in my deployments with -mode on, which allows access to the SA.

I also have another step at the end with -mode off. Note that you should run the -mode off step even if your deployment fails/crashes out; otherwise your SA firewall rules are going to get messy with lots of orphaned IP addresses in them.

If you are using YAML based pipelines, that setting is...

condition: always()

...if using GUI based releases it is a setting under ADVANCED options

I have the same problem. I cannot create a container from my workstation. I enabled virtual_network_subnet_ids (for the application backend) and ip_rules for my workstation (which runs the TF scripts). I'm getting Status=403 Code="AuthorizationFailure".

Same error in terraform v0.2.12

Hi, is there any plan to fix this?

Reiterating @boillodmanuel's comment about using the Resource Manager API instead of the storage account data-plane API, which sits behind the firewall. This isn't just for the create request either: Terraform refreshing the storage container properties also fails if the network rules prevent it. There is a Resource Manager API available that can be used instead, e.g. via the Azure CLI tool:

➜  ~ az storage container-rm list --storage-account myStorageAccount
Command group 'storage container-rm' is in preview. It may be changed/removed in a future release.
[
  {
    "defaultEncryptionScope": "$account-encryption-key",
    "deleted": false,
    "deletedTime": null,
    "denyEncryptionScopeOverride": false,
    "etag": "\"0x8D8720723A5BBDF\"",
    "hasImmutabilityPolicy": false,
    "hasLegalHold": false,
    "id": "/subscriptions/<subscription id>/resourceGroups/my-resource-group/providers/Microsoft.Storage/storageAccounts/myStorageAccount/blobServices/default/containers/myContainer",
    "immutabilityPolicy": null,
    "lastModifiedTime": "2020-10-16T19:17:37+00:00",
    "leaseDuration": null,
    "leaseState": "Available",
    "leaseStatus": "Unlocked",
    "legalHold": null,
    "metadata": null,
    "name": "myStorageContainer",
    "publicAccess": "None",
    "remainingRetentionDays": 0,
    "resourceGroup": "my-resource-group",
    "type": "Microsoft.Storage/storageAccounts/blobServices/containers",
    "version": null
  }
]

This was run on a storage account/container that had network rules preventing me from accessing the storage container through the storage account API.

Experiencing the same issue doing the following:

export TF_VAR_ip_rules='["<MY_IP>"]'
Error: Error retrieving Azure Storage Account "myst": storage.AccountsClient#GetProperties: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="StorageAccountNotFound" Message="The storage account myst was not found."
terraform {
  required_version = "= 0.14.9"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 2.54"
    }
  }
}


resource "azurerm_storage_account" "st" {
  name                     = "${replace(var.name_prefix, "-", "")}${var.short_storage_name}st"
  resource_group_name      = var.resource_group_name
  location                 = var.location
  account_tier             = "Standard"
  account_replication_type = "GRS"
  min_tls_version          = "TLS1_2"
  is_hns_enabled           = var.is_hns_enabled

  network_rules {
    bypass = ["AzureServices"]
    default_action = "Deny"
    virtual_network_subnet_ids = var.virtual_network_subnet_ids
    ip_rules = var.ip_rules
  }

  tags = var.tags
}

Hi, I've had a read through #9314 and noted there was a dependency on an upstream Storage API change before being able to improve this behaviour in the azurerm terraform provider. Is there an update on how far those changes have progressed and when we expect the terraform provider to be able to make use of those upstream changes?

Does anyone know if there is an open TF bug for deploying a Premium tier Storage account with network rules?

Whenever I try to deploy a Premium tier SA using a TF module, it gives a 403 error.

This issue caught my eye while looking through SA-related issues for a different problem. I've encountered similar issues to the OP. My troubleshooting revealed the following:

  • changes that are made to the network-access settings on an SA may not actually get applied to the SA for ~20 seconds.
  • MS hosted Azure DevOps agents will connect to an SA using their public IP address (via an LB) WHEN the agent is in a different region than the SA being configured.
  • MS hosted Azure DevOps agents will connect to an SA using their internal RFC1918 private IP address (10.x.x.x) (no LB) WHEN the agent is in the SAME region as the SA being configured.
  • RFC1918 network ranges cannot be added to SA network-access lists. They are not permitted to be placed in the configuration. The API prevents it.

The MS workaround for the above is to make sure you are using self-hosted Azure DevOps agents. A better fix, imho, would be to make the MS hosted agents ALWAYS go via their public IP, even when the SA is in the same region as the agent.

A bonus fix from MS would be to allow Azure firewalls to recognise Azure DevOps agents as another trusted Azure Service.

In my case, when I run the pipeline a second time I get the 403 error. It only works if I change the firewall rule to allow all networks.

Since this is only a problem for the container/filesystem resources, I am using an ARM template as a replacement for those. The code is quite simple:

resource "random_id" "randomId" {
  byte_length = 6
}

resource "azurerm_template_deployment" "container" {
  count               = length(var.account.file_systems)
  depends_on          = [ azurerm_storage_account.account ]
  name                = "${azurerm_storage_account.account.name}-container-${random_id.randomId.hex}"
  resource_group_name = var.resource_group_name
  deployment_mode     = "Incremental"
  template_body       = file("${path.module}/container.json")
  parameters          = {
    storage_account_name = azurerm_storage_account.account.name
    container_name       = var.account.file_systems[count.index].container_name
  }
}

With a container.json file in the same folder:

{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "storage_account_name": {
            "type": "string"
        },
        "container_name": {
            "type": "string"
        }
    },
    "variables": {},
    "resources": [
        {
            "type": "Microsoft.Storage/storageAccounts/blobServices/containers",
            "name": "[concat(parameters('storage_account_name'), '/default/', parameters('container_name'))]",
            "apiVersion": "2021-02-01",
            "properties": {}
        }
    ]
}

@bergmeister I also converted to an ARM deployment and that resolved the issue for me. But I hate this approach, to be honest.

It's not great, I admit, but good enough for most people. The question is more how often container names really get renamed in real life, and features like ACLs aren't working that well in Terraform yet anyway. So apart from having to clean up containers on renames, it's not too bad or too complex to maintain.

I had this same issue where I created a Premium SKU File Share with Terraform 1.0.2 on Azure, but when I locked it down to a VNET and Public IPs my Build Agents got 403 not authorized.
If I built locally from my workstation it would work, but even with the public IP of my self hosted build agents for Azure Devops, it still failed.

Then I found this mentioned on the resource documentation - https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/storage_account_network_rules#ip_rules

IP network rules have no effect on requests originating from the same Azure region as the storage account. Use Virtual network rules to allow same-region requests. Services deployed in the same region as the storage account use private Azure IP addresses for communication. Thus, you cannot restrict access to specific Azure services based on their public outbound IP address range.

So, for me anyway, as my build agents were in the same Azure region as the file share, they were getting the internal IP, not the public one. To fix it, I added the build VM VNet to the allowed virtual networks on the file share, and now it works fine.
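For reference, a minimal sketch of that fix, assuming the agents' subnet is exposed as azurerm_subnet.build_agents (an illustrative name) and that the subnet has the Microsoft.Storage service endpoint enabled:

resource "azurerm_storage_account" "fileshare" {
  name                     = "examplepremiumfiles"
  resource_group_name      = azurerm_resource_group.example.name
  location                 = azurerm_resource_group.example.location
  account_kind             = "FileStorage"
  account_tier             = "Premium"
  account_replication_type = "LRS"

  network_rules {
    default_action = "Deny"
    bypass         = ["AzureServices"]

    # Same-region agents arrive with private IPs, so allow their subnet
    # rather than their public IP.
    virtual_network_subnet_ids = [azurerm_subnet.build_agents.id]
  }
}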

Is this still reproducible?

sschu commented

Most likely it is. The fundamental reason is that the Terraform provider uses a mixture of control-plane and data-plane APIs to set things up, and operations on the data plane are affected by network rules configured on the control plane.

Is this still reproducible?

I just ran into this yesterday.

dtns2 commented

Is this still reproducible?

Yes, also just ran into this today.

The whole point of having an API to spin up resources in the cloud is to be able to do this from anywhere while the resources themselves are restricted. I am bewildered by the fact that the Azure API used to interact with storage shares is apparently subject to the network restrictions of the storage account.

resource "azurerm_storage_account" "example" {
  resource_group_name       = azurerm_resource_group.example.name
  location                  = azurerm_resource_group.example.location
  name                      = "example"
  account_kind              = "FileStorage"
  account_tier              = "Premium"
  account_replication_type  = "LRS"
  enable_https_traffic_only = false
}

resource "azurerm_storage_account_network_rules" "example" {
  depends_on = [azurerm_storage_share.example]

  storage_account_id         = azurerm_storage_account.example.id
  default_action             = "Deny"
  virtual_network_subnet_ids = [azurerm_subnet.example.id]
  
  # AZURE LIMITATION:
  #   interactions with storage shares inside a storage account through the Azure API are subject to these restrictions?
  #   ...so all future executions of Terraform break if one doesn't poke oneself a hole for wherever we are running Terraform from
  // ip_rules = [chomp(data.http.myip.body)]
}
// data "http" "myip" {
//   url = "http://icanhazip.com"
// }

resource "azurerm_storage_share" "example" {
  name                 = "example-storage-share"
  storage_account_name = azurerm_storage_account.example.name
  enabled_protocol     = "NFS"
}

Otherwise:

│ Error: shares.Client#GetProperties: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailure" Message="This request is not authorized to perform this operation.\nRequestId:XXXXXXX-YYYY-YYYY-YYYY-ZZZZZZZZZZZZ\nTime:2021-11-10T18:07:08.8135873Z"
│
│   with azurerm_storage_share.example,
│   on storage.tf line 47, in resource "azurerm_storage_share" "example":
│   47: resource "azurerm_storage_share" "example" {

Resolution of PR #14220 will fix this.

@andyr8939's comment is the correct fix until #14220 is implemented.

This solution ("add the vnet to the network rules") worked for us

While adding the vnet to the network rules is a solution, this routes your traffic over the public internet, which is not as ideal as having completely private storage.

An alternative solution that I am investigating is private endpoints.

When you create a private endpoint for a storage container, a private DNS zone for the storage is created and associated with a vnet where you put the endpoint. This allows the resources in that vnet to resolve the Azure storage IP as a private IP, so connections from that vnet will traverse the Microsoft backbone properly instead of going over the public internet.

This will go around any network rules you put on the storage because network rules only apply to the public IP, so you would create private endpoints for a vnet that needs access, and then you can either peer other vnets to that one, or create new endpoints for the other vnets to prevent sharing resources between the vnets.

In Terraform, this does require that you initially create an azurerm_storage_account resource without a network_rules block, then create an azurerm_private_dns_zone resource, an azurerm_private_dns_zone_virtual_network_link resource, an azurerm_private_endpoint resource, and then apply the network rules using the azurerm_storage_account_network_rules resource.
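A condensed sketch of that ordering, with illustrative names (the resource group, VNet and subnet are assumed to exist elsewhere; this is not a drop-in configuration):

resource "azurerm_storage_account" "example" {
  name                     = "examplestorage"
  resource_group_name      = azurerm_resource_group.example.name
  location                 = azurerm_resource_group.example.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  # Deliberately no network_rules block here; rules are applied last, below.
}

resource "azurerm_private_dns_zone" "blob" {
  name                = "privatelink.blob.core.windows.net"
  resource_group_name = azurerm_resource_group.example.name
}

resource "azurerm_private_dns_zone_virtual_network_link" "blob" {
  name                  = "blob-dns-link"
  resource_group_name   = azurerm_resource_group.example.name
  private_dns_zone_name = azurerm_private_dns_zone.blob.name
  virtual_network_id    = azurerm_virtual_network.example.id
}

resource "azurerm_private_endpoint" "blob" {
  name                = "example-blob-pe"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  subnet_id           = azurerm_subnet.endpoints.id

  private_service_connection {
    name                           = "example-blob-psc"
    private_connection_resource_id = azurerm_storage_account.example.id
    subresource_names              = ["blob"]
    is_manual_connection           = false
  }

  private_dns_zone_group {
    name                 = "default"
    private_dns_zone_ids = [azurerm_private_dns_zone.blob.id]
  }
}

resource "azurerm_storage_account_network_rules" "example" {
  storage_account_id = azurerm_storage_account.example.id
  default_action     = "Deny"
  bypass             = ["AzureServices"]

  # Lock the account down only after the private endpoint exists.
  depends_on = [azurerm_private_endpoint.blob]
}

resource "azurerm_storage_container" "example" {
  name                 = "example"
  storage_account_name = azurerm_storage_account.example.name

  # The data-plane call must resolve to the private endpoint.
  depends_on = [azurerm_private_endpoint.blob]
}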

I went through our terraform storage code and refactored it to leverage private endpoints, and I removed the vnet from the network rules in the process of doing that in order to confirm it is really using the private endpoint.

It works beautifully but there are some caveats.

  • In order to use private endpoints, your subnet must not be enforcing private endpoint network rules. This is a simple true/false argument named enforce_private_link_endpoint_network_policies in the azurerm_subnet resource. Despite the name of the argument, it must be set to true in order to allow private endpoints to be created.
    • NOTE: There is a separate argument called enforce_private_link_service_network_policies which you do not need to change for this. Ensure you set the one with "endpoint" in the argument name if you are trying to create private endpoints for storage, event hubs, etc.
  • Additionally, in order to create a private endpoint, your storage account must already exist in order to provide the azurerm_private_endpoint resource with the resource ID of your azurerm_storage_account. This means you CANNOT define your network_rules block inside the azurerm_storage_account resource, but instead must create the storage account without network rules, then create a Private DNS Zone, followed by 1-2 private endpoints, followed by applying network rules via the azurerm_storage_account_network_rules resource, and finally creating your azurerm_storage_container.
    • 2 private endpoints is recommended in order to provide better performance. One endpoint connects to the primary subresource (storage container connection in this case, which for me is "blob") and one for the secondary subresource (the "blob_secondary" storage container connection for me).

I found some sample code here https://www.debugaftercoffee.com/blog/using-terraform-with-private-link-enabled-services and adapted it to my needs. Additional samples are found here: https://github.com/hashicorp/terraform-provider-azurerm/tree/main/examples/private-endpoint and I found that the example in the private-dns-group subdirectory of the second link was most helpful in getting the DNS zone group configured properly for my storage resources.

I hope this helps. Let me know if anyone has questions.

@tspearconquest thank you for the explanation. I wonder if you defined azurerm_private_endpoint, azurerm_storage_account (without network rules) and azurerm_storage_account_network_rules in one terraform run? And does terraform build its dependencies correctly in this case, or do you have to add depends_on statements manually?

Hi @TheKangaroo Yes, they can all be defined in a single .tf file and created in a single run, but the network rules must be defined as a separate resource from the storage account, meaning you can't include the network_rules block in the storage account resource.

It should build the dependencies correctly based on the resource IDs being included, however I chose in my code to explicitly define them in order to make certain that diagnostic settings for my AKS cluster are not created until the storage container is created.

My diagnostic settings don't explicitly depend upon the storage, but rather we use a tool running in the cluster to extract the audit logs from an event hub and that tool itself is what requires the storage. So the implicit dependency is not known to Terraform and for that reason is why I chose to define the dependencies explicitly.

Providing another workaround based on the azapi provider:

resource "azapi_resource" "test" {
  name      = "acctestmgd"
  parent_id = "${azurerm_storage_account.test.id}/blobServices/default"
  type      = "Microsoft.Storage/storageAccounts/blobServices/containers@2021-04-01"
  body      = "{}"
}
VR99 commented

It should build the dependencies correctly based on the resource IDs being included, however I chose in my code to explicitly define them in order to make certain that diagnostic settings for my AKS cluster are not created until the storage container is created.

My diagnostic settings don't explicitly depend upon the storage, but rather we use a tool running in the cluster to extract the audit logs from an event hub and that tool itself is what requires the storage. So the implicit dependency is not known to Terraform and for that reason is why I chose to define the dependencies explicitly.

Hello tspearconquest,

I tried the approach you recommended, but am running into issues, i.e. the private endpoint is not being utilized. Can you please validate and let me know if I am missing anything:

  1. Created the following resources for the self-hosted build agent in a separate resource group - azurerm_virtual_network, azurerm_subnet, azurerm_private_dns_zone, azurerm_private_dns_zone_virtual_network_link.
  2. Created the datalake and related resources in a different resource group and added the azurerm_private_endpoint resource with the subnet_id, private_dns_zone_group pointing to the resources created for the build agent. Pointed the private_service_connection private_connection_resource_id to the datalake storage account id.

The issue is that on re-running the terraform apply for the datalake, it is unable to access the containers (getting the 403 error) i.e. it does not seem to be using the private endpoint created using the process above.

Hi @VR99

Yes that sounds like the correct steps. I'm afraid I don't have experience with Data Lake, as we only use Event Hubs and Log Analytics. Can you share some of the non working code? Maybe I can spot something in there.

I did have an issue with connecting to the storage from my corporate VPN when I was testing this all out. It turns out our VPN was missing a peering, so we got that added and I was able to connect by IP. Unfortunately, the network team was hesitant to setup any private DNS endpoints on the VPN subnet, so even with the peering in place my laptop was still resolving the public DNS. So I've resorted to editing my hosts file to allow me to connect to the storage over the VPN instead of the internet.

My point in mentioning this is that it could simply be a DNS issue where the private IPs are not being resolved, so I would start looking at that angle. The easiest way to rule out connectivity is probably to connect directly to the private IP of the data lake from your build agent.

If you can't connect to the private IP, then it's network/firewall related, and if you can, then it's DNS related. Hope that helps :)

Hello, it seems related to this azure-cli issue: Azure/azure-cli#10190

Currently, the creation of a storage container resource (blob, share) seems to use the storage data-plane API, which sits behind the firewall. Instead, it should use the Resource Manager API. In the issue mentioned above, I just discovered that az cli has az storage share-rm create in addition to the existing az storage share create. I don't know if there is an equivalent for blob, or whether this exists in the Azure REST API or in Terraform :)

@boillodmanuel's comment above nails it.
I am facing problem #8922 (which is sadly, and imho incorrectly, closed; can someone please reopen it? It is a severe problem), just like @ahmddp, who also comes to the same conclusion as @boillodmanuel above.

Enabling public access to the storage account enables file share creation via TF, which confirms the above analysis; however, that's certainly not an option for a production share. Looks like I need to comment out the share resource for now to enable further apply runs (or rather, refreshes).

In comments such as #17341 (comment), Tom references a ticket in the backlog on the Azure side to fix this.

Is there any public visibility of this ticket? It would let us put pressure on Azure via our account rep to fix this.

Thank you!

It would let us put pressure on Azure via our account rep to fix this.

Point the account rep at this thread.
Then send them this link which shows ~133 issues currently blocked by MS upstream.

Reporting in... this is still an issue. Blob containers created with Terraform are not accessible through the portal.

(screenshot omitted)

Code I am using to make a storage account, followed by a container inside of it:

resource "azurerm_storage_account" "tfstate" {
  name                     = var.storageaccountname
  resource_group_name      = azurerm_resource_group.rg.name
  location                 = azurerm_resource_group.rg.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

resource "azurerm_storage_container" "deploy" {
  name                  = "deploy"
  storage_account_name  = azurerm_storage_account.tfstate.name
  container_access_type = "blob"
}

Hi @ryanberger-az, can you try to add a private endpoint? Details are in my previous comment from Jan 12:
#2977 (comment)

@tspearconquest Thank you for the suggestion, but that workaround will not meet the requirements for which I am developing this solution. This will be deployed into our customers cloud tenants, and their networking teams are going to NOT be okay with private endpoints being set up for the sake of doing this. What is weird is that I tried another route of doing this, and used an ARM template to be deployed within my terraform file and I still get the same results on the containers it's making. Sort of at a loss because I guess I thought this may get around it. I guess since it's using the azurerm provider, it's still facing the same issues though.

resource "azurerm_resource_group_template_deployment" "deploy" {
  deployment_mode     = "Incremental"
  resource_group_name = azurerm_resource_group.rg.name
  name                = "deploy-with-arm"
  depends_on = [
    azurerm_storage_account.tfstate
  ]
  parameters_content = jsonencode({
    "location" = {
      value = "${var.resource_group_location}"
    }
    "storageAccountName" = {
      value = "${azurerm_storage_account.tfstate.name}"
    }
  })
  template_content = <<TEMPLATE
  {
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "storageAccountName": {
            "type": "string"
        },
        "location": {
            "type": "string"
        }
    },
    "resources": [
        {
            "name": "[parameters('storageAccountName')]",
            "type": "Microsoft.Storage/storageAccounts",
            "apiVersion": "2018-07-01",
            "location": "[parameters('location')]",
            "resources": [
                {
                    "name": "default/deploy",
                    "type": "blobServices/containers",
                    "apiVersion": "2018-07-01",
                    "dependsOn": [
                        "[parameters('storageAccountName')]"
                    ]
                }
            ]
        }
    ]
  }
  TEMPLATE
}

@tspearconquest Thank you for the suggestion, but that workaround will not meet the requirements for which I am developing this solution. This will be deployed into our customers cloud tenants, and their networking teams are going to NOT be okay with private endpoints being set up for the sake of doing this. What is weird is that I tried another route of doing this, and used an ARM template to be deployed within my terraform file and I still get the same results on the containers it's making. Sort of at a loss because I guess I thought this may get around it. I guess since it's using the azurerm provider, it's still facing the same issues though.

Understood. Yes, that seems to be the case. I noticed your comment a while back about the provider talking with an API that gets firewalled off by creating network rules.

The Terraform provider seems to be behind on this, so while I have heard people say they normally don't recommend it, another option you could try is to set up a local-exec provisioner on a null_resource that calls the Azure CLI to create the container after the firewall is enabled. Since the CLI has a way to create it using the Resource Manager API, it should work, though I have not tried it myself.
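A rough, untested sketch of that idea, assuming the preview az storage container-rm command is available on the agent (the account reference matches the earlier example; the container name is illustrative):

resource "null_resource" "deploy_container" {
  # Re-run if the storage account is ever recreated.
  triggers = {
    storage_account_id = azurerm_storage_account.tfstate.id
  }

  # container-rm goes through the management plane, so the request is not
  # blocked by the storage account's data-plane firewall.
  provisioner "local-exec" {
    command = "az storage container-rm create --storage-account ${azurerm_storage_account.tfstate.id} --name deploy"
  }
}

Unlike a real resource, Terraform will not detect drift on the container or remove it when the null_resource is destroyed, so this is strictly a stopgap.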

@tspearconquest Haha, great minds! I edited out my comment where I was talking about going to go down that route of using the local-exec provider to run Azure-CLI commands to build this stuff out. The issue here is that it looks like you need to run an az login before you can actually run the commands.

I may just write an azure CLI script to have our customer run to create the storage account/container for us to use to enable remote state for the bigger terraform deployment we'll be rolling into their environments. The whole reason for this was using terraform to build a storage account/container to be used in a subsequent terraform deployment for the remote state. I wanted to build out the entire deployment with terraform, but in this state, it is not possible.

I appreciate all of your effort on this subject. I hope that the provider issue can be resolved soon and that Microsoft wakes up to this being a real problem.

Happy to help. It certainly is! az login with a service principal or user assigned managed identity would be the way I'd go whether using a script or terraform. ;)

One aspect that seems to be unaddressed is the 'preview' of the nfsv3_enabled option.
That requires the network default action to be 'Deny' (i.e. use network_rules), and for that to work, the network_rules must be in the azurerm_storage_account block.
With those two requirements, you are painted into the same corner, with a little more pain if you are trying to use a private endpoint.

@tombuildsstuff @magodo This issue has been open for 3.5 years and I cannot see the end of it.
It is blocking a huge amount of work for storage accounts that are forced to use a firewall.

Is it possible to provide any "temporary" solution until MSFT implements something on their side?
Bringing something to the terraform provider takes one week (until the next release is out), while waiting for MSFT already takes much longer :)

As others have said, Terraform uses private URLs for management of the file share. In our case, DNS resolution of these endpoints was not working correctly. You can test this by looking up the URL using terraform console and then investigating the resource. Use nslookup to determine if you can resolve the URL. If not, there are several options. For example, you could add the entries to your /etc/hosts file; this solved it in our case. Another option is to add the private link to a private DNS zone and forward your DNS to this private zone.
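For the private DNS zone route, a minimal sketch with illustrative names, assuming a privatelink.file.core.windows.net zone and a file private endpoint already exist (when a private_dns_zone_group is used on the endpoint, this record is created automatically):

resource "azurerm_private_dns_a_record" "storage_file" {
  name                = azurerm_storage_account.example.name
  zone_name           = azurerm_private_dns_zone.file.name
  resource_group_name = azurerm_resource_group.example.name
  ttl                 = 300

  # Point the account's file endpoint at the private endpoint's IP.
  records = [azurerm_private_endpoint.file.private_service_connection[0].private_ip_address]
}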

https://learn.microsoft.com/en-us/azure/dns/dns-private-resolver-overview can also help if you're using private endpoints. You may need this to do the forwarding mentioned by @RobertFloor

If someone needs to work around this issue for a storage account of type "FileStorage", e.g. for an NFS share, this example code worked for us (based on the previous replies with deployment templates):

resource "azurerm_storage_account" "example-nfs" {
  name                      = "examplenfs"
  resource_group_name       = azurerm_resource_group.example.name
  location                  = azurerm_resource_group.example.location
  account_tier              = "Premium"
  account_kind              = "FileStorage"
  account_replication_type  = "LRS"
  enable_https_traffic_only = false

  network_rules {
    default_action             = "Deny"
    # ip_rules                   = ["127.0.0.1/24"]
    virtual_network_subnet_ids = [azurerm_subnet.example_subnet_1.id]
    bypass                     = ["AzureServices"]
  }
}

# NOTE Normally, we will do the following azurerm_storage_share.
#  Due to https://github.com/hashicorp/terraform-provider-azurerm/issues/2977
#  this isn't possible right now. So we working around with an ARM template, see
#  post https://github.com/hashicorp/terraform-provider-azurerm/issues/2977#issuecomment-875693407
# resource "azurerm_storage_share" "example-nfs_fileshare" {
#   name                 = "example"
#   storage_account_name = azurerm_storage_account.example-nfs.name
#   quota                = 100
#   enabled_protocol     = "NFS"
# }
resource "azurerm_resource_group_template_deployment" "example-nfs_fileshare" {
  name                = "${azurerm_storage_account.example-nfs.name}-fileshare-example"
  resource_group_name = azurerm_resource_group.example.name
  deployment_mode     = "Incremental"

  parameters_content = jsonencode({
    "storage_account_name" = {
      value = azurerm_storage_account.example-nfs.name
    }
    "fileshare_name" = {
      value ="example"
    }
  })
  template_content = <<TEMPLATE
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "storage_account_name": {
      "type": "string"
    },
    "fileshare_name": {
      "type": "string"
    }
  },
  "variables": {},
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts/fileServices/shares",
      "name": "[concat(parameters('storage_account_name'), '/default/', parameters('fileshare_name'))]",
      "apiVersion": "2021-02-01",
      "properties": {
        "shareQuota": 100,
        "enabledProtocols": "NFS"
      }
    }
  ]
}
TEMPLATE

  depends_on = [azurerm_storage_account.example-nfs]
}

Hi, I think I am also facing the same issue.
In my case, I had to import an NFS-enabled storage account with IP filters. When I run terraform plan after the import from my local development environment, it works fine. But when I run the plan from Azure Pipelines, I get the error:

retrieving Container "" (Account "" / Resource Group "****"): containers.Client#GetProperties: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailure" Message="This request is not authorized to perform this operation.\n"

I have added the agent's public IP to the firewall rules of the storage account, but it still does not work.

Please let me know if anyone has found a solution to make this work from the pipeline.

This issue still exists in the latest provider (3.51.0). You cannot manage container blob files if you set the storage account network_rules.default_action = Deny. I've tried setting the public IP to my runner's public IP as well as using the VNet/subnet integration, with no luck. The provider must be attempting to view the blobs via the public endpoint only.

(https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/storage_account#default_action)

resource "azurerm_storage_account" "example" {
name = "storageaccountname"
resource_group_name = azurerm_resource_group.example.name

location = azurerm_resource_group.example.location
account_tier = "Standard"
account_replication_type = "LRS"

network_rules {
default_action = "Deny"
ip_rules = ["100.0.0.1"]
virtual_network_subnet_ids = [azurerm_subnet.example.id]
}

But I just wanted to understand why this same config is applied successfully from local development but fails in the pipeline, when in both cases there is a public IP attached and both public IPs have been whitelisted.

This does not seem to be a Terraform problem; in that case it would not have worked locally either.

@abindg Do you use self-hosted Azure DevOps Agents? If yes, where are the agents located? On-premises or Azure? If in Azure, are they hosted in the same Azure region as the storage account?

Eventually we found the reason why. The thing is that Storage Accounts have a feature where requests to them are routed not via the internet but via internal Microsoft networks. This only happens when the requesting agent and the Storage Account are in the same region.

In our case we are using Microsoft-hosted agents in our CI. The CI agents are allocated in the same region as the Azure DevOps organisation, which is why both the agents and our Storage Accounts happened to live in the same region.

It seems to be a "feature" from Microsoft: they want to route requests to Storage Accounts in the fastest way, because heavy load is assumed.

@abindg IIRC, when you are running in the pipeline, the agent will not use its public IP for the outgoing requests, for reasons I'm not entirely sure about.

@robertbrandso the agents are hosted on Azure. I will check the region; probably it differs.

So if I understand correctly, for Microsoft-internal routing from the agent to the storage account, they need to be in the same region.

Is this understanding correct?

@abindg Probably the other way around: if they are in the same region, then it will use some internal/private routing instead (I might be wrong).

When they are in the same region, the traffic is NOT routed via the internet; when they are in different regions, it is. The Storage Account network rules apply only to traffic routed via the internet, so for same-region requests the fix is to use a private endpoint. You need a private endpoint on the storage account. See this comment for more details.

You can also refer here for more about connecting devops to a storage account private endpoint

When they are in the same region, the traffic is NOT routed via the internet; when they are in different regions, it is. The Storage Account network rules apply only to traffic routed via the internet, so for same-region requests the fix is to use a private endpoint. You need a private endpoint on the storage account. See this comment for more details.

You can also refer here for more about connecting devops to a storage account private endpoint

Thanks. I will check this.

@tspearconquest, @abindg Check out this: #12411 (comment). I didn't mean the request goes through the private Azure backbone network (as with a PE); it still goes through the public internet, just with the source IP address being a private IP rather than the agent's public IP.

That's also our observation.
When fetching the public IP address of the Azure hosted agent (by issuing a request to https://ifconfig.me/ip) and whitelisting it on the Storage Account, it only has an effect when the two are in different regions.
Could there be any way to fetch the IP address of the Azure hosted agent, which is used 'internally'?

@mvervoort, it wouldn't have any effect if you did, because you can't add RFC1918 addresses to your storage account firewall.

Thanks guys. I didn't realize my runner being used was in the same region as the storage account, hence the problem. Once I assigned a different region runner to process the job and added that public IP to the new storage account ip rules, it was able to provision the account and list/add the container blobs.

The easiest solution is to have runners/agents in multiple regions, allowing the code to reference another region's runner if you're locking down the storage account being managed by the Terraform deployment.

Thanks everyone. I got my pipeline agents to read the containers now.

As my agent and the storage account were in the same region, I added the subnet of my pipeline-hosted agents to the network rules of the storage account, first manually and then by bringing it under my Terraform state, including it in my Terraform manifest.

It works fine now.

Thanks again.

For those of us using Terraform Cloud to manage the runs, this issue is quite infuriating. Seeing as I don't understand the private network stuff well enough (some official Terraform documentation on this would be amazing), the workaround I'm using is to use the Azure CLI to turn off the network rules just before the TF run in the pipeline and then turn them back on again afterwards. Not ideal, however, as TF is no longer fully managing state.

I doubt even adding the current agent to the firewall rules WITHIN the Terraform provider would work, as it would try to refresh state before doing anything and fail.

I'm having this problem/bug

For those of us using Terraform Cloud to manage the runs, this issue is quite infuriating. Seeing as I don't understand the private network stuff well enough (some official Terraform documentation on this would be amazing), the workaround I'm using is to use the Azure CLI to turn off the network rules just before the TF run in the pipeline and then turn them back on again afterwards. Not ideal, however, as TF is no longer fully managing state.

I doubt even adding the current agent to the firewall rules WITHIN the Terraform provider would work, as it would try to refresh state before doing anything and fail.

Yes, this is what I am, more or less, doing too. 👍
It's not ideal but it's a good workaround.

I also have fully private infrastructure.
In my pipeline I use the azure cli before running terraform in order to apply a network rule that allows the runner IP address to talk to the storage account.

After terraform finishes its job I run the azcli again to drop the network rule created before.

Another easy solution is to fetch the IP address of the agent in the pipeline and add it to the storage account firewall. This allows us to add containers etc. to the storage account, since the agent IP will be whitelisted.


storage account defined in .tf file

    resource "azurerm_storage_account" "storage" {
    name                          = var.storageaccount
    resource_group_name           = var.resourcegroup
    location                      = var.environment_location
    account_tier                  = "Standard"
    account_replication_type      = "LRS"
    public_network_access_enabled = true
    enable_https_traffic_only     = true
  
    network_rules {
      default_action = "Deny"
      virtual_network_subnet_ids  = [data.azurerm_subnet.appservice_subnet.id]
      ip_rules                    = [var.ip_address]  # add the variable to ip rules for whitelisting
    }
  }


.yml file stage

- stage: DeployDEV
  displayName: 'Deploy to DEV'
  condition: succeeded()
  dependsOn: InstallTerraform
  variables:
    - group: 'Dev-Grp'
    - name: TF_VAR_ip_address # here we have to prefix the variable 'ip_address' with TF_VAR_
      value: '$(ip_address)'
  jobs:
    - job: PlanAndApply
      displayName: Init, Plan and Apply
      continueOnError: false
      steps:

        # fetch the agent's public IP address and set the value of the ip_address variable defined in the .tfvars file
        - task: AzureCLI@2
          displayName: Set the IP Address Variable
          inputs:
            azureSubscription: '$(serviceconnection)'
            scriptType: pscore
            scriptLocation: inlineScript
            addSpnToEnvironment: true
            inlineScript: |
              $ip = (((invoke-webrequest -Uri 'http://checkip.amazonaws.com/').Content | % {[char]$_}) -join "").TrimEnd()
              Write-Host "##vso[task.setvariable variable=ip_address]$ip"

        # the replace tokens task replaces the value of all variables in the .tfvars file using the variable group.
        # (No need to add ip_address to the variable group, but it needs to be defined in the .tfvars file,
        # i.e. ip_address = "$(ip_address)")
        - task: qetza.replacetokens.replacetokens-task.replacetokens@5
          displayName: Replace values
          inputs:
            targetFiles: |
              *.tfvars
            encoding: 'auto'
            tokenPattern: 'azpipelines'
            writeBOM: true
            actionOnMissing: 'warn'
            keepToken: false
            actionOnNoFiles: 'fail'
            enableTransforms: false
            enableRecursion: false
            useLegacyPattern: false
            enableTelemetry: true

        - task: TerraformTaskV4@4
          displayName: Init Terraform
          inputs:
            provider: 'azurerm'
            command: 'init'
            backendServiceArm: '$(serviceconnection)'
            backendAzureRmResourceGroupName: '$(tfstate_resourcegroup)'
            backendAzureRmStorageAccountName: '$(tfstate_storageaccount)'
            backendAzureRmContainerName: '$(tfstate_container)'
            backendAzureRmKey: '$(tfstate_key)'

        - task: TerraformTaskV4@4
          displayName: Plan
          inputs:
            provider: 'azurerm'
            command: 'plan'
            backendServiceArm: '$(serviceconnection)'
            environmentServiceNameAzureRM: '$(serviceconnection)'
            backendAzureRmResourceGroupName: '$(tfstate_resourcegroup)'
            backendAzureRmStorageAccountName: '$(tfstate_storageaccount)'
            backendAzureRmContainerName: '$(tfstate_container)'
            backendAzureRmKey: '$(tfstate_key)'
            commandOptions: '-var-file="terraform.tfvars" -out=plan.out'
            publishPlanResults: "plan.out"

        - task: TerraformCLI@0
          displayName: Apply
          inputs:
            command: 'apply'
            backendServiceArm: '$(serviceconnection)'
            backendAzureRmResourceGroupName: '$(tfstate_resourcegroup)'
            backendAzureRmStorageAccountName: '$(tfstate_storageaccount)'
            allowTelemetryCollection: true

@syedhasanraza that means there will always be one IP in the whitelist, right? Even after provisioning, it will still be there until the next provisioning.

Yes, that's correct, but you can remove it after the storage is provisioned.

@syedhasanraza no, we can't, as it will then break refreshing the containers 😕

Actually, this limitation forced us to use the AzAPI provider for containers. We use Azure DevOps and MS-hosted agents, so we can whitelist an agent IP, but only during provisioning (only within the same job). All whitelisting "holes" are removed immediately in the same job or stage.

Since it's by far the most-voted issue and a very old one, I think you should close this as "will not be fixed" AND add a warning note on the azurerm_storage_share resource about this "known upstream issue", with a link to this bug.

494206 commented

If I understand this and the other related threads, successfully managing private endpoints on storage accounts with Terraform is currently only possible:

  • in certain specific use cases
  • or with "hacky" (insecure) workarounds
  • or if a private endpoint and private DNS zone are created for every storage account endpoint type

Seconding what @bacatta said -- there really should be a warning on the documentation page for azurerm_storage_account, about this.

I'd go a little further and have that note state that private endpoints are currently not 'officially' supported for storage accounts by the Azurerm provider. It would be a stretch to argue otherwise at the moment, IMHO.

magodo commented

@494206 I think you are confusing private endpoints with network rules (the firewall). The original 403 error comes from a network rule. For how to set up a private endpoint for the storage account and its sub-resources, you can reference this.

494206 commented

@magodo I understand how private endpoints work, but your criticism is fair - the issue isn't just the private endpoints.

The storage account firewall has a default "allow" rule when each storage account is deployed. This allows Internet connection attempts to any of the built-in public service endpoints on storage accounts. (~6 endpoints per storage account)

If I understand this and the other threads, "azurerm" will fail if a "Deny" rule is set, unless there's a private endpoint it can reach for each of the services per storage account. (~6 private endpoints per storage account?)

It's time to leave a note in the documentation -- 4 years this issue has been open.

(edit: if there's another workaround that I've missed, would be happy to hear it.)

magodo commented

@494206 I didn't mean to criticise your comment, just to make sure things are clear :)

There is a workaround of using the azapi provider to manage the storage container purely through its management-plane API, which isn't impacted by any DNS resolution/firewall issues: #2977 (comment)

Just reading the issue from top to bottom about storage account configuration, and I'm not sure TF is the right place for this challenge (I would not call it an issue), as TF has no ability to change the way Azure services work.

It all comes down to how an Azure Storage Account and its services, like blobs and files, work, and how they behave when a private/service endpoint is enabled on the Storage Account.
I agree the MS Docs topics about Storage Account services and configuration are not easy to understand; however, I think this knowledge is a must when enabling a private endpoint / service endpoint.

Private Endpoint
Service Endpoint Limitations

So basically there are some options available (see the links above).

I just want to share my experience; this is how we are doing it, and we are pretty happy with these options so far.

Happy coding!

494206 commented

There is a workaround of using the azapi provider to manage the storage container purely through its management-plane API, which isn't impacted by any DNS resolution/firewall issues: #2977 (comment)

This should be added as a note on the azurerm_storage_account page, as the workaround is "don't use azurerm".

I'd like to add a note from the field, as I have encountered this kind of issue a lot myself in the last week.
This is a wider issue I would describe as:

  • Any azurerm resource that relies on the resource's own endpoints, i.e. the "data plane", can (sorta obviously) be broken by the inability of the terraform "agent" to access the data plane (be it your TFC cloud agent, a DevOps cloud agent, or a self-hosted agent with a misconfigured private network / privatelink DNS)
  • Resources like azurerm_key_vault_secret, azurerm_storage_container, azurerm_storage_data_lake_gen2_filesystem and azurerm_synapse_role_assignment rely on the resource API endpoint
  • Reasons for lack of access to the resource API include: privatelink blocked by a private network firewall, blocked by resource network_acls or firewall rules, misconfigured on-premises privatelink zone DNS forwarding, a missing privatelink private DNS zone A record, or maybe a missing Key Vault access policy(?)
  • Terraform state refresh can be blocked, i.e. before apply, for existing resources where there is no longer data-plane access, for the above reasons
  • Terraform apply can be broken during the initial apply, i.e. during resource creation, for the above reasons
  • In some scenarios where the network configuration combines with a terraform plan that does not understand dependencies on a privatelink DNS record for data-plane access, a depends_on clause may be helpful. Sometimes it is not so easy due to a dependency loop.

Is there anywhere the azurerm resources document their dependency on the data-plane API instead of the ARM management-plane API? Sometimes it's not obvious (e.g. azurerm_storage_data_lake_gen2_filesystem).

This issue was fixed for me with the latest AzureRM provider version; I used version 3.70.0.

This issue was fixed for me with the latest AzureRM provider version; I used version 3.70.0.

Was it? I'm trying to create a container after my storage account has been created with a private link and it won't allow me. I'm on 3.73.0 too

This issue was fixed for me with the latest AzureRM provider version; I used version 3.70.0.

Was it? I'm trying to create a container after my storage account has been created with a private link and it won't allow me. I'm on 3.73.0 too

If you are running from a pipeline, make sure you have added the pipeline's IP to the network restrictions. Also add 'Storage Blob Data Contributor' to your current object ID (data.azurerm_client_config.current.object_id).

This issue was fixed for me with the latest AzureRM provider version; I used version 3.70.0.

Was it? I'm trying to create a container after my storage account has been created with a private link and it won't allow me. I'm on 3.73.0 too

If you are running from a pipeline, make sure you have added the pipeline's IP to the network restrictions. Also add 'Storage Blob Data Contributor' to your current object ID (data.azurerm_client_config.current.object_id).

I understand that, and I totally appreciate that that option would work. It's just that I'm coming from a different angle, in that if you deploy via Bicep you don't need to open up any IPs, as this is all done via the API.

Obviously Bicep and Terraform are two different products with different ways of working, so I'll just have to adjust accordingly.

rijulg commented

I ran into this problem today as well, and did not want to go down the template route. As a result I wrote the following using the azapi provider:

# main.tf
resource "random_uuid" "acl" {}

# Can't use the Terraform azurerm provider because it uses the storage API directly,
# and since the storage account most probably has public access disabled, that API call fails.
# Documentation for this is available at
# https://learn.microsoft.com/en-us/azure/templates/microsoft.storage/2022-09-01/storageaccounts/fileservices/shares?pivots=deployment-language-terraform
resource "azapi_resource" "this" {
  type      = "Microsoft.Storage/storageAccounts/fileServices/shares@2022-09-01"
  name      = var.name
  parent_id = "${var.storage_account_id}/fileServices/default"
  body = jsonencode({
    properties = {
      accessTier       = var.access_tier
      enabledProtocols = var.enabled_protocol
      shareQuota       = var.claims_storage_quota_gb
      signedIdentifiers = [
        {
          accessPolicy = {
            permission = var.permissions
          }
          id = random_uuid.acl.result
        }
      ]
    }
  })
  response_export_values = ["*"]
}
# vars.tf
variable "name" { type = string }
variable "storage_account_id" { type = string }
variable "access_tier" { type = string }
variable "enabled_protocol" { type = string }
variable "claims_storage_quota_gb" { type = number }
variable "permissions" { type = string }

This works just fine with a service principal.

We are facing the same issue and are really surprised that there is still no good solution for this kind of problem.

We have the Azure Policy "Storage Accounts should disable public network access" set to "deny", so at the moment it is not even possible to temporarily allow public/restricted network access to the storage account for deploying the containers.

I ran into this a while back and would manually create the containers. I finally figured out the simplified AzAPI code and am posting it here. This will create a storage account that disables public access and enables NFSv3, then create a container in that account.

resource "azurerm_storage_account" "group_blob_storage" {
  name                      = "examplestorageacct" # storage account names allow only lowercase letters and digits
  resource_group_name       = local.app_rg_name
  location                  = local.location
  account_kind              = "StorageV2"
  account_tier              = "Standard"
  access_tier               = "Hot"
  account_replication_type  = "LRS"
  enable_https_traffic_only = true
  is_hns_enabled            = true
  nfsv3_enabled             = true
  min_tls_version           = "TLS1_2"
  allow_blob_public_access  = false
  tags                      = local.default_tags
  lifecycle {
    ignore_changes = [
      tags["CreationDate"],
    ]
  }
  network_rules {
    default_action = "Deny"
  }
}

resource "azapi_resource" "group_blob_containers" {
  type      = "Microsoft.Storage/storageAccounts/blobServices/containers@2022-09-01"
  name      = "mycontainer"
  parent_id = "${azurerm_storage_account.group_blob_storage.id}/blobServices/default"
  body = jsonencode({
    properties = {
      defaultEncryptionScope      = "$account-encryption-key"
      denyEncryptionScopeOverride = false
      enableNfsV3AllSquash        = false
      enableNfsV3RootSquash       = false
      metadata                    = {}
      publicAccess                = "None"
    }
  })
  depends_on = [
    azurerm_storage_account.group_blob_storage
  ]
}

You can change the json-encoded properties as needed.

I can confirm that it very much looks like azapi solves a similar problem we have, i.e. how to have Terraform add containers to storage accounts that do not allow public internet ingress to the data plane.

Is it possible for this provider to rework the resource to use the resource management API for this operation? The RM API is internet-accessible, which means we wouldn't have to do anything with network rules based on where terraform apply is executing.

There is another reason for using AZAPI that is related to this issue. It surfaces when you disable storage account Shared Key authentication, as per the Well-Architected Framework guidance.

In order to make this work with Terraform, you need to add storage_use_azuread = true to your provider block, i.e. something like this:

provider "azurerm" {
  features {}
  storage_use_azuread        = true
}

This changes the behaviour of Terraform so that, instead of fetching the shared keys and using those, it uses the Entra ID (Azure AD) permissions of the principal running Terraform.

Then, if you try to create a container using azurerm_storage_container it will fail, but if you use the AZAPI provider it works, for reasons similar to the firewall problem noted in this issue.

Obviously, if you want to do any data plane operations, you would need to set IAM appropriately, however if you just want to create containers and set permissions on them, AZAPI works fine.

The GitHub action in this repository illustrates the behaviour of AZAPI vs AzureRM for this use case.

It's worth noting there are some limitations when disabling shared key authentication, e.g. when using the Table and Files APIs, as per the AzureRM provider documentation; however, this approach works well for things like Terraform state (or any blob workload), and is useful where alignment with WAF principles is a requirement. A rough sketch of the account and RBAC setup follows below.
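
As a minimal sketch (all names illustrative; assuming the principal running Terraform is the one returned by data.azurerm_client_config):

data "azurerm_client_config" "current" {}

resource "azurerm_storage_account" "example" {
  name                      = "examplewafstorage"
  resource_group_name       = var.resource_group_name
  location                  = var.location
  account_tier              = "Standard"
  account_replication_type  = "LRS"

  # Disable shared key auth as per the WAF guidance mentioned above
  shared_access_key_enabled = false
}

# Without shared keys, the principal running Terraform needs a data-plane RBAC
# role to list/create containers through the azurerm provider
resource "azurerm_role_assignment" "tf_blob_contributor" {
  scope                = azurerm_storage_account.example.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = data.azurerm_client_config.current.object_id
}

Even with this in place, the RBAC assignment can take a little while to propagate, so the AZAPI route (management plane only) remains the more robust option for creating the containers themselves.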

Creating a container using the azurerm provider works just fine when public access to the storage account is disabled and a private endpoint is set up correctly. The provider uses data plane operations to do so (i.e. blob.core.windows.net / dfs.core.windows.net API calls).

Most corporate setups:

  • use a proxy for internet access (the proxy does the DNS resolving, and most of the time it can only resolve public IPs, not internal ones) -> only control plane operations work
  • use direct access for internal resources (the deployment agent does the DNS resolving, so it can resolve internal (privatelink) addresses and reach private endpoints; the NO_PROXY=blob.core.windows.net,dfs.core.windows.net environment variable must be set), and direct access is meant only for communication with internal resources -> it is not possible to reach the internet directly.

So what concerns me more than creating the container is creating the azurerm_storage_account resource, which performs data plane operations on creation. With that NO_PROXY variable set, the deployment agent resolves the data plane API call to a public IP (because the private endpoint does not exist yet) and goes directly via the internal network, which is not meant for internet access -> it times out and fails. This is a vicious cycle, because the private endpoint cannot be created before the storage account.

There are two cures for this problem.

  • the one already mentioned -> use azapi to create the storage account, because azapi only performs control plane operations (see the sketch below)
  • on the deployment agents, install a local proxy server which does the following:
    • when a domain resolves to a public IP, forward the request to the corporate proxy
    • when a domain resolves to a private IP, go direct (don't use the proxy)

Btw, the exact same problem affects other resources as well, e.g. Key Vault.
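
For the first cure, a minimal sketch of creating the account itself via azapi (control plane only), using the jsonencode body style from the earlier examples (azapi 1.x); the resource group ID variable, names, and property values are illustrative assumptions:

resource "azapi_resource" "storage_account" {
  type      = "Microsoft.Storage/storageAccounts@2022-09-01"
  name      = "examplestorageacct"
  location  = var.location
  parent_id = var.resource_group_id

  body = jsonencode({
    kind = "StorageV2"
    sku  = { name = "Standard_LRS" }
    properties = {
      minimumTlsVersion   = "TLS1_2"
      publicNetworkAccess = "Disabled"
      networkAcls = {
        defaultAction = "Deny"
        bypass        = "AzureServices"
      }
    }
  })
}

The private endpoint can then reference azapi_resource.storage_account.id, and containers can be created either via azapi as shown earlier in this thread or via azurerm once the privatelink DNS is resolvable from the agent.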

This has been a nightmare, due to the static website properties it tries to read.
If public network access is disabled, you're done. Good luck.

It will only work if you get the setup right during creation; if anything is wrong, or there is a DNS resolution mismatch, it is not possible to fix without some manual trickery.

It is also not possible to have it created with dynamic IPs, as this requires that the private endpoint gets created before the A record, which creates a chicken-and-egg scenario.

While the data plane issue is not resolved, couldn't a flag be set somewhere, anywhere, to forcefully discard anything related to the static website properties? If I don't need it and won't use it, I would like to be allowed to disregard it. Since this would be an opt-in option, it wouldn't introduce any breaking change, and it would bring peace to this very annoying issue.

I'd say that to make a change like this more than an ugly temporary patch, you could introduce a concept that applies to all resources of the provider, something like "global_options", which could be an open field for situations like this.

Edit:
It just came to me, I could simply use:

lifecycle {
  ignore_changes = [static_website]
}

🤯

Except it doesn't help: it still tries to retrieve the static website properties. But it could be a solution...
@tombuildsstuff I saw your comment here (#20257 (comment)); shouldn't something like the above do exactly what was being asked, albeit explicitly? I am surprised I hadn't thought of that myself before, and also that this doesn't seem to matter to the provider. If ignore_changes is analogous to the "not track" you mentioned, this would make it very explicit for whoever wants to do so, and it actually fits Terraform's resource model.

Local development is indeed a nightmare.

Scenario:

  • Storage account connected to a private endpoint
  • Private endpoint for blobs connected to a private DNS zone
  • Connected to the VPN on the local machine

And we have an error:

Error: checking for existing Container "test-container" (Account "Account \"accountname\" (IsEdgeZone false / ZoneName \"\" / Subdomain Type \"blob\" / DomainSuffix \"core.windows.net\")"): executing request: unexpected status 403 (403 This request is not authorized to perform this operation.) with AuthorizationFailure: This request is not authorized to perform this operation.
│ RequestId:e0f6fe03-801e-0015-5776-9c1a69000000
│ Time:2024-05-02T09:50:15.9741260Z
│
│   with azurerm_storage_container.container[0],
│   on main.tf line 49, in resource "azurerm_storage_container" "container":
│   49: resource "azurerm_storage_container" "container" {

Any idea when this can be fixed? Like @guderkar said, that setup is pretty common in the industry.

I once tested with Azure VPN Gateway + Client and I remember that two things are essential:

  • when connected to the VPN, make sure you are using DNS servers from the internal network
  • the VPN must advertise to the client the private networks where the resources with private endpoints are deployed

https://learn.microsoft.com/en-us/azure/vpn-gateway/azure-vpn-client-optional-configurations

Whilst we may look to split the Data Plane functionality out to a separate resource in the future, unfortunately doing so breaks some scenarios (for example, guaranteeing that there's no static website configured, which would need to be done within the main azurerm_storage_account resource).

Originally posted by @tombuildsstuff in #26542 (comment)

In the near future we may see a 4.0, which should allow breaking changes, right?

guaranteeing that there's no static website configured

Couldn't that be something left unmanaged by azurerm_storage_account, with a dedicated resource like azurerm_storage_account_static_website_configuration instead?

Similar to https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket_website_configuration

I had a similar issue with creating file shares in a storage account that has public access disabled. The Azure Portal allowed me to create the file share over the public internet, so I knew it was possible to do so through the Azure API. The public access restriction only applies to accessing the data in the share, not to creating the share.

Anyway, to work around this I created the storage account with the AzureRM provider as normal, with public access disabled. Then I used the AzApi provider to circumvent the bug/limitation in the AzureRM provider and hit the Azure management API directly to manage the resource:

resource "azapi_resource" "myshare" {
  type      = "Microsoft.Storage/storageAccounts/fileServices/shares@2023-05-01"
  name      = "myshare"
  parent_id = "${azurerm_storage_account.st-smbshare.id}/fileServices/default"

  body = {
    properties = {
      accessTier : "Hot",
      shareQuota : 1024,
      enabledProtocols : "SMB"
    }
  }
}

It took a little bit of tweaking and looking at the exported template, but the key was to get the parent_id right. I imagine for a blob container this would be blobServices instead of fileServices, but the concept should be similar (see the sketch at the end of this comment).

Now, for blob services it might not actually be possible to create the container over the public internet with public access disabled; you'll have to confirm that in the portal. But since it was working in the portal for file shares, I knew that it SHOULD work through TF one way or the other.

Note that with this method, state is managed properly. I tested by deleting the share after it was created, and TF plan/apply recreated it properly. There are other areas where I've been using the AzAPI provider as an "escape hatch" when something is broken or unsupported in the AzureRM provider.
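
For reference, a possible blobServices variant of the above, assuming the same azapi body style, the same parent storage account, and that this API version also applies to containers (untested sketch; name and properties are illustrative):

resource "azapi_resource" "mycontainer" {
  type      = "Microsoft.Storage/storageAccounts/blobServices/containers@2023-05-01"
  name      = "mycontainer"
  parent_id = "${azurerm_storage_account.st-smbshare.id}/blobServices/default"

  body = {
    properties = {
      # Keep the container private; other container properties can be added here
      publicAccess = "None"
    }
  }
}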