ionos-cloud/terraform-provider-profitbricks

"Rate Limit Exceeded" but only 40 requests issued

wndhydrnt opened this issue · 8 comments

We are getting the error "Rate Limit Exceeded" for various resources on nearly every run of terraform plan. We currently provision two datacenters, each containing six LANs, some volumes, and two servers.

I used mitmproxy to intercept the requests locally to get a better idea of how many are issued, and it turns out that only 40 requests are sent over a span of 40-45 seconds. Looking at the details of a request in the mitmproxy UI, I discovered that the rate-limit headers of a request that finished just before a failed one look like this:

X-RateLimit-Remaining:  290
X-RateLimit-Burst:      300
X-RateLimit-Limit:      600

According to these headers there are still more than enough requests left, yet the very next request fails with "Rate Limit Exceeded".

Is the provider creating too many requests?
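
For reference, the interception setup was roughly the following: run mitmproxy (it listens on port 8080 by default) and point terraform at it via the proxy and certificate environment variables that Go programs honor. The certificate path below assumes mitmproxy's default location:

mitmproxy
SSL_CERT_FILE=$HOME/.mitmproxy/mitmproxy-ca-cert.pem HTTPS_PROXY=http://127.0.0.1:8080 terraform plan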

Terraform Version

Terraform v0.10.8
Profitbricks Provider 1.0.0

Affected Resource(s)

All profitbricks_* resources.

Expected Behavior

The Profitbricks provider completes all requests successfully during terraform plan.

Actual Behavior

The Profitbricks provider errors on nearly every run. The message is "Rate Limit Exceeded".

Steps to Reproduce

terraform plan

Important Factoids

We are customers of the Nexinto Business Cloud (https://nbc-api.nexinto.com/cloudapi/v4) which is built on Profitbricks.

Hello @wndhydrnt - I cannot explain why the rate-limit headers show sufficient requests remaining right before an error is returned. I have not seen that behavior myself.

Would it be possible to get a copy of the Terraform config file being used (scrubbed of any identifying information, of course)? I would like to test it locally and see if I can replicate the behavior. If I can replicate it, I will attempt to debug further.

Terraform has a fairly high default for parallel requests. For testing purposes, can you try running without parallel requests using the following command?

terraform apply -parallelism=1

I'm curious what the results will be without parallel requests. I'm not saying this is the recommended solution, but it will help with debugging.

There is a known issue when creating more than two LANs through the API in quick succession or in parallel: occasionally a LAN will be missing. This can be worked around by using depends_on with the LAN resources, as sketched below; however, it should not produce the "Rate Limit Exceeded" error you are experiencing.
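
A minimal sketch of that workaround, with illustrative resource names, chains the LANs so they are created one at a time:

resource "profitbricks_lan" "lan01" {
  datacenter_id = "${profitbricks_datacenter.dc.id}"
  name          = "lan01"
  public        = false
}

resource "profitbricks_lan" "lan02" {
  datacenter_id = "${profitbricks_datacenter.dc.id}"
  name          = "lan02"
  public        = false

  # Forces lan02 to be created only after lan01 exists.
  depends_on = ["profitbricks_lan.lan01"]
}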

Thanks for the quick reply @edevenport!
We are already using terraform plan -parallelism=1 and terraform apply -parallelism=1. This seems to have lowered the chance of the error appearing, but sometimes it still occurs.

I actually asked a question regarding the known issue when creating LANs and got the reply that the issue has been fixed. I tested it and was able to create LAN objects using count. Take a look at the conversation here. Maybe I just did not hit the bug during my test back then?
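
For reference, the count-based form I tested looked roughly like this (the resource name is illustrative):

resource "profitbricks_lan" "internal" {
  # Creates six LANs named internal01 through internal06.
  count         = 6
  datacenter_id = "${profitbricks_datacenter.dc.id}"
  name          = "internal${format("%02d", count.index + 1)}"
  public        = false
}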

I'm going to create a test case for you and will add it as another comment when it is finished.

I've created a test case @edevenport.
Please note that some networks are created but not connected to any instance. This is intentional, as we want to add more instances after we have finished the initial bootstrap.

# modules/profitbricks-testcase/main.tf

provider "profitbricks" {
  endpoint = "https://nbc-api.nexinto.com/cloudapi/v4"
}

variable "datacenter_name" {
  default = "testcase"
}
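
# NOTE: the source IPs below are scrubbed placeholders, not valid addresses.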

variable "server01_access_ips" {
  default = ["123.456.789.1", "123.456.789.2"]
  type = "list"
}

variable "server02_access_ips" {
  default = ["123.456.789.1", "123.456.789.2"]
  type = "list"
}

variable "ssh_public_key" {}

resource "profitbricks_datacenter" "dc" {
  location = "de/fra"
  name     = "${var.datacenter_name}"
}

resource "profitbricks_lan" "external" {
  datacenter_id = "${profitbricks_datacenter.dc.id}"
  name          = "external"
  public        = true
}

resource "profitbricks_lan" "internal01" {
  datacenter_id = "${profitbricks_datacenter.dc.id}"
  name          = "internal01"
  public        = false
}

resource "profitbricks_lan" "internal02" {
  datacenter_id = "${profitbricks_datacenter.dc.id}"
  name          = "internal02"
  public        = false
}

resource "profitbricks_lan" "internal03" {
  datacenter_id = "${profitbricks_datacenter.dc.id}"
  name          = "internal03"
  public        = false
}

resource "profitbricks_lan" "internal04" {
  datacenter_id = "${profitbricks_datacenter.dc.id}"
  name          = "internal04"
  public        = false
}

resource "profitbricks_lan" "internal05" {
  datacenter_id = "${profitbricks_datacenter.dc.id}"
  name          = "internal05"
  public        = false
}

resource "profitbricks_lan" "internal06" {
  datacenter_id = "${profitbricks_datacenter.dc.id}"
  name          = "internal06"
  public        = false
}
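
# NOTE: internal01, internal03, internal04, and internal06 are intentionally not
# attached to any NIC yet; instances will be added after the initial bootstrap.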

resource "profitbricks_server" "server01" {
  availability_zone = "ZONE_1"
  cores             = "8"
  datacenter_id     = "${profitbricks_datacenter.dc.id}"
  name              = "server01"
  ram               = "40960"

  nic {
    dhcp            = true
    firewall_active = true
    lan             = "${profitbricks_lan.external.id}"
  }

  volume {
    disk_type    = "HDD"
    image_name   = "centos:7"
    name         = "server01 rootfs"
    size         = "50"
    ssh_key_path = "${var.ssh_public_key}"
  }
}

resource "profitbricks_nic" "server01-internal02" {
  datacenter_id   = "${profitbricks_datacenter.dc.id}"
  dhcp            = true
  firewall_active = false
  ip              = "10.0.2.31"
  lan             = "${profitbricks_lan.internal02.id}"
  server_id       = "${profitbricks_server.server01.id}"
}

resource "profitbricks_volume" "server01-data" {
  bus           = "VIRTIO"
  datacenter_id = "${profitbricks_datacenter.dc.id}"
  disk_type     = "HDD"
  licence_type  = "OTHER"
  name          = "server01 pv01"
  server_id     = "${profitbricks_server.server01.id}"
  size          = "200"
}

resource "profitbricks_volume" "server01-slow_storage" {
  bus           = "VIRTIO"
  datacenter_id = "${profitbricks_datacenter.dc.id}"
  disk_type     = "HDD"
  licence_type  = "OTHER"
  name          = "server01 pv02"
  server_id     = "${profitbricks_server.server01.id}"
  size          = "200"
}

resource "profitbricks_firewall" "server01-icmp" {
  datacenter_id    = "${profitbricks_datacenter.dc.id}"
  server_id        = "${profitbricks_server.server01.id}"
  nic_id           = "${profitbricks_server.server01.primary_nic}"
  protocol         = "ICMP"
  name             = "icmp"
  source_ip        = "${var.server01_access_ips[count.index]}"

  count = "${length(var.server01_access_ips)}"
}

resource "profitbricks_firewall" "server01-nrpe" {
  datacenter_id    = "${profitbricks_datacenter.dc.id}"
  server_id        = "${profitbricks_server.server01.id}"
  nic_id           = "${profitbricks_server.server01.primary_nic}"
  protocol         = "TCP"
  name             = "nrpe"
  port_range_start = 5666
  port_range_end   = 5666
  source_ip        = "${var.server01_access_ips[count.index]}"

  count = "${length(var.server01_access_ips)}"
}

resource "profitbricks_firewall" "server01-ssh" {
  datacenter_id    = "${profitbricks_datacenter.dc.id}"
  server_id        = "${profitbricks_server.server01.id}"
  nic_id           = "${profitbricks_server.server01.primary_nic}"
  protocol         = "TCP"
  name             = "ssh"
  port_range_start = 22
  port_range_end   = 22
  source_ip        = "${var.server01_access_ips[count.index]}"

  count = "${length(var.server01_access_ips)}"
}

resource "profitbricks_server" "server02" {
  availability_zone = "ZONE_1"
  cores             = "2"
  datacenter_id     = "${profitbricks_datacenter.dc.id}"
  name              = "server02"
  ram               = "4096"

  nic {
    dhcp            = true
    firewall_active = true
    lan             = "${profitbricks_lan.external.id}"
  }

  volume {
    disk_type    = "HDD"
    image_name   = "centos:7"
    name         = "server02 rootfs"
    size         = "50"
    ssh_key_path = "${var.ssh_public_key}"
  }
}

resource "profitbricks_nic" "server02-internal05" {
  datacenter_id   = "${profitbricks_datacenter.dc.id}"
  dhcp            = true
  firewall_active = false
  ip              = "10.0.5.31"
  lan             = "${profitbricks_lan.internal05.id}"
  server_id       = "${profitbricks_server.server02.id}"
}

resource "profitbricks_volume" "server02-data" {
  bus           = "VIRTIO"
  datacenter_id = "${profitbricks_datacenter.dc.id}"
  disk_type     = "HDD"
  licence_type  = "OTHER"
  name          = "server02 pv01"
  server_id     = "${profitbricks_server.server02.id}"
  size          = "500"
}

resource "profitbricks_firewall" "server02-any" {
  datacenter_id    = "${profitbricks_datacenter.dc.id}"
  server_id        = "${profitbricks_server.server02.id}"
  nic_id           = "${profitbricks_server.server02.primary_nic}"
  protocol         = "ANY"
  name             = "any"
  source_ip        = "${var.server02_access_ips[count.index]}"

  count = "${length(var.server02_access_ips)}"
}

resource "profitbricks_firewall" "server02-icmp" {
  datacenter_id    = "${profitbricks_datacenter.dc.id}"
  server_id        = "${profitbricks_server.server02.id}"
  nic_id           = "${profitbricks_server.server02.primary_nic}"
  protocol         = "ICMP"
  name             = "icmp"
  source_ip        = "${var.server02_access_ips[count.index]}"

  count = "${length(var.server02_access_ips)}"
}

resource "profitbricks_firewall" "server02-nrpe" {
  datacenter_id    = "${profitbricks_datacenter.dc.id}"
  server_id        = "${profitbricks_server.server02.id}"
  nic_id           = "${profitbricks_server.server02.primary_nic}"
  protocol         = "TCP"
  name             = "nrpe"
  port_range_start = 5666
  port_range_end   = 5666
  source_ip        = "${var.server02_access_ips[count.index]}"

  count = "${length(var.server02_access_ips)}"
}

resource "profitbricks_firewall" "server02-ssh" {
  datacenter_id    = "${profitbricks_datacenter.dc.id}"
  server_id        = "${profitbricks_server.server02.id}"
  nic_id           = "${profitbricks_server.server02.primary_nic}"
  protocol         = "TCP"
  name             = "ssh"
  port_range_start = 22
  port_range_end   = 22
  source_ip        = "${var.server02_access_ips[count.index]}"

  count = "${length(var.server02_access_ips)}"
}

@wndhydrnt the LAN issue has been fixed, but that is not the problem here.

I tested your TF file against https://api.profitbricks.com/cloudapi/v4/ and it worked just fine:

Apply complete! Resources: 29 added, 0 changed, 0 destroyed.

The only difference was that I didn't have four IPs available, so all 14 firewall rules were created without source_ip, but that is not relevant. Here is the output: https://gist.github.com/jasmingacic/fa9f84949b7b83ca36beff7eb16163bd

And this is the modified TF file I used: https://gist.github.com/jasmingacic/ced86e287c4b4f8d69508ed240beaf21

Also, here is what it looks like in the DCD:

[screenshot: the datacenter layout in the DCD]

Thanks @jasmingacic for setting this up so quickly.
The screenshot of the DCD looks a lot like our setup. We also create a second datacenter that currently does not contain any nodes; I guess that is why I saw 40 requests in mitmproxy.

What happens if you execute terraform plan several times? Do you see any "Rate Limit Exceeded" errors?
I also checked the logs of the various terraform plan runs (we are using a Jenkins job). The error often occurs when requesting resources such as LANs and firewall rules.
I noticed that the error occurs less often on my local workstation. Maybe the Jenkins build server issues requests more rapidly because it has a better connection to the internet? But that's just a theory.

On the existing infrastructure, repeated runs of terraform plan don't cause any problems.

If you are using Jenkins, I would recommend making sure that its account is not the same as the one you are using locally. I have noticed that the request rate limit is per tenant rather than per account. Also make sure that you don't have several Terraform instances running in parallel.

I don't think the internet connection has anything to do with it; at least it has no major impact.

@jasmingacic Sadly, Jenkins already uses a different user. terraform plan is also not executed in parallel, as Concurrent Builds are disabled for the job. I checked the output of ps on a Jenkins node to be sure.

Using terraform plan -parallelism=1 has eased the situation a bit for us, with only one out of five builds failing due to the rate-limit errors. The errors still occur frequently when executing terraform plan without setting -parallelism.
I'm now fairly sure that this is an issue with the Profitbricks API; in the end, this provider simply issues GET requests during a plan.
Therefore, I'm closing this issue. Thank you both for the fast responses!

Hello again @wndhydrnt - I tested the config you provided today and it worked successfully every time. I tested locally and from a remote server, without reducing the parallelism, and also while performing other API operations (deploying hosts via Vagrant at the same time). I never experienced the "Rate Limit Exceeded" error; the X-RateLimit-Remaining value stayed around 298-300.

As @jasmingacic indicated, the rate limits are per contract, not per user, so different users performing API requests are all subject to the same limit. It's something to keep in mind.

I just saw that you updated this issue while I was typing, and I am sorry to hear you are still experiencing problems. If I come up with any other ideas, I will reach out to you. This is very puzzling.