az postgres flexible-server upgrade InternalServerError

Question

Opened this issue a year ago · 35 comments

Hello,

I am trying to upgrade a postgres flexible-server from v11 to a newer version, but I always get: (InternalServerError) An unexpected error occured while processing the request. Tracking ID: ''

More specifically:

1. Created a v11 postgres flexible-server.
2. Activated the PG_BUFFERCACHE, PG_STAT_STATEMENTS, PLPGSQL and POSTGIS extensions.
3. Migrated the data from a v11 postgres single-server.
4. Attempted to upgrade the flexible server from 11 to 12, 13, 14 and 15, both from the portal and from the az-cli. All attempts failed with the InternalServerError error.
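
For anyone who wants to reproduce the CLI attempt, it can be sketched roughly as below. The resource group and server name are hypothetical placeholders, and the `run` helper is only an illustration: it prints the command unless `DRY_RUN=0` is set, so nothing touches a real server by accident.

```shell
#!/usr/bin/env bash
# Sketch of an in-place major version upgrade attempt with the az CLI.
# RESOURCE_GROUP and SERVER_NAME are hypothetical placeholders.
RESOURCE_GROUP="my-resource-group"
SERVER_NAME="my-flexible-server"

# Illustrative helper: with DRY_RUN=1 (the default) the command is only
# printed; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = target major version, e.g. 14
upgrade_server() {
  run az postgres flexible-server upgrade \
    --resource-group "$RESOURCE_GROUP" \
    --name "$SERVER_NAME" \
    --version "$1"
}

upgrade_server 14
```

Running it with `DRY_RUN=0` performs the real upgrade request, which is where the InternalServerError shows up for affected servers.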

mrpotato3 commented 7 months ago

+1

Answer 1 · 2023-09-21T15:31:06.000Z

Same issue here. Support has been terrible at handling this case for us.

Answer 2 · 2023-09-25T07:44:59.000Z

Hello @charalamm, I have the same issue upgrading a flexible-server from v11 to v14. Azure support hasn't helped me.

Answer 3 · 2023-11-02T08:51:42.000Z

@charalamm have you managed to get some info about this?

Answer 4 · 2023-11-03T14:04:25.000Z

@fsismondi I have talked with support and they said it is because of a bug on their end. They said they will fix it for the next release around mid November

Answer 5 · 2023-12-19T13:33:16.000Z

Hello, any update on this?

Answer 6 · 2024-03-05T19:58:54.000Z

Having the same issues upgrading from R12 to R15. Takes ages to resolve.

Answer 7 · 2024-04-06T16:20:23.000Z

> @fsismondi I have talked with support and they said it is because of a bug on their end. They said they will fix it for the next release around mid November

Did they say which year?

It's now April 2024, I just tried to upgrade 14 to 16:

{
  "code": "InternalServerError",
  "message": "An unexpected error occured while processing the request. Tracking ID: '3b54f416-f0a2-40a3-83ad-e9aa736f08ed'"
}

Answer 8 · 2024-04-06T16:38:17.000Z

I finally got lucky and was able to upgrade today, after 38 days with support... Incredible service.

Answer 9 · 2024-04-08T08:00:16.000Z

We are in a situation where we need to upgrade 10+ servers to v14. Creating new servers and doing a backup/restore would mean a lot of man-hours.

Disabling extensions does not seem to be enough to make the upgrade work.

Could somebody from @microsoftopensource, maybe @ramnov, @rachel-msft or @ambrahma, provide some info on what is happening here?
Support has been useless regarding this issue.

Answer 10 · 2024-04-08T22:29:20.000Z

@charalamm, @fsismondi we have a known issue where major version upgrades fail due to timeouts when the server has a large number of databases and/or schemas. We are working on a fix as a priority. Sorry for the inconvenience caused.
Can you raise a support ticket for your servers and share it here? I will personally follow up to make sure they are addressed.

Answer 11 · 2024-04-09T06:05:03.000Z

Mine is 2402280050000780. I would like to know what happened and whether it has been resolved in a way that prevents it from happening again.

Answer 12 · 2024-04-09T06:21:33.000Z

Ours is 2312180050002992, though this ticket was closed.
Support was, put plainly, useless.
We are interested in a safe and reproducible procedure we can follow to start migrating all our servers.

Answer 13 · 2024-04-09T06:36:04.000Z

Yeah, I'm currently doing manual migrations because of this, and it's a PITA

Answer 14 · 2024-04-09T14:26:33.000Z

Same issue here 👍

Answer 15 · 2024-04-10T09:56:47.000Z

I don't have a support ticket, but I have 2 postgres instances which I'm unable to bump to 16. Any ETA on the fix?

Answer 16 · 2024-04-12T07:48:46.000Z

Apparently they are working on it at the moment. https://ruby.social/@clairegiordano@hachyderm.io/112254606338198662

Answer 17 · 2024-04-12T08:52:54.000Z

Still not fixed after 8 months

Answer 18 · 2024-04-12T11:28:11.000Z

After contacting Azure Premium Support, we were told they will fix this at the end of April. It's a problem on Microsoft's side 👍

Answer 19 · 2024-04-12T20:28:08.000Z

This is a high-level error that can have several different underlying causes. I would ask others to raise a support ticket as well so each case can be addressed.
Add your ASC ticket here if you don't get traction and I will prioritize it.

Answer 20 · 2024-05-06T21:54:29.000Z

I just tried to do an upgrade again and am still receiving the same error. Have there been any updates on the underlying issue?

Answer 21 · 2024-05-06T23:02:25.000Z

We contacted support and they did something to our affected databases that allowed the upgrade to go through. So I guess it's solvable via support.

Answer 22 · 2024-05-18T16:16:16.000Z

Just tested here and it worked.

Answer 23 · 2024-05-23T04:32:48.000Z

We just ran into this tonight trying to upgrade 13 -> 16 and 13 -> 14. We are now trying 13 -> 15, because we really need a bug fix that landed in Postgres v14.

Answer 24 · 2024-06-19T16:42:15.000Z

As a heads-up: I ran into this issue last week while trying to upgrade from Postgres 11 to 13. Azure support told me to, quote:

We request you to try the upgrade operation after following the below steps:
1. Please drop the extension “hypopg” in database level.
2. Then disable the “hypopg” extension in server. To disable follow the below:
Server parameter-->azure.extensions-->hypopg

After performing the above two steps, please try to upgrade the server and let us know if you face any issues.

And this worked fine. I was also able to re-install hypopg afterwards.
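
The two steps from support can be sketched with psql and the az CLI roughly as follows. This is a dry run under stated assumptions: the server, database, admin user and the example allow-list are hypothetical placeholders, and `run` and `remove_from_list` are illustrative helpers. The real `azure.extensions` value has to be read from your own server first, because setting the parameter replaces the whole list.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the hypopg workaround quoted above.
# All names and the example allow-list are hypothetical placeholders.
RESOURCE_GROUP="my-resource-group"
SERVER_NAME="my-flexible-server"
DATABASE="mydb"
ADMIN_USER="myadmin"
# Assumed current value of azure.extensions; look up the real one on your
# server (portal: Server parameters -> azure.extensions).
CURRENT_EXTENSIONS="PG_BUFFERCACHE,HYPOPG,POSTGIS"

# With DRY_RUN=1 (the default) commands are only printed. The printed form
# drops shell quoting, so it is meant for inspection, not copy and paste.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Remove one entry ($2, matched case-insensitively) from a
# comma-separated list ($1).
remove_from_list() {
  printf '%s\n' "$1" | tr ',' '\n' | grep -Fxiv -- "$2" | paste -sd, -
}

# 1. Drop the extension in each database that has it installed.
run psql "host=$SERVER_NAME.postgres.database.azure.com dbname=$DATABASE user=$ADMIN_USER sslmode=require" \
  -c "DROP EXTENSION IF EXISTS hypopg;"

# 2. Remove hypopg from the server-level allow-list.
run az postgres flexible-server parameter set \
  --resource-group "$RESOURCE_GROUP" \
  --server-name "$SERVER_NAME" \
  --name azure.extensions \
  --value "$(remove_from_list "$CURRENT_EXTENSIONS" hypopg)"
```

After the upgrade succeeds, the extension can be put back by adding it to `azure.extensions` again and running `CREATE EXTENSION hypopg;` in the databases that need it.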

Answer 25 · 2024-08-13T12:32:34.000Z

We also encountered this problem. We use the pgrouting extension which cannot always be upgraded. We had to do the following steps for the upgrade:

1. Drop the pgrouting extension on databases
2. Perform the database upgrade in Azure Portal
3. Re-add the extension to the databases
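
With several databases, the drop and re-add steps can be looped. The sketch below is a dry run under assumptions: the host, admin user, database names and target version are hypothetical, the upgrade is shown via the az CLI instead of the portal, and `run` and `in_each_database` are illustrative helpers. Note that `DROP EXTENSION` fails if other objects depend on the extension, so check for dependent objects before running it for real.

```shell
#!/usr/bin/env bash
# Dry-run sketch: drop pgrouting everywhere, upgrade, then re-create it.
# All names below are hypothetical placeholders.
RESOURCE_GROUP="my-resource-group"
SERVER_NAME="my-flexible-server"
ADMIN_USER="myadmin"
DATABASES="db_one db_two"
TARGET_VERSION="16"

# With DRY_RUN=1 (the default) commands are only printed, not executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run one SQL statement ($1) against every database in DATABASES.
in_each_database() {
  for db in $DATABASES; do
    run psql "host=$SERVER_NAME.postgres.database.azure.com dbname=$db user=$ADMIN_USER sslmode=require" -c "$1"
  done
}

# 1. Drop the extension on all databases.
in_each_database "DROP EXTENSION IF EXISTS pgrouting;"

# 2. Perform the major version upgrade.
run az postgres flexible-server upgrade \
  --resource-group "$RESOURCE_GROUP" \
  --name "$SERVER_NAME" \
  --version "$TARGET_VERSION"

# 3. Re-add the extension on all databases.
in_each_database "CREATE EXTENSION IF NOT EXISTS pgrouting;"
```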

Answer 26 · 2024-08-27T07:27:39.000Z

Same issue here, even when testing with a brand new 12.19 psql database without any extensions. Tried multiple times, upgrade to 13, 14, 15 and 16 all failed.

Answer 27 · 2024-08-27T07:33:11.000Z

I have a new case open for 2 servers (one v14, one v15) that both refuse to upgrade.
Support is currently escalating to the product team. We have enabled upgrade logs and get:

The source cluster was not shut down cleanly. Failure, exiting

There are no extensions installed.

Answer 28 · 2024-08-27T07:44:40.000Z

FYI: we successfully migrated with the Terraform Provider from PG 11 to PG 16 this weekend 😄

Answer 29 · 2024-08-27T09:26:33.000Z

I just tried it using Terraform as well, without success. Below is the full terraform code I used.
Applied it first with create_mode Default and version 12, made the indicated changes and applied again.

Terraform

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~>3.116.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~>3.0"
    }
  }
}

provider "azurerm" {
  features {}
}


data "azurerm_resource_group" "rg" {
  name = "rg-wds-dev-weu-001"
}

resource "azurerm_postgresql_flexible_server" "psql_upgrade_test" {
  name                = "psql-wds-upgrade-test-terraform-001"
  resource_group_name = data.azurerm_resource_group.rg.name
  location            = "westeurope"

  backup_retention_days        = 7
  geo_redundant_backup_enabled = false
  create_mode                  = "Update" # was "Default" on the first apply
  version                      = "16"     # was "12" on the first apply
  storage_mb                   = 32768
  sku_name                     = "B_Standard_B1ms"
  zone                         = 2

  administrator_login    = "thisismyadmin"
  administrator_password = "super-secret-password"

  public_network_access_enabled = true
}

resource "azurerm_postgresql_flexible_server_database" "psqldb_testdatabase" {
  name      = "testdatabase"
  server_id = azurerm_postgresql_flexible_server.psql_upgrade_test.id
  collation = "en_US.utf8"
  charset   = "utf8"
}

resource "azurerm_postgresql_flexible_server_firewall_rule" "psqlfr_azure_services" {
  name             = "Allow-public-azure-service-access"
  server_id        = azurerm_postgresql_flexible_server.psql_upgrade_test.id
  start_ip_address = "0.0.0.0"
  end_ip_address   = "0.0.0.0"
}

Output

╷
│ Error: updating Flexible Server (Subscription: "<redacted>"
│ Resource Group Name: "rg-wds-dev-weu-001"
│ Flexible Server Name: "psql-wds-upgrade-test-terraform-001"): polling after Update: polling failed: the Azure API returned the following error:
│ 
│ Status: "InternalServerError"
│ Code: ""
│ Message: "An unexpected error occured while processing the request. Tracking ID: 'd00608f4-b633-41a2-af69-557cd4ee258c'"
│ Activity Id: ""
│ 
│ ---
│ 
│ API Response:
│ 
│ ----[start]----
│ {"name":"f3a142f9-aacd-4eb2-a172-f84dd899a991","status":"Failed","startTime":"2024-08-27T09:00:24.43Z","error":{"code":"InternalServerError","message":"An unexpected error occured while processing the request. Tracking ID: 'd00608f4-b633-41a2-af69-557cd4ee258c'"}}
│ -----[end]-----
│ 
│ 
│   with azurerm_postgresql_flexible_server.psql_smartlab_api,
│   on main.tf line 23, in resource "azurerm_postgresql_flexible_server" "psql_upgrade_test":
│   23: resource "azurerm_postgresql_flexible_server" "psql_upgrade_test" {
│ 
╵

Answer 30 · 2024-08-27T09:30:28.000Z

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~>3.0"
    }
  }
}

This is pretty old - we used "~> 3.116.0" - maybe that helps?

Answer 31 · 2024-08-27T11:24:00.000Z

Just tried it with 3.116.0 as well but the issue is the same. I have reached out to Microsoft support as well and they are currently investigating the issue.

Answer 32 · 2024-09-26T13:50:11.000Z

Still getting this error while trying to upgrade PG flexible server via the portal

Answer 33 · 2024-09-26T14:01:37.000Z

We had an issue with an upgrade recently. Restarting the server and retrying the upgrade the next day resolved it.

Answer 34 · 2024-10-02T17:42:10.000Z

After a lot of communication with Microsoft Support, they were finally able to upgrade my instance. Here is the feedback I received from them:

-> Initially you experienced an MVU (Major Version Upgrade) failure because the pending_restart parameter was set to true. This means the server needed a restart before the upgrade could proceed.
-> An engineer restarted the container, allowing you to try the MVU again.
-> During the retry, the MVU failed again due to insufficient disk space in the /tmp directory. This directory didn’t have enough space to handle the upgrade process.
-> Memory Issue: The B1ms SKU (a specific server configuration) has less than 1 GB of memory available for the cluster, which can cause MVU failures if the memory is nearly full.
-> We have addressed this in an upcoming release which removes the dependency for MVU.
-> For the third MVU attempt, our engineer initiated the upgrade from the backend and the upgrade proceeded without issue.

This matches what some users have reported here: simply restarting the instance fixed the problem.
Until the new release that removes the dependency on available memory has shipped, temporarily scaling your instance up to one with more disk space and/or more memory might fix the problem as well.
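
A rough dry-run sketch of that sequence with the az CLI is below. Everything named here is an assumption for illustration: the resource group, server name, target version and the temporary `Standard_D2s_v3` size are placeholders, and `run` and `scale_to` are hypothetical helpers. It only scales compute (memory), not storage.

```shell
#!/usr/bin/env bash
# Dry-run sketch: restart, temporarily scale up, upgrade, scale back down.
# All names, the target version and the temporary SKU are hypothetical.
RESOURCE_GROUP="my-resource-group"
SERVER_NAME="my-flexible-server"
TARGET_VERSION="16"

# With DRY_RUN=1 (the default) commands are only printed, not executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Change the compute size: $1 = tier, $2 = SKU name.
scale_to() {
  run az postgres flexible-server update \
    --resource-group "$RESOURCE_GROUP" --name "$SERVER_NAME" \
    --tier "$1" --sku-name "$2"
}

# 1. Optional check from psql: list settings still waiting for a restart.
#    SELECT name FROM pg_settings WHERE pending_restart;

# 2. Restart so that no parameter is left in the pending_restart state.
run az postgres flexible-server restart \
  --resource-group "$RESOURCE_GROUP" --name "$SERVER_NAME"

# 3. Temporarily move from Burstable B1ms to a larger size (assumed SKU).
scale_to GeneralPurpose Standard_D2s_v3

# 4. Retry the major version upgrade.
run az postgres flexible-server upgrade \
  --resource-group "$RESOURCE_GROUP" --name "$SERVER_NAME" \
  --version "$TARGET_VERSION"

# 5. Scale back down once the upgrade has succeeded.
scale_to Burstable Standard_B1ms
```

Set `DRY_RUN=0` to execute the commands for real. Each scale operation restarts the server, so this is best done in a maintenance window.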