lxc/incus

Incorrect arguments passed to instance placement scriptlet on instance move

victoitor opened this issue · 2 comments

When incus move is used to move an instance between projects in a cluster, the arguments used to call the placement scriptlet are incorrect.

I have 3 projects with the following cluster group restrictions.

victoitor@bastion:~$ incus project get intel-12700 restricted.cluster.groups
intel-12700
victoitor@bastion:~$ incus project get amd-5700g restricted.cluster.groups
amd-5700g
victoitor@bastion:~$ incus project get auxiliar restricted.cluster.groups
amd-5700g

And the following cluster groups.

victoitor@bastion:~$ incus cluster group show amd-5700g
description: ""
members:
- amd01
- amd02
- amd03
- amd04
config: {}
name: amd-5700g
victoitor@bastion:~$ incus cluster group show intel-12700
description: ""
members:
- intel01
- intel02
- intel03
config: {}
name: intel-12700

I have a scriptlet with the following part for logging the input.

def instance_placement(request, candidate_members):
    project = get_project( request.project )
    log_error("SCRIPTLET DEBUG Request: ", request, "\nSCRIPTLET DEBUG Candidade members: ", candidate_members, "\nSCRIPTLET DEBUG Project: ", project)

So I create an instance in project auxiliar and use incus move to move it between all possible pairs of projects, which gives the following sequence of commands and logs. The set of candidate members always contains just one node instead of the full cluster group. Furthermore, the project passed to the scriptlet is sometimes wrong: when moving from amd-5700g to auxiliar, the request reports amd-5700g (the source project) instead of auxiliar, which is quite awkward.

victoitor@bastion:~$ incus move incus-test --project auxiliar --target-project amd-5700g
ERROR  [2024-10-09T14:34:41-03:00] Instance placement scriptlet: SCRIPTLET DEBUG Request: {"architecture": "x86_64", "config": {"cloud-init.vendor-data": "#cloud-config\npackage_update: true\npackage_upgrade: true\npackage_reboot_if_required: true\ntimezone: America/Fortaleza\nusers:\n- gecos: Default pargo user\n  groups: sudo, video, render\n  name: pargo\n  lock_passwd: true\n  sudo: ALL=(ALL) NOPASSWD:ALL\n  shell: /bin/bash\n", "image.architecture": "amd64", "image.description": "Debian bookworm amd64 (20241009_05:24)", "image.os": "Debian", "image.release": "bookworm", "image.serial": "20241009_05:24", "image.type": "squashfs", "image.variant": "default", "limits.cpu": "0-5,8-13", "limits.memory": "24GB", "security.nesting": "true", "user.responsavel": "Incus Test", "volatile.apply_template": "create", "volatile.base_image": "bea0f1696dc17d7a8002d8d0dd408ad51fa89212e24db1593831b0bed583a5a3", "volatile.eth0.hwaddr": "00:16:3e:c0:aa:18"}, "devices": {"eth0": {"name": "eth0", "nictype": "bridged", "parent": "br0", "type": "nic"}, "root": {"path": "/", "pool": "local", "type": "disk"}}, "ephemeral": False, "profiles": ["default"], "restore": "", "stateful": False, "description": "", "name": "incus-test", "source": {"type": "copy", "certificate": "", "alias": "", "fingerprint": "", "properties": {}, "server": "", "secret": "", "protocol": "", "base-image": "", "mode": "", "operation": "", "secrets": {}, "source": "incus-test", "live": False, "instance_only": False, "refresh": False, "project": "auxiliar", "allow_inconsistent": False}, "instance_type": "", "type": "container", "start": False, "reason": "new", "project": "amd-5700g"}
SCRIPTLET DEBUG Candidade members: [{"roles": [], "failure_domain": "default", "description": "", "config": {"user.experimentos.limits.cpu": "0-5,8-13", "user.experimentos.limits.memory": "24GB"}, "groups": ["default", "amd-5700g"], "server_name": "amd02", "url": "https://10.11.16.12:8443", "database": False, "status": "Online", "message": "Fully operational", "architecture": "x86_64"}]
SCRIPTLET DEBUG Project: {"config": {"features.images": "false", "features.profiles": "true", "features.storage.buckets": "true", "features.storage.volumes": "true", "restricted": "true", "restricted.cluster.groups": "amd-5700g", "restricted.cluster.target": "allow", "restricted.containers.nesting": "allow", "restricted.devices.nic": "allow", "restricted.snapshots": "allow", "user.node.limits.cpu": "0-5,8-13", "user.node.limits.cpu.unique": "true", "user.node.limits.memory": "24GB", "user.node.represented": "true", "user.node.represented.unique": "true"}, "description": "Experimentos - máquinas amd-5700g", "name": "amd-5700g", "used_by": []} 
victoitor@bastion:~$ incus move incus-test --project amd-5700g --target-project auxiliar
ERROR  [2024-10-09T14:35:53-03:00] Instance placement scriptlet: SCRIPTLET DEBUG Request: {"architecture": "", "config": {"cloud-init.vendor-data": "#cloud-config\npackage_update: true\npackage_upgrade: true\npackage_reboot_if_required: true\ntimezone: America/Fortaleza\nusers:\n- gecos: Default pargo user\n  groups: sudo, video, render\n  name: pargo\n  lock_passwd: true\n  sudo: ALL=(ALL) NOPASSWD:ALL\n  shell: /bin/bash\n", "image.architecture": "amd64", "image.description": "Debian bookworm amd64 (20241009_05:24)", "image.os": "Debian", "image.release": "bookworm", "image.serial": "20241009_05:24", "image.type": "squashfs", "image.variant": "default", "limits.cpu": "0-5,8-13", "limits.memory": "24GB", "security.nesting": "true", "user.responsavel": "Incus Test", "volatile.apply_template": "create", "volatile.base_image": "bea0f1696dc17d7a8002d8d0dd408ad51fa89212e24db1593831b0bed583a5a3", "volatile.cloud-init.instance-id": "58659f9e-ee14-44dc-9669-4f6857d581d3", "volatile.eth0.hwaddr": "00:16:3e:c0:aa:18", "volatile.idmap.base": "0", "volatile.idmap.next": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000}]", "volatile.last_state.idmap": "[]", "volatile.uuid": "3777865f-e0ec-4e0f-a4db-88fd54d93623", "volatile.uuid.generation": "3777865f-e0ec-4e0f-a4db-88fd54d93623"}, "devices": {"eth0": {"name": "eth0", "nictype": "bridged", "parent": "br0", "type": "nic"}, "root": {"path": "/", "pool": "local", "type": "disk"}}, "ephemeral": False, "profiles": [], "restore": "", "stateful": False, "description": "", "name": "incus-test", "source": {"type": "", "certificate": "", "alias": "", "fingerprint": "", "properties": {}, "server": "", "secret": "", "protocol": "", "base-image": "", "mode": "", "operation": "", "secrets": {}, "source": "", "live": False, "instance_only": False, "refresh": False, "project": "", "allow_inconsistent": False}, 
"instance_type": "", "type": "", "start": False, "reason": "relocation", "project": "amd-5700g"}
SCRIPTLET DEBUG Candidade members: [{"roles": [], "failure_domain": "default", "description": "", "config": {"user.experimentos.limits.cpu": "0-5,8-13", "user.experimentos.limits.memory": "24GB"}, "groups": ["default", "amd-5700g"], "server_name": "amd02", "url": "https://10.11.16.12:8443", "database": False, "status": "Online", "message": "Fully operational", "architecture": "x86_64"}]
SCRIPTLET DEBUG Project: {"config": {"features.images": "false", "features.profiles": "true", "features.storage.buckets": "true", "features.storage.volumes": "true", "restricted": "true", "restricted.cluster.groups": "amd-5700g", "restricted.cluster.target": "allow", "restricted.containers.nesting": "allow", "restricted.devices.nic": "allow", "restricted.snapshots": "allow", "user.node.limits.cpu": "0-5,8-13", "user.node.limits.cpu.unique": "true", "user.node.limits.memory": "24GB", "user.node.represented": "true", "user.node.represented.unique": "true"}, "description": "Experimentos - máquinas amd-5700g", "name": "amd-5700g", "used_by": []} 
victoitor@bastion:~$ incus move incus-test --project auxiliar --target-project intel-12700
ERROR  [2024-10-09T14:37:47-03:00] Instance placement scriptlet: SCRIPTLET DEBUG Request: {"architecture": "x86_64", "config": {"cloud-init.vendor-data": "#cloud-config\npackage_update: true\npackage_upgrade: true\npackage_reboot_if_required: true\ntimezone: America/Fortaleza\nusers:\n- gecos: Default pargo user\n  groups: sudo, video, render\n  name: pargo\n  lock_passwd: true\n  sudo: ALL=(ALL) NOPASSWD:ALL\n  shell: /bin/bash\n", "image.architecture": "amd64", "image.description": "Debian bookworm amd64 (20241009_05:24)", "image.os": "Debian", "image.release": "bookworm", "image.serial": "20241009_05:24", "image.type": "squashfs", "image.variant": "default", "limits.cpu": "0-15", "limits.memory": "24GB", "security.nesting": "true", "user.responsavel": "Incus Test", "volatile.apply_template": "copy", "volatile.base_image": "bea0f1696dc17d7a8002d8d0dd408ad51fa89212e24db1593831b0bed583a5a3", "volatile.eth0.hwaddr": "00:16:3e:dc:e3:a7"}, "devices": {"eth0": {"name": "eth0", "nictype": "bridged", "parent": "br0", "type": "nic"}, "root": {"path": "/", "pool": "local", "type": "disk"}}, "ephemeral": False, "profiles": ["default"], "restore": "", "stateful": False, "description": "", "name": "incus-test", "source": {"type": "copy", "certificate": "", "alias": "", "fingerprint": "", "properties": {}, "server": "", "secret": "", "protocol": "", "base-image": "", "mode": "", "operation": "", "secrets": {}, "source": "incus-test", "live": False, "instance_only": False, "refresh": False, "project": "auxiliar", "allow_inconsistent": False}, "instance_type": "", "type": "container", "start": False, "reason": "new", "project": "intel-12700"}
SCRIPTLET DEBUG Candidade members: [{"roles": ["database"], "failure_domain": "default", "description": "", "config": {"user.experimentos.limits.cpu": "0-15", "user.experimentos.limits.memory": "24GB"}, "groups": ["intel-12700"], "server_name": "intel01", "url": "https://10.11.16.31:8443", "database": True, "status": "Online", "message": "Fully operational", "architecture": "x86_64"}]
SCRIPTLET DEBUG Project: {"config": {"features.images": "false", "features.profiles": "true", "features.storage.buckets": "true", "features.storage.volumes": "true", "restricted": "true", "restricted.cluster.groups": "intel-12700", "restricted.cluster.target": "allow", "restricted.containers.nesting": "allow", "restricted.devices.nic": "allow", "restricted.snapshots": "allow", "user.node.limits.cpu": "0-15", "user.node.limits.cpu.unique": "true", "user.node.limits.memory": "24GB", "user.node.represented": "true", "user.node.represented.unique": "true"}, "description": "Experimentos - máquinas intel-12700", "name": "intel-12700", "used_by": []} 
victoitor@bastion:~$ incus move incus-test --project intel-12700 --target-project amd-5700g
ERROR  [2024-10-09T14:39:27-03:00] Instance placement scriptlet: SCRIPTLET DEBUG Request: {"architecture": "x86_64", "config": {"cloud-init.vendor-data": "#cloud-config\npackage_update: true\npackage_upgrade: true\npackage_reboot_if_required: true\ntimezone: America/Fortaleza\nusers:\n- gecos: Default pargo user\n  groups: sudo, video, render\n  name: pargo\n  lock_passwd: true\n  sudo: ALL=(ALL) NOPASSWD:ALL\n  shell: /bin/bash\n", "image.architecture": "amd64", "image.description": "Debian bookworm amd64 (20241009_05:24)", "image.os": "Debian", "image.release": "bookworm", "image.serial": "20241009_05:24", "image.type": "squashfs", "image.variant": "default", "limits.cpu": "0-5,8-13", "limits.memory": "24GB", "security.nesting": "true", "user.responsavel": "Incus Test", "volatile.apply_template": "copy", "volatile.base_image": "bea0f1696dc17d7a8002d8d0dd408ad51fa89212e24db1593831b0bed583a5a3", "volatile.eth0.hwaddr": "00:16:3e:dc:e3:a7"}, "devices": {"eth0": {"name": "eth0", "nictype": "bridged", "parent": "br0", "type": "nic"}, "root": {"path": "/", "pool": "local", "type": "disk"}}, "ephemeral": False, "profiles": ["default"], "restore": "", "stateful": False, "description": "", "name": "incus-test", "source": {"type": "copy", "certificate": "", "alias": "", "fingerprint": "", "properties": {}, "server": "", "secret": "", "protocol": "", "base-image": "", "mode": "", "operation": "", "secrets": {}, "source": "incus-test", "live": False, "instance_only": False, "refresh": False, "project": "intel-12700", "allow_inconsistent": False}, "instance_type": "", "type": "container", "start": False, "reason": "new", "project": "amd-5700g"}
SCRIPTLET DEBUG Candidade members: [{"roles": ["database-leader", "database"], "failure_domain": "default", "description": "", "config": {"user.experimentos.limits.cpu": "0-5,8-13", "user.experimentos.limits.memory": "24GB"}, "groups": ["default", "amd-5700g"], "server_name": "amd01", "url": "https://10.11.16.11:8443", "database": True, "status": "Online", "message": "Fully operational", "architecture": "x86_64"}]
SCRIPTLET DEBUG Project: {"config": {"features.images": "false", "features.profiles": "true", "features.storage.buckets": "true", "features.storage.volumes": "true", "restricted": "true", "restricted.cluster.groups": "amd-5700g", "restricted.cluster.target": "allow", "restricted.containers.nesting": "allow", "restricted.devices.nic": "allow", "restricted.snapshots": "allow", "user.node.limits.cpu": "0-5,8-13", "user.node.limits.cpu.unique": "true", "user.node.limits.memory": "24GB", "user.node.represented": "true", "user.node.represented.unique": "true"}, "description": "Experimentos - máquinas amd-5700g", "name": "amd-5700g", "used_by": []} 
victoitor@bastion:~$ incus move incus-test --project amd-5700g --target-project intel-12700
ERROR  [2024-10-09T14:40:29-03:00] Instance placement scriptlet: SCRIPTLET DEBUG Request: {"architecture": "x86_64", "config": {"cloud-init.vendor-data": "#cloud-config\npackage_update: true\npackage_upgrade: true\npackage_reboot_if_required: true\ntimezone: America/Fortaleza\nusers:\n- gecos: Default pargo user\n  groups: sudo, video, render\n  name: pargo\n  lock_passwd: true\n  sudo: ALL=(ALL) NOPASSWD:ALL\n  shell: /bin/bash\n", "image.architecture": "amd64", "image.description": "Debian bookworm amd64 (20241009_05:24)", "image.os": "Debian", "image.release": "bookworm", "image.serial": "20241009_05:24", "image.type": "squashfs", "image.variant": "default", "limits.cpu": "0-15", "limits.memory": "24GB", "security.nesting": "true", "user.responsavel": "Incus Test", "volatile.apply_template": "copy", "volatile.base_image": "bea0f1696dc17d7a8002d8d0dd408ad51fa89212e24db1593831b0bed583a5a3", "volatile.eth0.hwaddr": "00:16:3e:dc:e3:a7"}, "devices": {"eth0": {"name": "eth0", "nictype": "bridged", "parent": "br0", "type": "nic"}, "root": {"path": "/", "pool": "local", "type": "disk"}}, "ephemeral": False, "profiles": ["default"], "restore": "", "stateful": False, "description": "", "name": "incus-test", "source": {"type": "copy", "certificate": "", "alias": "", "fingerprint": "", "properties": {}, "server": "", "secret": "", "protocol": "", "base-image": "", "mode": "", "operation": "", "secrets": {}, "source": "incus-test", "live": False, "instance_only": False, "refresh": False, "project": "amd-5700g", "allow_inconsistent": False}, "instance_type": "", "type": "container", "start": False, "reason": "new", "project": "intel-12700"}
SCRIPTLET DEBUG Candidade members: [{"roles": ["database"], "failure_domain": "default", "description": "", "config": {"user.experimentos.limits.cpu": "0-15", "user.experimentos.limits.memory": "24GB"}, "groups": ["intel-12700"], "server_name": "intel02", "url": "https://10.11.16.32:8443", "database": True, "status": "Online", "message": "Fully operational", "architecture": "x86_64"}]
SCRIPTLET DEBUG Project: {"config": {"features.images": "false", "features.profiles": "true", "features.storage.buckets": "true", "features.storage.volumes": "true", "restricted": "true", "restricted.cluster.groups": "intel-12700", "restricted.cluster.target": "allow", "restricted.containers.nesting": "allow", "restricted.devices.nic": "allow", "restricted.snapshots": "allow", "user.node.limits.cpu": "0-15", "user.node.limits.cpu.unique": "true", "user.node.limits.memory": "24GB", "user.node.represented": "true", "user.node.represented.unique": "true"}, "description": "Experimentos - máquinas intel-12700", "name": "intel-12700", "used_by": []} 
victoitor@bastion:~$ incus move incus-test --project intel-12700 --target-project auxiliar
ERROR  [2024-10-09T14:42:03-03:00] Instance placement scriptlet: SCRIPTLET DEBUG Request: {"architecture": "x86_64", "config": {"cloud-init.vendor-data": "#cloud-config\npackage_update: true\npackage_upgrade: true\npackage_reboot_if_required: true\ntimezone: America/Fortaleza\nusers:\n- gecos: Default pargo user\n  groups: sudo, video, render\n  name: pargo\n  lock_passwd: true\n  sudo: ALL=(ALL) NOPASSWD:ALL\n  shell: /bin/bash\n", "image.architecture": "amd64", "image.description": "Debian bookworm amd64 (20241009_05:24)", "image.os": "Debian", "image.release": "bookworm", "image.serial": "20241009_05:24", "image.type": "squashfs", "image.variant": "default", "limits.cpu": "6-7,14-15", "limits.cpu.allowance": "100%", "limits.memory": "1GiB", "user.responsavel": "Incus Test", "volatile.apply_template": "copy", "volatile.base_image": "bea0f1696dc17d7a8002d8d0dd408ad51fa89212e24db1593831b0bed583a5a3", "volatile.cloud-init.instance-id": "d6bc666a-e91e-48cd-a9aa-bc226762e2be", "volatile.eth0.hwaddr": "00:16:3e:df:b8:14", "volatile.idmap.base": "0", "volatile.idmap.next": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":1000000000}]", "volatile.last_state.idmap": "[]", "volatile.uuid": "539d2bf4-1264-4e8d-a7d6-9ff58d098646", "volatile.uuid.generation": "539d2bf4-1264-4e8d-a7d6-9ff58d098646"}, "devices": {"eth0": {"name": "eth0", "nictype": "bridged", "parent": "br0", "type": "nic"}, "root": {"path": "/", "pool": "local", "type": "disk"}}, "ephemeral": False, "profiles": ["default"], "restore": "", "stateful": False, "description": "", "name": "incus-test", "source": {"type": "copy", "certificate": "", "alias": "", "fingerprint": "", "properties": {}, "server": "", "secret": "", "protocol": "", "base-image": "", "mode": "", "operation": "", "secrets": {}, "source": "incus-test", "live": False, "instance_only": False, "refresh": False, "project": 
"intel-12700", "allow_inconsistent": False}, "instance_type": "", "type": "container", "start": False, "reason": "new", "project": "auxiliar"}
SCRIPTLET DEBUG Candidade members: [{"roles": ["database-leader", "database"], "failure_domain": "default", "description": "", "config": {"user.experimentos.limits.cpu": "0-5,8-13", "user.experimentos.limits.memory": "24GB"}, "groups": ["default", "amd-5700g"], "server_name": "amd01", "url": "https://10.11.16.11:8443", "database": True, "status": "Online", "message": "Fully operational", "architecture": "x86_64"}]
SCRIPTLET DEBUG Project: {"config": {"features.images": "false", "features.profiles": "true", "features.storage.buckets": "true", "features.storage.volumes": "true", "restricted": "true", "restricted.backups": "allow", "restricted.cluster.groups": "amd-5700g", "restricted.cluster.target": "allow", "restricted.containers.lowlevel": "allow", "restricted.containers.nesting": "allow", "restricted.devices.disk": "allow", "restricted.devices.nic": "allow", "restricted.snapshots": "allow", "user.node.limits.cpu": "6-7,14-15", "user.node.represented": "true"}, "description": "Montagem e estacionamento", "name": "auxiliar", "used_by": []} 

Things actually seem consistent here, just not particularly ideal:

ERROR  [2024-11-15T02:27:20Z] [server04] Instance placement scriptlet: [stgraber][relocation] project=restrict-s03, instance=test, candidates=["server01"] 
ERROR  [2024-11-15T02:27:20Z] [server04] Instance placement scriptlet: [stgraber][new] project=restrict-s01, instance=test, candidates=["server01"] 

and then:

ERROR  [2024-11-15T02:28:04Z] [server01] Instance placement scriptlet: [stgraber][relocation] project=restrict-s01, instance=test, candidates=["server04", "server03"] 
ERROR  [2024-11-15T02:28:04Z] [server01] Instance placement scriptlet: [stgraber][new] project=restrict-s03, instance=test, candidates=["server04"] 

I don't know why in your case you're only seeing the new reason and not the relocation one.
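For comparison, compact log lines like the ones above can be produced with a scriptlet along the following lines. This is only a sketch: `format_placement_log` is a hypothetical helper name, and the `log_error` stub exists solely so the snippet runs as plain Python (inside Incus, `log_error` is provided to the scriptlet and the stub should be dropped).

```python
from types import SimpleNamespace

# Stand-in for the log_error builtin that Incus provides to scriptlets.
# Only needed to run this sketch outside Incus.
def log_error(*args):
    print(*args)

# Hypothetical helper: build a one-line summary of a placement call.
def format_placement_log(request, candidate_members):
    names = [m.server_name for m in candidate_members]
    quoted = ", ".join(['"%s"' % n for n in names])
    return "[%s] project=%s, instance=%s, candidates=[%s]" % (
        request.reason, request.project, request.name, quoted)

def instance_placement(request, candidate_members):
    log_error(format_placement_log(request, candidate_members))

# Local dry run with made-up inputs shaped like the scriptlet arguments.
req = SimpleNamespace(reason="relocation", project="restrict-s01", name="test")
members = [SimpleNamespace(server_name="server04"),
           SimpleNamespace(server_name="server03")]
instance_placement(req, members)
```

Logging only the reason, project, instance name and candidate names makes it easier to spot whether both the relocation and the new call are reaching the scriptlet.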

Basically, during the move, Incus uses the relocation call to determine where the instance should go. At that point the instance still exists in the source project, which is why the scriptlet receives the source project. The candidates passed during relocation are the members allowed for the target project, and this is the call where the scriptlet can actually make a decision.

Once that decision is made, Incus internally handles the cross-project move, which is effectively a copy+delete. That's why we get a second call into the scriptlet with the new reason, this time with the target project and with no flexibility on the target member, as it has already been decided.
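Given the current two-call behavior, a scriptlet can be written to tolerate both calls. The sketch below is an illustration rather than a recommended policy: picking the alphabetically first candidate is an arbitrary placeholder, and treating a new call with a single candidate as "already decided" is a heuristic (a targeted create or a one-member group looks the same). The `log_info` and `set_target` stubs stand in for the builtins Incus provides and should be omitted in a real scriptlet.

```python
from types import SimpleNamespace

# Stand-ins for builtins that Incus provides to scriptlets; they exist
# here only so the sketch runs as plain Python.
chosen = []

def log_info(*args):
    print(*args)

def set_target(member_name):
    chosen.append(member_name)

def instance_placement(request, candidate_members):
    names = [m.server_name for m in candidate_members]

    if request.reason == "relocation":
        # Decision point of a move. Note: request.project is still the
        # source project here, as described above.
        set_target(sorted(names)[0])  # placeholder policy
        return

    if request.reason == "new" and len(names) == 1:
        # Heuristic: likely the follow-up call of a cross-project move,
        # where the member was already chosen. Nothing left to decide.
        log_info("single candidate, accepting ", names[0])
        set_target(names[0])
        return

    # Genuinely new instance (or other reasons): same placeholder policy.
    set_target(sorted(names)[0])

# Dry run replaying the two calls seen during one cross-project move.
relocation = SimpleNamespace(reason="relocation", project="restrict-s01")
followup = SimpleNamespace(reason="new", project="restrict-s03")
instance_placement(relocation, [SimpleNamespace(server_name="server04"),
                                SimpleNamespace(server_name="server03")])
instance_placement(followup, [SimpleNamespace(server_name="server04")])
```

Any project-specific logic (for example reading per-project limits via get_project) belongs in the relocation branch, keeping in mind that the project name it receives is currently the source one.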

Now ideally we'd be able to:

  • Eliminate the following new call entirely in this scenario, finding a way to detect that this is an internal move and not a new instance being created
  • Alter the call for relocation to indicate the target project rather than source

I'll take a look into this now. The project name part should be pretty trivial, eliminating the new event will likely be a bit trickier.