NixOS/nixops-aws

Root volume is considered as 'detached' using nixpkgs 20.03

tewfik-ghariani opened this issue · 6 comments

After provisioning instances using nixpkgs 20.03, the check command displays the following output:

Machines state:
+----------+--------+-----+-----------+----------+----------------+-------+------------------------------------------------------------+
| Name     | Exists | Up  | Reachable | Disks OK | Load avg.      | Units | Notes                                                      |
+----------+--------+-----+-----------+----------+----------------+-------+------------------------------------------------------------+
| frontend | Yes    | Yes | Yes       | No       | 0.17 0.09 0.02 |       | volume ‘vol-xxxxxxxxxxxxxxxxx’ not attached to ‘/dev/xvda’ |
+----------+--------+-----+-----------+----------+----------------+-------+------------------------------------------------------------+
Non machines resources state:
+---------------+--------+
| Name          | Exists |
+---------------+--------+
| frontend_data | Yes    |
| keypair       | Yes    |
| sg            | Yes    |
| role          | Yes    |
+---------------+--------+

I checked the AWS console and confirmed that the volume is attached.

Might be related to: NixOS/nixpkgs#67349

This seems related to the following code:

for device_stored, v in self.block_device_mapping.items():
    device_real = device_name_stored_to_real(device_stored)
    device_that_boto_expects = device_name_to_boto_expected(
        device_real
    )  # boto expects only sd names
    if device_that_boto_expects not in instance.block_device_mapping.keys() and v.get(
        "volumeId", None
    ):
        res.disks_ok = False
        res.messages.append(
            "volume ‘{0}’ not attached to ‘{1}’".format(
                v["volumeId"], device_real
            )
        )

The question is, does boto still expect only sd device names?

cc @AmineChikhaoui

andir commented

This entire device name matching seems super flaky to me. Why aren't we using some kind of device path, disk UUID, … that AWS sets on the device, so we can properly match it among many other devices?

We shouldn't have to care about the exact device name as long as we can find it via some common identifier.
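
A minimal sketch of that idea, matching on the volume ID instead of the device name. The function name `unattached_volumes` and the plain-dict inputs are hypothetical stand-ins, not the nixops-aws or boto API; it assumes the IDs of the attached volumes can be read from whatever AWS reports for the instance.

```python
from typing import Dict, Iterable, List


def unattached_volumes(
    stored_mapping: Dict[str, dict], attached_volume_ids: Iterable[str]
) -> List[str]:
    """Return messages for stored volumes that AWS does not report as attached.

    stored_mapping: device name -> {"volumeId": ...}, shaped like the
        block_device_mapping used by the existing check (assumption).
    attached_volume_ids: volume IDs AWS reports for the instance (assumption).
    """
    attached = set(attached_volume_ids)
    messages = []
    for device, v in stored_mapping.items():
        volume_id = v.get("volumeId")
        # Entries without a volume ID are skipped, mirroring the
        # v.get("volumeId", None) guard in the existing check.
        if volume_id and volume_id not in attached:
            messages.append(
                "volume '{0}' not attached (expected at '{1}')".format(
                    volume_id, device
                )
            )
    return messages
```

With this shape, a root volume stored as /dev/xvda passes as long as its volume ID is among the attached ones, regardless of whether AWS reports it under /dev/sda or /dev/xvda.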

andir commented

After looking into this a bit more, it looks like sda and xvda are always(?) the same thing and are used interchangeably.

I am not sure whether treating sdX and xvdX as the same causes any harm, but in my test case it didn't cause an issue. If that is fine, then the following patch should work:

diff --git a/nixops_aws/backends/ec2.py b/nixops_aws/backends/ec2.py
index 5544334..2ea9d94 100644
--- a/nixops_aws/backends/ec2.py
+++ b/nixops_aws/backends/ec2.py
@@ -2041,7 +2041,11 @@ class EC2State(MachineState[EC2Definition], EC2CommonState):
                     device_real
                 )  # boto expects only sd names
 
-                if device_that_boto_expects not in instance.block_device_mapping.keys() and v.get(
+                mapped_devices = instance.block_device_mapping.keys()
+
+                if device_that_boto_expects not in mapped_devices and \
+                        device_real not in mapped_devices \
+                        and v.get(
                     "volumeId", None
                 ):
                     res.disks_ok = False

andir commented

@tewfik-ghariani do you mind testing with the above patch?

AWS seems to say everything is lies, but I think we can count on this behavior ~enough. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html
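
To illustrate what the patch changes, here is a standalone sketch of the check before and after. `to_sd_name` is a hypothetical stand-in for `device_name_to_boto_expected` (assumed here to rewrite an xvd prefix to sd), the stored-to-real name translation is left out, and the inputs are plain dicts and sets rather than real boto objects.

```python
def to_sd_name(device):
    # Hypothetical stand-in for device_name_to_boto_expected: assumes it
    # rewrites /dev/xvdX to /dev/sdX and leaves other names alone.
    prefix = "/dev/xvd"
    if device.startswith(prefix):
        return "/dev/sd" + device[len(prefix):]
    return device


def check_disks(stored_mapping, mapped_devices, accept_real_name=True):
    """Return (disks_ok, messages).

    accept_real_name=False mimics the original check (sd name only);
    accept_real_name=True mimics the patched check (sd name or real name).
    """
    disks_ok, messages = True, []
    for device_real, v in stored_mapping.items():
        found = to_sd_name(device_real) in mapped_devices
        if accept_real_name:
            found = found or device_real in mapped_devices
        if not found and v.get("volumeId"):
            disks_ok = False
            messages.append(
                "volume '{0}' not attached to '{1}'".format(
                    v["volumeId"], device_real
                )
            )
    return disks_ok, messages
```

If the instance reports its root volume under /dev/xvda, the original variant flags it as not attached while the patched variant accepts it; a volume reported under neither name is still flagged in both.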

Sorry I didn't get back to you, @andir.

Yes, this definitely solves the 'root volume detached' issue, thank you so much!

Closing.