aws/aws-iot-device-sdk-python-v2

Just-in-time provisioning (JITP) attaches the policy automatically only for the first thing, not for later ones.

aLohrer opened this issue · 8 comments

Describe the bug

I am trying to implement a JITP pipeline in our company.

I create a CA key and PEM file with openssl.
Afterwards I register the CA file in the AWS IoT web console.
Then I create a JITP provisioning template and attach the above-mentioned CA file.

Afterwards I create a key and certificate file for the device, using the CA.

I use the following script to do so, leaning heavily on https://github.com/aws-samples/aws-iot-jitp-sample-scripts/blob/master/bin/provision:

DEVICE=$1
CA=${2:-root}

if [[ -z "$DEVICE" ]]; then
	echo "Usage: $0 deviceName [CA]"
	exit 1
fi

if [[ ! -f "$CA.pem" ]]; then
	echo "Could not find certificate for $CA"
	exit 1
fi

openssl genrsa -out "$DEVICE.key" 2048
openssl req -new -key "$DEVICE.key" -out "$DEVICE.csr"  -subj "/CN=$DEVICE"
openssl x509 -req -in "$DEVICE.csr" -CA "$CA.pem" -CAkey "$CA.key" -CAcreateserial -out "$DEVICE.crt.tmp" -days 500 -sha256
cat "$DEVICE.crt.tmp" "$CA.pem" > "$DEVICE.crt"
rm "$DEVICE.crt.tmp"
rm "$DEVICE.csr"

Finally I have the private key and certificate (let's call them attempt10.key and attempt10.crt) for my device and can try fleetprovisioning.py from the samples folder.

python3 fleetprovisioning.py --endpoint xxx-ats.iot.eu-central-1.amazonaws.com --cert certs/attempt10.crt --key certs/attempt10.key  --template_name jitp2 --template_parameters '{"vd_id":"attempt10"}'

Expected Behavior

Running fleetprovisioning.py should create a thing.

I should be able to create many key and cert files and provision a thing for each of them.

Current Behavior

fleetprovisioning.py works only once.
When I create a second key/cert pair and repeat the steps above, fleetprovisioning.py fails.

Let's say the next key pair is attempt6. I get the following:

python3 fleetprovisioning.py --endpoint a2lp30jbxxx2ss-ats.iot.eu-central-1.amazonaws.com --cert certs/attempt6.crt --key certs/attempt6.key  --template_name jitp2 --template_parameters '{"vd_id":"attempt6"}'
Connecting to a2lp30jbxxx2ss-ats.iot.eu-central-1.amazonaws.com with client ID 'test-8e9192e3-e5f6-4f65-bc6a-e0c5de7d29fc'...
Traceback (most recent call last):
  File "fleetprovisioning.py", line 271, in <module>
    connected_future.result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
awscrt.exceptions.AwsCrtError: AWS_ERROR_MQTT_UNEXPECTED_HANGUP: The connection was closed unexpectedly.

This happens during the initial MQTT connection setup, before fleetprovisioning.py calls any provisioning code.
The AwsCrtError points to an authentication error.

However, when I go to the IoT web console under Security -> Certificates, I can see that a new certificate has appeared.
Unlike the first one, it sits in pending instead of active.
If I manually set it to active, attach a policy, and rerun the command above, everything works fine.

However, the beauty of JITP should be that it automatically attaches the policy and activates the certificate.
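
The manual workaround described above (set the certificate to active, attach the policy) could be scripted while the root cause is investigated. The following is a minimal sketch, not the SDK's or the sample's method: `activate_and_attach` is a hypothetical helper, and `iot_client` is assumed to behave like `boto3.client("iot")`, of which only `describe_certificate`, `update_certificate`, and `attach_policy` are used.

```python
from typing import Any


def activate_and_attach(iot_client: Any, certificate_id: str, policy_name: str) -> str:
    """Activate a certificate and attach a policy to it.

    `iot_client` is assumed to behave like boto3.client("iot").
    Returns the certificate ARN.
    """
    # Look up the certificate to get its ARN and current status.
    description = iot_client.describe_certificate(certificateId=certificate_id)
    cert = description["certificateDescription"]

    # Only change the status when the certificate is not already active.
    if cert["status"] != "ACTIVE":
        iot_client.update_certificate(certificateId=certificate_id, newStatus="ACTIVE")

    # Policies are attached to the certificate ARN, not the certificate ID.
    iot_client.attach_policy(policyName=policy_name, target=cert["certificateArn"])
    return cert["certificateArn"]
```

With real credentials this might be called as `activate_and_attach(boto3.client("iot"), "<certificate id>", "StagingEverything")`. It only automates the workaround and does not explain why JITP skipped the certificate.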

Reproduction Steps

When I create a new provisioning template, register a new CA, and combine them, I can again provision the first device without problems, but on the second run-through I see the same behaviour as described above.

Also, when a key pair is successful, either because it is the first one for that template or because of manual activation, I can reuse that key pair many times over to create new, different things with it. I am not sure if that is intended.
As I understood it, every device should have its own key pair. It would also be nice if a key pair could be used only once for thing provisioning, but I guess that is a different issue.

In case you want to reproduce it, here is my provisioning template:

{
  "Parameters": {
    "AWS::IoT::Certificate::CommonName": {
      "Type": "String"
    },
    "AWS::IoT::Certificate::Id": {
      "Type": "String"
    }
  },
  "Resources": {
    "policy_StagingEverything": {
      "Type": "AWS::IoT::Policy",
      "Properties": {
        "PolicyName": "StagingEverything"
      }
    },
    "certificate": {
      "Type": "AWS::IoT::Certificate",
      "Properties": {
        "CertificateId": {
          "Ref": "AWS::IoT::Certificate::Id"
        },
        "Status": "Active"
      }
    },
    "thing": {
      "Type": "AWS::IoT::Thing",
      "OverrideSettings": {
        "AttributePayload": "MERGE",
        "ThingGroups": "DO_NOTHING",
        "ThingTypeName": "REPLACE"
      },
      "Properties": {
        "AttributePayload": {},
        "ThingGroups": [
          "admin"
        ],
        "ThingName": {
          "Ref": "AWS::IoT::Certificate::CommonName"
        }
      }
    }
  }
}
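
One thing that can be checked locally, before blaming the service, is whether every `Ref` in the template points at a declared parameter. The sketch below is a hypothetical helper (`undeclared_refs` is not part of any AWS tooling) and assumes the template is available as a JSON string. Passing this check does not prove the template is valid for JITP.

```python
import json
from typing import Any, Set


def collect_refs(node: Any) -> Set[str]:
    """Recursively collect every name used in a {"Ref": name} object."""
    refs: Set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Ref" and isinstance(value, str):
                refs.add(value)
            else:
                refs |= collect_refs(value)
    elif isinstance(node, list):
        for item in node:
            refs |= collect_refs(item)
    return refs


def undeclared_refs(template_body: str) -> Set[str]:
    """Return Ref names used under Resources but missing from Parameters."""
    template = json.loads(template_body)
    declared = set(template.get("Parameters", {}))
    return collect_refs(template.get("Resources", {})) - declared
```

For the template above, the two referenced names (`AWS::IoT::Certificate::Id` and `AWS::IoT::Certificate::CommonName`) are both declared, so the helper would be expected to return an empty set.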

And here is the policy I use (it's for development and allows everything for now; I will change it once JITP works):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iot:*",
      "Resource": "*"
    }
  ]
}

Possible Solution

Probably I misunderstood something about how to use JITP and self-signed certificates.
But the first attempt does work as expected.

Additional Information/Context

No response

SDK version used

awscrt==0.19.1 awsiot==0.1.3 awsiotsdk==1.19.0

Environment details (OS name and version, etc.)

Ubuntu 20.04 Python 3.8.10 OpenSSL 1.1.1f 31 Mar 2020

jmklix commented

Thanks for the detailed write-up @aLohrer. It looks like you are trying to use this correctly. The error that you are seeing (AWS_ERROR_MQTT_UNEXPECTED_HANGUP) means that you aren't connecting to IoT Core. This could be caused by many things, but my guess is that the second key/cert pairs are not generated/attached correctly. Can you list the commands you use to generate the first and second key/cert pairs?
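
One quick way to compare the first and later key/cert pairs is to confirm that each device `.crt` file really contains two PEM blocks (the device certificate followed by the CA certificate), which is what the `cat` step in the script is meant to produce. This is a minimal sketch with hypothetical helper names. It only inspects the PEM framing and does not verify signatures or validity.

```python
import re
from typing import List

# Matches one complete PEM certificate block, including its BEGIN/END lines.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.DOTALL
)


def pem_certificates(pem_text: str) -> List[str]:
    """Return each PEM certificate block found in the text, in order."""
    return PEM_CERT.findall(pem_text)


def looks_like_device_chain(pem_text: str) -> bool:
    """True when the text holds exactly two distinct certificate blocks."""
    blocks = pem_certificates(pem_text)
    return len(blocks) == 2 and blocks[0] != blocks[1]
```

It could be run over each file with something like `looks_like_device_chain(open("certs/attempt6.crt").read())`.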

If you haven't already, take a look at bulk registration and see if that fits your use case.

Greetings! It looks like this issue hasn’t been active in longer than a week. We encourage you to check if this is still an issue in the latest release. Because it has been longer than a week since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or add an upvote to prevent automatic closure, or if the issue is already closed, please feel free to open a new one.

Hi @jmklix,
thanks for looking into this.

It's really JITP we are interested in. It integrates much more nicely into our deployment pipeline. But yes, if we can't get JITP running, I can build something around bulk registration.

Back to JITP:
For creating device certificates I always use the same script, which I showed at the start of my issue.

So it's always something like:

bash createcert.sh deviceName CaName

So there really should be no difference between the first and all subsequent certificates created.

If it helps, I can create a CA key pair plus a couple of device key pairs and send them over. I would not use them in production, so no harm done.

What is really strange is that it works the first time for each Provisioning Template + CA.

On the second and subsequent attempts, the certificate shows up under Security/Certificates in the IoT web console,
but it hangs in "Pending activation".

If I activate them manually and attach the policy manually, I can rerun fleetprovisioning.py and everything works as expected.

So my guess is that there is nothing wrong with the certificates, but something goes wrong in the provisioning workflow.

I also found a Stack Overflow question with the same issue from June, so it seems it does not happen only to us. Unfortunately, no one has answered it yet.
https://stackoverflow.com/questions/76569449/aws-iot-active-provisioning-template-does-nothing

Appreciate your help.
Cheers

I've tried multiple different configurations, and I can successfully register and connect with multiple deviceCerts. I do see the first connection fail, but I think that is just the certificate being registered. The SDK handles the reconnect and is able to publish messages without restarting the pubsub sample. If you aren't already, try using the MQTT5 pubsub sample. MQTT5 has better error messages and might tell you why your second device can't connect. Here is what I'm using to generate my devices and then run the MQTT5 pubsub sample:

openssl genrsa -out deviceCert.key 2048
openssl req -new -key deviceCert.key -out deviceCert.csr
openssl x509 -req -in deviceCert.csr -CA rootCA.pem -CAkey rootCA.key -CAcreateserial -out deviceCert.crt -days 365 -sha256
cat deviceCert.crt rootCA.pem > deviceCertAndCACert.crt

python3 samples/mqtt5_pubsub.py --endpoint <endpoint>-ats.iot.us-west-2.amazonaws.com --cert deviceCertAndCACert.crt --key deviceCert.key --ca_file <path-to>/AmazonRootCA1.pem

Note: I passed AmazonRootCA1.pem, not the rootCA.pem generated in the guide.

Please let me know if you are still having any problems with jitp or have any questions with how I got it running.
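
Since the first connection with a new certificate was seen to fail while the certificate is being registered, a script that waits on a single connect attempt may need its own retry. The following is a minimal sketch of a retry with exponential backoff, not what the SDK or the samples do internally. `connect_with_retry` and its parameters are hypothetical, and `connect` is assumed to be any callable that raises on failure.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    initial_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds or `attempts` is exhausted.

    The delay doubles after every failure. The last exception is re-raised
    if no attempt succeeds.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception:
            # Deliberately broad for this sketch; narrow it in real code.
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= 2
    raise AssertionError("unreachable")  # the loop always returns or raises
```

In a sample such as fleetprovisioning.py, `connect` might be something like `lambda: mqtt_connection.connect().result()`, assuming `mqtt_connection` is the connection object the sample already builds.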

Sorry for the late response.
Happy to see that it works for you.
Unfortunately it still does not work in my case.

So I guess either my JITP configuration or something with my CA is wrong.
I didn't find a way to get logs showing why JITP is not executed.
Is there a way for me to debug this, other than just trying different JITP configurations / CAs?

Cheers

There are two different types of logs that you can enable. The SDK logs can be enabled with

io.init_logging(io.LogLevel.Error, 'stderr')

Those should output something even when JITP is not executed. You can also enable CloudWatch logs; these would have more useful information related to connection attempts. But I'm not sure if JITP events would show up if the certs aren't correctly connected to your account. Let me know if you have any questions about logging or anything else about JITP.
