In this workshop, you will learn about the different components of AWS IoT Analytics. You will configure AWS IoT Core to ingest stream data from the AWS IoT Device Simulator, process batch data using Docker containers with Amazon Elastic Container Registry (ECR), build an analytics pipeline using AWS IoT Analytics, visualize the data using Amazon QuickSight, and perform machine learning using Jupyter Notebooks. Join us, and build a solution that helps you perform analytics on appliance energy usage in a smart building and forecast energy utilization to optimize consumption.
- Prerequisites
- Solution Architecture Overview
- Step 1a: Build the Streaming data workflow
- Step 1b: Create Stream Analytics Pipeline
- Step 1c: Analyse the data
- Step 2a: Build the Batch analytics workflow
- Step 2b: Create Batch Analytics Pipeline
- Step 2c: Analyse Stream and Batch data
- Recap and Review - So far
- Step 3: Visualize your data using QuickSight
- Step 4: Machine Learning and Forecasting with Jupyter Notebooks
- Recap and Review - What did we learn in this workshop?
- Clean up resources in AWS
- Troubleshooting
To conduct the workshop you will need the following tools/setup/knowledge:
- AWS Account
- Laptop
- Secure shell (SSH) to log in to your Docker instance (EC2)
- Mac OS/Linux: command-line tools are installed by default
- Windows
- PuTTY: SSH client: http://www.putty.org/
- Connecting (SSH) to an EC2 instance from Windows with PuTTY: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html
- You need to have an SSH key pair
- An SSH key pair can be generated or imported in the AWS console under EC2 -> Key Pairs
- Download the .pem file locally to log into the EC2 Docker instance later in the workshop
- For Mac/Linux, change permissions: chmod 400 "paste-your-keypair-filename"
- You are in one of the following regions:
- us-east-1 (N. Virginia)
- us-east-2 (Ohio)
- us-west-2 (Oregon)
- eu-west-1 (Ireland)
- You do not have more than 3 VPCs already deployed in the active region. In the workshop you will create 2 VPCs, and the limit for VPCs per region is 5.
- You do not have more than 3 Elastic IP addresses already deployed in the active region. In the workshop you will create 2 Elastic IP addresses for the Device Simulator, and the limit of Elastic IPs per region is 5.
The IoT Device Simulator allows you to simulate real world devices by creating device types and data schemas via a web interface and allowing them to connect to the AWS IoT message broker.
By choosing one of the links below you will be automatically redirected to the CloudFormation section of the AWS Console where your IoT Device Simulator stack will be launched:
- Launch CloudFormation stack in us-east-1 (N. Virginia)
- Launch CloudFormation stack in us-west-2 (Oregon)
- Launch CloudFormation stack in us-east-2 (Ohio)
- Launch CloudFormation stack in eu-west-1 (Ireland)
After you have been redirected to the AWS CloudFormation console, take the following steps to launch your stack:
- Parameters - Input Administrator Name & Email (An ID and password will be emailed to you for the IoT Device Simulator)
- Capabilities - Check "I acknowledge that AWS CloudFormation might create IAM resources." at the bottom of the page
- Create Stack
- Wait until the stack creation is complete
The CloudFormation stack may take 10 to 25 minutes to complete. In the Outputs section of your CloudFormation stack, you will find the Management console URL for the IoT Device Simulator. Please copy the URL to use in the next section.
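If you prefer to work from the command line, you can read the same stack outputs with a short boto3 sketch like the one below. It assumes the stack is named IoTDeviceSimulator (the name used in the clean-up section) and that your AWS credentials and default region point at the workshop region.

```python
# Print the status and outputs of the Device Simulator stack.
import boto3

cfn = boto3.client("cloudformation")  # uses your configured default region
stack = cfn.describe_stacks(StackName="IoTDeviceSimulator")["Stacks"][0]

print("Status:", stack["StackStatus"])
for output in stack.get("Outputs", []):
    # One of these outputs is the IoT Device Simulator management console URL
    print(output["OutputKey"], "=", output["OutputValue"])
```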
[Top]
You will provision the smart home endpoint to publish telemetry data points to AWS IoT.
Please login to the IoT Device Simulator Management console (link copied from the earlier step) with the provided credentials.
Credentials for the Device Simulator will be emailed to the address provided during CloudFormation stack creation.
- Navigate to Modules -> Device Types -> Click Add Device Type
- Device Type Name: smart-home
- Data Topic: smarthome/house1/energy/appliances
- Data Transmission Duration: 7200000
- Data Transmission Interval: 3000
- Message Payload: Click Add Attribute and add the following attributes:
Attribute Name | Data Type | Float Precision | Integer Minimum Value | Integer Maximum Value |
---|---|---|---|---|
sub_metering_1 | float | 2 | 10 | 100 |
sub_metering_2 | float | 2 | 10 | 100 |
sub_metering_3 | float | 2 | 10 | 25 |
global_active_power | float | 2 | 1 | 8 |
global_reactive_power | float | 2 | 5 | 35 |
voltage | float | 2 | 10 | 250 |
timestamp | UTC Timestamp (Choose Default) | | | |
- Once the sample message payload shows all the attributes above, click Save
- Navigate to Modules -> Widgets -> Add Widget
- Select 'smart-home' as the Device Type
- Number of widgets: 1 -> Submit
We have now created a simulated smart home device which is collecting power usage data and publishing that data to AWS IoT Core on the 'smarthome/house1/energy/appliances' topic.
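To make the payload concrete, the hedged sketch below shows roughly what one of these messages looks like and how you could publish a one-off test message yourself with boto3. The attribute values here are made up; the Device Simulator is already publishing equivalent messages every 3 seconds, so this step is optional.

```python
# Optional: publish a single test message shaped like the simulator's payload.
import json
from datetime import datetime, timezone

import boto3

iot_data = boto3.client("iot-data")

payload = {
    "sub_metering_1": 45.37,        # example values within the ranges configured above
    "sub_metering_2": 12.84,
    "sub_metering_3": 18.02,
    "global_active_power": 3.21,
    "global_reactive_power": 17.5,
    "voltage": 238.6,
    "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO8601 UTC timestamp
}

iot_data.publish(
    topic="smarthome/house1/energy/appliances",
    qos=0,
    payload=json.dumps(payload),
)
```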
Note: You will use the AWS console for the remainder of the workshop. Sign-in to the AWS console.
We will verify that the smart home device is configured and publishing data to the correct topic.
- From the AWS console, choose the IoT Core service
- Navigate to Test (On the left pane)
- Under Subscription input the following:
- Subscription topic: 'smarthome/house1/energy/appliances'
- Click Subscribe to topic
After a few seconds, you should see your simulated device's data being published on the 'smarthome/house1/energy/appliances' MQTT topic.
[Top]
In this section we will create the IoT Analytics components, analyze data and define different pipeline activities.
First you will need to create 3 S3 buckets: one for your IoT Analytics channel, one for the data store that holds your transformed data, and one for the data set that results from an IoT Analytics SQL query.
- Navigate to the S3 Management Console
- Choose Create Bucket
- Bucket name: Give your bucket a unique name (must be globally unique) and append it with '-channel'. For example: 'my-iot-analytics-channel'.
- Region: The region should be the same as where you launched the Device Simulator Cloud Formation template.
- Click Next and keep all options default. Click on Create bucket to finish the creation.
- Repeat the steps above twice more to finish creating the required buckets, using the suffixes '-datastore' and '-dataset' to differentiate the buckets.
You will also need to give appropriate permissions to IoT Analytics to access your Data Store bucket.
- Navigate to the S3 Management Console
- Click on your data store bucket ending with '-datastore'.
- Navigate to the Permissions tab
- Click on Bucket Policy and enter the following JSON policy (be sure to edit to include your S3 bucket name):
{
  "Version": "2012-10-17",
  "Id": "IoTADataStorePolicy",
  "Statement": [
    {
      "Sid": "IoTADataStorePolicyID",
      "Effect": "Allow",
      "Principal": {
        "Service": "iotanalytics.amazonaws.com"
      },
      "Action": [
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::<your bucket name here>",
        "arn:aws:s3:::<your bucket name here>/*"
      ]
    }
  ]
}
- Click Save
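If you would rather apply the policy programmatically than through the console, a minimal boto3 sketch (with a placeholder bucket name you would replace with your own) might look like this:

```python
# Attach the IoT Analytics bucket policy to the "-datastore" bucket.
import json

import boto3

BUCKET = "my-iot-analytics-datastore"  # assumption: replace with your bucket name

policy = {
    "Version": "2012-10-17",
    "Id": "IoTADataStorePolicy",
    "Statement": [
        {
            "Sid": "IoTADataStorePolicyID",
            "Effect": "Allow",
            "Principal": {"Service": "iotanalytics.amazonaws.com"},
            "Action": [
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload",
                "s3:PutObject",
                "s3:DeleteObject",
            ],
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
        }
    ],
}

boto3.client("s3").put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```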
Next we will create the IoT Analytics channel that will consume data from the IoT Core broker and store the data into your S3 bucket.
- Navigate to the AWS IoT Analytics console.
- In the left navigation pane, navigate to Channels
- Create a new channel
- ID: streamchannel
- Choose the Storage Type: Customer Managed S3 Bucket, and choose your Channel S3 bucket created in the previous step.
- IAM Role: Create New, and give your new IAM Role a name. This will give IoT Analytics the correct IAM policies to access your S3 bucket.
- Click 'Next' and input the following. This step will create an IoT Rule that consumes data on the specified topic.
- IoT Core topic filter: 'smarthome/house1/energy/appliances'
- IAM Role: Create New, and give your new IAM Role a name. This will give IoT Analytics the correct IAM policies to access your AWS IoT Core topic.
- Click on See messages to see the messages from your smartbuilding device arriving on the topic. Ensure your device is still running in Device Simulator if you do not see any messages.
- Click Create Channel
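As an optional reference, the same channel can be created with boto3. The bucket name and role ARNs below are placeholders for your own values, and note that the console wizard also creates the IoT Core rule that forwards messages from the topic into the channel; with the API you create that rule yourself:

```python
# Create the channel and the IoT rule that feeds it (sketch with placeholder ARNs).
import boto3

iota = boto3.client("iotanalytics")
iot = boto3.client("iot")

# Channel backed by your customer-managed "-channel" S3 bucket
iota.create_channel(
    channelName="streamchannel",
    channelStorage={
        "customerManagedS3": {
            "bucket": "my-iot-analytics-channel",                           # assumption
            "roleArn": "arn:aws:iam::123456789012:role/IoTAChannelS3Role",  # assumption
        }
    },
)

# Rule that forwards the MQTT topic into the channel
iot.create_topic_rule(
    ruleName="smarthome_to_streamchannel",
    topicRulePayload={
        "sql": "SELECT * FROM 'smarthome/house1/energy/appliances'",
        "actions": [
            {
                "iotAnalytics": {
                    "channelName": "streamchannel",
                    "roleArn": "arn:aws:iam::123456789012:role/IoTARuleRole",  # assumption
                }
            }
        ],
    },
)
```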
- Navigate to the AWS IoT Analytics console.
- In the left navigation pane, navigate to Data stores
- Create a new data store
- ID: iotastore
- Choose the Storage Type: Customer Managed S3 Bucket, and choose your Data Store S3 bucket created in the previous step.
- IAM Role: Create New, and give your new IAM Role a name. This will give IoT Analytics the correct IAM policies to access your S3 bucket.
- Click 'Next' and then Create data store
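Again as an optional reference, the equivalent boto3 call for the data store is sketched below, with a placeholder bucket name and role ARN:

```python
# Create the data store backed by your "-datastore" bucket.
import boto3

boto3.client("iotanalytics").create_datastore(
    datastoreName="iotastore",
    datastoreStorage={
        "customerManagedS3": {
            "bucket": "my-iot-analytics-datastore",                           # assumption
            "roleArn": "arn:aws:iam::123456789012:role/IoTADatastoreS3Role",  # assumption
        }
    },
)
```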
- Navigate to the AWS IoT Analytics console.
- In the left navigation pane, navigate to Pipelines
- Create a new Pipeline:
- ID: streampipeline
- Pipeline source: streamchannel
- Click Next
- IoT Analytics will automatically parse the data coming from your channel and list the attributes from your simulated device. By default, all messages are selected.
- Click Next
- Under 'Pipeline activities' you can transform the data in your pipeline and add or remove attributes
- Click Add Activity and choose Calculate a message attribute as the type.
- Attribute Name: cost
- Formula:
(sub_metering_1 + sub_metering_2 + sub_metering_3) * 1.5
- Test your formula by clicking Update preview and the cost attribute will appear in the message payload below.
- Add a second activity by clicking Add activity and Remove attributes from a message
- Attribute Name: id and click 'Next'. The id attribute is a unique identifier coming from the Device Simulator, but adds noise to the data set.
- Click Update preview and the id attribute will disappear from the message payload.
- Click 'Next'
- Pipeline output: Click 'Edit' and choose 'iotastore'
- Click Create Pipeline
Your IoT Analytics pipeline is now set up.
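For reference, the whole pipeline (channel source, cost calculation, removal of the id attribute, and data store output) could also be created with a single boto3 call like the hedged sketch below; the activity names are arbitrary labels:

```python
# Create the stream pipeline: channel -> cost calculation -> remove "id" -> data store.
import boto3

boto3.client("iotanalytics").create_pipeline(
    pipelineName="streampipeline",
    pipelineActivities=[
        {"channel": {"name": "source", "channelName": "streamchannel", "next": "add_cost"}},
        {
            "math": {
                "name": "add_cost",
                "attribute": "cost",
                "math": "(sub_metering_1 + sub_metering_2 + sub_metering_3) * 1.5",
                "next": "drop_id",
            }
        },
        {"removeAttributes": {"name": "drop_id", "attributes": ["id"], "next": "store"}},
        {"datastore": {"name": "store", "datastoreName": "iotastore"}},
    ],
)
```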
[Top]
In this section, you will learn how to use IoT Analytics to extract insights from your data set using SQL over a specified time period.
- Navigate to the AWS IoT Analytics console.
- In the left navigation pane, navigate to Data sets
- Choose Create a data set
- Select Create SQL
- ID: streamdataset
- Select data store source: iotastore - this is the S3 bucket containing the transformed data created in step 1b.
- Click Next
- Keep the default SQL statement, which should read
SELECT * FROM iotastore
and click Next. Input the following:
- Data selection window: Delta time
- Offset: -5 Seconds - the 5 second offset is to ensure all 'in-flight' data is included into the data set at time of execution.
- Timestamp expression:
from_iso8601_timestamp(timestamp)
- we need to convert the ISO8601 timestamp coming from the streaming data to a standard timestamp.
- Keep all other options as default and click Next until you reach 'Configure the delivery rules of your analytics results'
- Click Add rule
- Choose Deliver result to S3
- S3 bucket: select the S3 bucket that ends with '-dataset'
- Click Create data set to finalise the creation of the data set.
- Navigate to Data sets on the lefthand navigation pane of the AWS IoT Analytics console.
- Click on 'streamdataset'
- Click on Actions and in the dropdown menu choose Run now
- On the left navigation menu, choose Content and monitor the status of your data set creation.
- The results will be shown in the preview pane and saved as a .csv in the '-dataset' S3 bucket.
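The same run-and-download flow can be scripted. The sketch below triggers the data set with boto3, waits for it to finish, and downloads the resulting CSV from the pre-signed dataURI:

```python
# Trigger the data set (same as "Run now"), wait for it, then download the CSV.
import time
import urllib.request

import boto3

iota = boto3.client("iotanalytics")
iota.create_dataset_content(datasetName="streamdataset")

while True:
    content = iota.get_dataset_content(datasetName="streamdataset", versionId="$LATEST")
    state = content["status"]["state"]
    if state in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(10)

if state == "SUCCEEDED":
    # dataURI is a pre-signed URL pointing at the result CSV
    urllib.request.urlretrieve(content["entries"][0]["dataURI"], "streamdataset.csv")
```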
[Top]
In this section we will create an EC2 instance and Docker image to batch-load a public data set from S3 into an IoT Analytics data store using containers.
By choosing one of the links below you will be automatically redirected to the CloudFormation section of the AWS Console where your stack will be launched.
Before launching the CloudFormation, you will need an SSH key pair to log into the EC2 instance. If you don't have an SSH key pair you can create one by:
- Navigate to the EC2 console
- Click on Key Pairs
- Click on Create Key Pair and input a name.
- Save the .pem file in a directory accessible on your computer.
- If you are running Mac or Linux, set the appropriate permissions on the keypair:
chmod 400 myec2keypair.pem
To launch the CloudFormation stack, choose one of the following links for your region, and follow the steps below:
- Launch CloudFormation stack in us-east-1 (N. Virginia)
- Launch CloudFormation stack in us-west-2 (Oregon)
- Launch CloudFormation stack in us-east-2 (Ohio)
- Launch CloudFormation stack in eu-west-1 (Ireland)
- Navigate to Parameters
- SSHKeyName - select the SSH key pair you will use to login to the EC2 instance.
- Check the box "I acknowledge that AWS CloudFormation might create IAM resources."
- Click Create stack
The CloudFormation stack will take approximately 5-7 minutes to complete launching all the necessary resources.
Once the CloudFormation has completed, navigate to the Outputs tab, and see the SSHLogin parameter. Copy this string to use when SSHing to the EC2 instance.
- SSH to the EC2 instance using the SSHLogin string copied from the above step.
- Example:
ssh -i Iotaworkshopkeypair.pem ec2-user@ec2-my-ec2-instance.eu-west-1.compute.amazonaws.com
- Move to the docker-setup folder
cd /home/ec2-user/docker-setup
- Update your EC2 instance:
sudo yum update
- Build the docker image:
docker build -t container-app-ia .
- Verify the image was built:
docker image ls | grep container-app-ia
- You should see an output similar to:
container-app-ia latest ad81fed784f1 2 minutes ago 534MB
- Create a new repository in Amazon Elastic Container Registry (ECR) using the AWS CLI (pre-built on your EC2 instance):
aws ecr create-repository --repository-name container-app-ia
- The output should include a JSON object which includes the item 'repositoryURI'. Copy this value into a text editor for later use.
- Log in to your Docker environment (the surrounding backticks are required so that the shell executes the docker login command returned by the call):
`aws ecr get-login --no-include-email`
- Tag the Docker image with the ECR Repository URI:
docker tag container-app-ia:latest <your repositoryUri here>:latest
- Push the image to ECR
docker push <your repositoryUri here>
[Top]
In this section we will create the IoT Analytics pipeline for your public data-set, analyze data and define different pipeline activities.
Next we will create the IoT Analytics channel that will consume data from the IoT Core broker and store the data into your S3 bucket.
- Navigate to the AWS IoT Analytics console.
- In the left navigation pane, navigate to Channels
- Create a new channel
- ID: batchchannel
- Choose the Storage Type: Service-managed store - in this step we will use an IoT Analytics managed S3 bucket, but you may specify a customer-managed bucket as in Step 1b if you wish.
- IoT Core topic filter: Leave this blank, as the data source for this channel will not be from AWS IoT Core.
- Leave all other options as default and click Next.
- Click Create Channel
- Navigate to the AWS IoT Analytics console.
- In the left navigation pane, navigate to Pipelines
- Create a new Pipeline:
- ID: batchpipeline
- Pipeline source: batchchannel
- Click Next
- See attributes of messages: You will not see any data on this screen, as the data source has not been fully configured yet.
- Click Next
- Under 'Pipeline activities' you can transform the data in your pipeline and add or remove attributes
- Click Add Activity and choose Calculate a message attribute as the type.
- Attribute Name: cost
- Formula:
(sub_metering_1 + sub_metering_2 + sub_metering_3) * 1.5
- Click 'Next'
- Pipeline output: Click 'Edit' and choose 'iotastore'
- Click Create Pipeline
Now that we have created the IoT Analytics pipeline, we can load the batch data.
A container data set allows you to automatically run your analysis tools and generate results. It brings together a SQL data set as input, a Docker container with your analysis tools and needed library files, input and output variables, and an optional schedule trigger. The input and output variables tell the executable image where to get the data and store the results.
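The workshop's prebuilt Docker image already contains the analysis script, so you do not need to write one; the hedged sketch below only illustrates the idea. It mirrors the three input variables you will configure shortly (inputDataS3BucketName, inputDataS3Key, iotchannel) as plain constants, reads the public CSV from S3, and writes the rows into the batch channel with batch_put_message. How the variables actually reach the script depends on the image's entrypoint.

```python
# Illustrative sketch of a container analysis script (the workshop image ships its own).
import csv
import json

import boto3

INPUT_BUCKET = "iotareinvent18"   # inputDataS3BucketName
INPUT_KEY = "inputdata.csv"       # inputDataS3Key
CHANNEL = "batchchannel"          # iotchannel

s3 = boto3.client("s3")
iota = boto3.client("iotanalytics")

# Read the public CSV and parse it into dictionaries, one per row
body = s3.get_object(Bucket=INPUT_BUCKET, Key=INPUT_KEY)["Body"].read().decode("utf-8")
rows = list(csv.DictReader(body.splitlines()))

# Send the rows into the channel in small batches to stay within the per-call limit
for start in range(0, len(rows), 100):
    batch = [
        {"messageId": str(start + i), "payload": json.dumps(row).encode("utf-8")}
        for i, row in enumerate(rows[start : start + 100])
    ]
    iota.batch_put_message(channelName=CHANNEL, messages=batch)
```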
- Navigate to the IoT Analytics Console
- Click on Data sets on the left-hand navigation pane
- Create a new data set.
- Choose Create Container next to Container data sets.
- ID: container_dataset
- Click Next
- Click Create next to Create an analysis without a data set
- Frequency: Not scheduled
- Click Next
- Select analysis container and map variables
- Choose Select from your Elastic Container Registry repository
- Select your custom analysis container from Elastic Container Registry: container-app-ia
- Select your image: latest
- Under Configure the input variables of your container add the following variables which will be passed to your Docker container and the Python script running in the instance:
Name | Type | Value |
---|---|---|
inputDataS3BucketName | String | iotareinvent18 |
inputDataS3Key | String | inputdata.csv |
iotchannel | String | batchchannel |
- Click Next
- Under IAM Role, click Edit and add the role iotAContainerRole. This role was created as part of the CloudFormation stack.
- Configure the resources for container execution:
- Compute Resource: 4 vCPUs and 16 GiB Memory
- Volume Size (GB): 2
- Click Next
- Under Configure the results of your analytics, keep all options as default.
- Click Next
- Leave Configure delivery rules for analysis results as default and click Next.
- On this menu you can configure the data set to be delivered to an S3 bucket if you wish.
- Finalize the creation of the data-set by clicking Create data set
- Navigate to the AWS IoT Analytics console
- In the left navigation pane, navigate to Data sets
- Click on container_dataset
- Choose Actions and Run Now
- The container dataset can take up to 10 minutes to run. If there are no errors, you will see a SUCCEEDED message. If it fails with an error, please see the Troubleshooting section below to enable logging.
[Top]
Now that we have two data sets into the same data store, we can analyse the result of combining both the container data and simulated device data.
- Navigate to the AWS IoT Analytics console
- In the left navigation pane, navigate to Data sets
- Choose Create a data set and select Create SQL
- ID: batchdataset
- Select data store source: iotastore
- Click Next
- SQL Query:
SELECT * FROM iotastore limit 5000
- Click Next
- Leave the rest of the options as default and click Next
- Optionally, you can choose to have your dataset be placed into an S3 bucket on the Configure delivery rules for analysis results pane.
- Click Create data set
- Click on your newly created batchdataset
- Click on Actions and then Run now
- The data set will take a few minutes to execute. You should see the results of the executed data set in the result preview, as well as an outputted .csv file which includes the complete data set.
[Top]
In the workshop so far you have accomplished the following:
- Launched an IoT Device Simulator using CloudFormation
- Used the IoT Device Simulator to simulate data coming from a home energy monitoring device.
- Created an IoT Analytics pipeline that consumes, cleans and transforms that data.
- Created a custom data set with SQL that can be used for further analytics.
- Used a second CloudFormation template to launch an EC2 instance with a script to simulate a public data set.
- Used Amazon Elastic Container Registry to host your Docker image and IoT Analytics to launch that Docker container to simulate a custom data analytics workload.
- Combined that simulated public data analytics workload with the simulated device data to make a meaningful data set.
The purpose of the workshop has been to show you how you can use IoT Analytics for your various data analytics workloads, whether that be from external data sets, a custom analytics workload using external tools, or consuming data directly from IoT devices in real time.
[Top]
In this section we will visualize the time series data captured from your smart home.
If you have used Amazon QuickSight in the past, you can skip these steps.
- Navigate to the Amazon QuickSight console
- If you have never used Amazon QuickSight before, you will be asked to sign up. Be sure to choose the 'Standard' tier, and choose the correct region for your locale.
- During the sign up phase, give QuickSight access to your Amazon S3 buckets and AWS IoT Analytics via the sign up page.
- Navigate to the Amazon QuickSight console.
- Click on New analysis
- Click on New data set
- Choose AWS IoT Analytics under FROM NEW DATA SOURCES.
- Configure your AWS IoT Analytics data source.
- Data source name: smarthome-dashboard
- Select an AWS IoT Analytics data set to import: batchdataset
- Click on Create data source to finalise the QuickSight dashboard data source configuration. You should see the data has also been imported into SPICE, which is the analytics engine driving Amazon QuickSight.
- Click on Visualize and wait a few moments until you see 'Import complete' in the upper right hand of the console. You should see all 5000 rows have been imported into SPICE.
- Under Fields list choose 'timestamp' to set the X axis for the graph.
- Click on 'sub_metering_1', 'sub_metering_2' and 'sub_metering_3' to add them to the Value column.
- For each sub_metering_value, choose the drop-down menu and set Aggregate to Average.
The graph will look similar to the one below.
You can experiment with different fields or visual types to visualize other smart home related information.
[Top]
In this section we will configure an Amazon SageMaker instance to forecast energy utilisation in the home.
- Navigate to the AWS IoT Analytics console.
- Select Notebooks from the left-hand navigation pane.
- Click Create to begin configuring a new Notebook.
- Click Blank Notebook and input the following:
- Name: smarthome_notebook
- Select data set sources: batchdataset
- Select a Notebook instance: IoTAWorkshopSagemaker - this is the name of the notebook created with CloudFormation
- Create Notebook
- Click on IoTAWorkshopSagemaker
- Choose IoT Analytics in the drop down menu
- Next to smarthome_notebook.ipynb, choose Open in Jupyter
- A new Amazon SageMaker window should pop up with a message that says "Kernel not found". Click Continue Without Kernel
- Download the following Jupyter notebook: https://s3.amazonaws.com/iotareinvent18/SmartHomeNotebook.ipynb
- In the Jupyter Notebook console, click File -> Open
- Choose Upload and select the SmartHomeNotebook.ipynb notebook downloaded in the previous step.
- Click on the SmartHomeNotebook.ipynb to be taken back to the Jupyter Notebook.
- Select conda_mxnet_p36 as the kernel and click Set kernel.
- You should see some pre-filled Python code steps in the Jupyter notebook.
- Click Run for each step. Follow the documentation presented in the Notebook. The Notebook includes information about how the machine learning process works for each step.
- Run through all the steps in the SmartHomeNotebook to understand how the machine learning training process works with Jupyter Notebooks. Note: Wait until the asterisk * disappears after running each cell. If you click 'Run' before the previous step is completed, this could cause some code to fail to complete and the algorithm will fail.
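For orientation, loading an IoT Analytics data set into a pandas DataFrame from a SageMaker notebook generally looks like the short sketch below (the provided SmartHomeNotebook handles this for you):

```python
# Load the latest content of the IoT Analytics data set into pandas.
# Assumes the notebook's execution role is allowed to call IoT Analytics.
import boto3
import pandas as pd

iota = boto3.client("iotanalytics")
content = iota.get_dataset_content(datasetName="batchdataset", versionId="$LATEST")

# dataURI is a pre-signed URL to the CSV produced by the data set run
df = pd.read_csv(content["entries"][0]["dataURI"])
print(df.head())
```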
[Top]
In Steps 3 and 4, we used Amazon QuickSight and Jupyter Notebooks to visualize our data and apply machine learning to gather additional insights.
During the workshop, we saw how you can combine raw IoT data coming in real time from devices and public data sets analysed by third party tools. Using Pipelines, you can clean and standardise this data, so you can then perform advanced visualisations and analytics on the data. IoT Analytics is designed to solve the challenge of cleaning, organising, and making usable data out of hundreds, thousands, or even millions of data points.
In order to prevent incurring additional charges, please clean up the resources created in the workshop.
- SSH to the EC2 docker instance from step 2c and execute clean-up.sh. This script will delete the IoT Analytics channels, pipelines, and datasets.
- Example:
ssh -i Iotaworkshopkeypair.pem ec2-user@ec2-my-ec2-instance.eu-west-1.compute.amazonaws.com
cd /home/ec2-user/clean-up
sh clean-up.sh
- Navigate to the AWS CloudFormation console to delete the CloudFormation stacks and their associated resources.
- Click on IoTAnalyticsStack and click Delete
- Click on IoTDeviceSimulator and click Delete
- Note: Deleting the CloudFormation stacks can take several minutes.
- Navigate to the AWS Quicksight console
- Click on Manage Data
- Click on batchdataset and then Delete data set then Delete
- Navigate to the Amazon ECS console
- Click on Repositories under Amazon ECR
- Select container-app-ia and click Delete
- Navigate to the AWS EC2 console
- Click on Key Pairs in the left navigation pane.
- Choose the EC2 keypair you created to SSH to your docker instance and click Delete.
- Navigate to the S3 Management Console and delete the following:
- The 3 buckets you created in Step 1b ending with '-channel', '-dataset', and '-datastore'
- Each bucket with the prefix 'iotdevicesimulator'.
- Each bucket with the prefix 'sagemaker-'
- Navigate to the AWS IoT Analytics console
- Check that all of your pipelines, data sets, and channels have been deleted. If not, you can manually delete them from the console.
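If the clean-up.sh script could not be run (for example, the EC2 instance was already terminated), the IoT Analytics resources can also be removed with a boto3 sketch like the one below; the resource names match those used earlier in the workshop:

```python
# Delete the IoT Analytics resources created during the workshop.
# Ignores "not found" errors so the script can be re-run safely.
import boto3

iota = boto3.client("iotanalytics")

for name in ("streamdataset", "batchdataset", "container_dataset"):
    try:
        iota.delete_dataset(datasetName=name)
    except iota.exceptions.ResourceNotFoundException:
        pass

for name in ("streampipeline", "batchpipeline"):
    try:
        iota.delete_pipeline(pipelineName=name)
    except iota.exceptions.ResourceNotFoundException:
        pass

for name in ("streamchannel", "batchchannel"):
    try:
        iota.delete_channel(channelName=name)
    except iota.exceptions.ResourceNotFoundException:
        pass

try:
    iota.delete_datastore(datastoreName="iotastore")
except iota.exceptions.ResourceNotFoundException:
    pass
```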
[Top]
To aid in troubleshooting, you can enable logs for IoT Core and IoT Analytics that can be viewed in Amazon CloudWatch.
- Navigate to the AWS IoT Core console
- Click on Settings in the left-hand pane
- Under Logs, click on Edit
- Level of verbosity: Debug (most verbose)
- Set role: Click Create New
- Name: iotcoreloggingrole
The log files from AWS IoT are sent to Amazon CloudWatch. The AWS console can be used to look at these logs.
For additional troubleshooting, refer to the IoT Core Troubleshooting documentation.
[Top]
- Navigate to the AWS IoT Analytics console
- Click on Settings in the left-hand pane
- Under Logs, click on Edit
- Level of verbosity: Debug (most verbose)
- Set role: Click Create New
- Name: iotanalyticsloggingrole
- Create role
- Click on Update
The log files from AWS IoT Analytics will be sent to Amazon CloudWatch. The AWS console can be used to look at these logs.
For additional troubleshooting, refer to the IoT Analytics Troubleshooting documentation.
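Both logging settings can also be enabled from code. The sketch below assumes the two logging roles created above already exist (with permission to write to CloudWatch Logs) and that you substitute their real ARNs; for IoT Analytics the sketch uses the ERROR level accepted by the PutLoggingOptions API:

```python
# Enable logging for AWS IoT Core and AWS IoT Analytics (placeholder role ARNs).
import boto3

# AWS IoT Core logging at the most verbose level
boto3.client("iot").set_v2_logging_options(
    roleArn="arn:aws:iam::123456789012:role/iotcoreloggingrole",  # assumption: your ARN
    defaultLogLevel="DEBUG",
)

# AWS IoT Analytics logging
boto3.client("iotanalytics").put_logging_options(
    loggingOptions={
        "roleArn": "arn:aws:iam::123456789012:role/iotanalyticsloggingrole",  # assumption
        "level": "ERROR",
        "enabled": True,
    }
)
```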
[Top]