/glue-elastic-views-on-s3

AWS Glue Elastic Views: replicate data across multiple AWS data stores to use with your applications without having to write custom code

Primary LanguagePython

AWS Glue Elastic Views - DynamoDB to S3 Materialization

AWS Glue Elastic Views enables you to replicate data across multiple AWS data stores to use with your applications without having to write custom code. With Elastic Views, you use Structured Query Language (SQL) compatible PartiQL queries to quickly create a view of the data you want to replicate from a different data store. Based on the view you create, Elastic Views replicates data from the source data store, and lets you materialized the view in a target data store.

As on today, Only DynamoDB is supported as the source and we can chose between three different destinations: S3 or ElasticSearch or Redshift.

Miztiik Automation: AWS Glue Elastic Views - DynamoDB to S3 Materialization Image Credit: AWS Docs

This means, you can query your DynamoDB table with SQL like querying in(Athena/Redshift) or use the power of full text search offered by Elasticsearch. You can also extend this data by enriching it with data from other sources. Say for example, orders data in DynamoDB can be combined along with customer profile from another datastore.

As of now(Q1 2021) this service is in preview. That means lots of new functionality will get added in the future and any limitations mentioned in those post may be removed in future.

๐ŸŽฏ Solutions

In this demo, lets us see how we can materialize data from DynamoDB to S3 and query them using Athena.

Miztiik Automation: AWS Glue Elastic Views - DynamoDB to S3 Materialization

  1. โš™๏ธ Setting up the environment

    Let us a single region for all our resources - For example us-east-1

    • DynamoDB Table: We want to start with a very simple table that will have movies data. You can use the instruction provided in AWS Docs1. Let us call this table ev-demo-movies-table

      • Primary Key/Partition Key - year and type as Number
      • Sort Key - title and type as String

      I have included some sample data, in this repo under the directory sample_data/movie_data_01.json. Ingest this data to the table. We will use this data to materialize in S3 later

    • S3 Bucket: Setup a bucket in the same region as the table. Let us call this bucket ev-demo-bkt-010

    • IAM Role for Elastic Views: We need a IAM role for Glue Elastic Views to write to our S3 bucket and a trust policy to allow the service to assume our role. Let us call this role ev-demo-role

      Here is the policy that you will need, do not forget to change the bucket name YOUR_BUCKET_NAME_GOES_HERE the target prefix name YOUR_TARGET_PREFIX_NAME_GOES_HERE

      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Action": "s3:ListBucket",
                  "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME_GOES_HERE"
              },
              {
                  "Effect": "Allow",
                  "Action": [
                      "s3:PutObject",
                      "s3:GetObject",
                      "s3:DeleteObject",
                      "s3:PutObjectAcl"
                  ],
                  "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME_GOES_HERE/YOUR_TARGET_PREFIX_NAME_GOES_HERE/*"
              }
          ]
      }

      Likewise here is the trust policy

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
              "Service": "materializedviews.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
          }
        ]
      }
  2. ๐Ÿš€ Configure Glue Elastic Views

    As of Q1 2021, you need to request preview access to Glue ElasticViews

    1. Source Configurations: DynamoDB

      • AWS Console > Elastic Views > https://console.aws.amazon.com/elasticviews
      • Choose Tables
        • In the tables list, choose the link to the ev-demo-movies-table.ev-demo-movies-table table.
        • On the table summary page, choose ACTIVATE. This operation can take a few minutes to complete.
          • My assumption is that a Kinesis Data Streams for DynamoDb is being setup in the background
    2. Create View

      • AWS Console > Elastic Views > https://console.aws.amazon.com/elasticviews
      • Choose Tables
        • In the tables list, choose the link to the ev-demo-movies-table.ev-demo-movies-table table.
      • View Name - For example ev-demo-view-s3
      • Provenance - year, title
      • In Query editor, specify, SELECT CAST(year AS int4), title FROM ev-demo-movies-table.ev-demo-movies-table;
      • Choose Create view
    3. Start View

    4. Materialize View

      • AWS Console > Elastic Views > https://console.aws.amazon.com/elasticviews
      • Choose Views
        • In the views list, choose the link to the ev-demo-view-s3 view
      • On the view summary page, choose Materialize View
      • Choose Materialize a view using AWS Glue EV
      • Choose Add target
      • On the Add target page, choose S3 as the target service > Target location,
        • Location for materialized view > s3://ev-demo-bkt-010
        • For encryption use default encryption
        • For Permission, choose the role we created earlier > ev-demo-role
      • Target Name > enter moviespath
        • Elastic Views creates this folder in this location, and places the Parquet files and the manifest there when it materializes the view.
      • Choose Add target > Verify Data type compatibility is without any errors.
      • After all data types are compatible, choose Materialize view to finalize the materialization.

      After a few minutes check the S3 bucket prefix path moviespath, you will find parquet files - part-00000-1303f-25cd-4670-a16b-49e29-c000.snappy.parquet and the manifest file

    5. Query Materialize View

      We will use Athena to query our data in S3. Let us begin by creating a database and table to query the movies.

      • Access Athena Console > Query Editor >

        CREATE DATABASE ev-demo-movies-db;
      • Create an external table in Athena, and configure it to point to your Amazon S3

          CREATE EXTERNAL TABLE ev-demo-materialized-table(
          title STRING,
          year INT
          )
          ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
          STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
          OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
          LOCATION 's3://ev-demo-bkt-010/moviespath/_symlink_format_manifest/'
      • Query the table

        select * from ev-demo-materialized-table limit 10;

      At this point, we have materialized view working.

  3. ๐Ÿ”ฌ Testing the solution

    Let us try to insert a new item into the source DynamoDB table. In my example, I added a item with title as ev_demo and year as 2021

    Miztiik Automation: AWS Glue Elastic Views - DynamoDB to S3 Materialization

    Wait for about a minutes and rerun the query on the materialized view, and check for the corresponding new item in the Amazon S3 target data store. Changes you make to relevant data at the source appear in the target in near-real time. If you don't see the changes immediately, try running the query again.

    Miztiik Automation: AWS Glue Elastic Views - DynamoDB to S3 Materialization

    You can observe that the new items can be queried from the materialized view in S3.

  4. ๐Ÿ“’ Conclusion

    Here we have demonstrated how to use Glue Elastic view to materialize views from DynamoDB to S3. You can extend this further by enriching the item with data from other sources using Glue workflows.

  5. ๐Ÿงน CleanUp

    If you want to destroy all the resources created by the stack, Execute the below command to delete the stack, or you can delete the stack from console as well

    • Resources created during Deploying The Application
    • Delete CloudWatch Lambda LogGroups
    • Any other custom resources, you have created for this demo
    # Delete from cdk
    cdk destroy
    
    # Follow any on-screen prompts
    
    # Delete the CF Stack, If you used cloudformation to deploy the stack.
    aws cloudformation delete-stack \
      --stack-name "MiztiikAutomationStack" \
      --region "${AWS_REGION}"

    This is not an exhaustive list, please carry out other necessary steps as maybe applicable to your needs.

๐Ÿ“Œ Who is using this

This repository aims to show how to use Glue Elastic view to materialize views to new developers, Solution Architects & Ops Engineers in AWS. Based on that knowledge these Udemy course #1, course #2 helps you build complete architecture in AWS.

๐Ÿ’ก Help/Suggestions or ๐Ÿ› Bugs

Thank you for your interest in contributing to our project. Whether it is a bug report, new feature, correction, or additional documentation or solutions, we greatly value feedback and contributions from our community. Start here

๐Ÿ‘‹ Buy me a coffee

ko-fi Buy me a coffee โ˜•.

๐Ÿ“š References

  1. Docs: How to create DynamoDB

๐Ÿท๏ธ Metadata

miztiik-success-green

Level: 200