/dynamodb-adapter

Primary LanguageGoApache License 2.0Apache-2.0

DynamoDB Adapter

Join the chat at https://gitter.im/cloudspannerecosystem/dynamodb-adapter

cloudspannerecosystem

Introduction

DynamoDB Adapter is a tool that translates AWS DynamoDB queries to Cloud Spanner equivalent queries and runs those queries on Cloud Spanner. The adapter serves as a proxy whereby applications that use DynamoDB can send their queries to the adapter where they are then translated and executed against Cloud Spanner. DynamoDB Adapter is helpful when moving to Cloud Spanner from a DynamoDB environment without changing the code for DynamoDB queries. The APIs created by this project can be directly consumed where DynamoDB queries are used in your application.

The adapter supports the basic data types and operations required for most applications. Additionally, it also supports primary and secondary indexes in a similar way as DynamoDB. For detailed comparison of supported operations and data types, refer to the Compatibility Matrix

Examples and Quickstart

The adapter project includes an example application and sample eCommerce data model. The instructions for the sample application include migration using Harbourbridge and setup for the adapter.

Compatibility Matrix

Supported Actions

DynamoDB Adapter currently supports the folowing operations:

DynamoDB Action
BatchGetItem
BatchWriteItem
DeleteItem
GetItem
PutItem
Query
Scan
UpdateItem

Supported Data Types

DynamoDB Adapter currently supports the following DynamoDB data types

DynamoDB Data Type Spanner Data Types
N (number type) INT64, FLOAT64, NUMERIC
BOOL (boolean) BOOL
B (binary type) BYTES(MAX)
S (string and data values) STRING(MAX)

Configuration

DynamoDB Adapter requires two tables to store metadata and configuration for the project: dynamodb_adapter_table_ddl and dynamodb_adapter_config_manager. There are also three configuration files required by the adapter: config.json, spanner.json, tables.json.

By default there are two folders production and staging in config-files. This is configurable by using the enviroment variable ACTIVE_ENV and can be set to other environment names, so long as there is a matching directory in the config-files directory. If ACTIVE_ENV is not set the default environtment is staging.

dynamodb_adapter_table_ddl

dynamodb_adapter_table_ddl stores the metadata for all DynamoDB tables now stored in Cloud Spanner. It is used when the adapter starts up to create a map for all the columns names present in Spanner tables with the columns of tables present in DynamoDB. This mapping is required because DynamoDB supports the special characters in column names while Cloud Spanner only supports underscores(_). For more: Spanner Naming Conventions

CREATE TABLE
dynamodb_adapter_table_ddl
(
 column         STRING(MAX),
 tableName      STRING(MAX),
 dataType       STRING(MAX),
 originalColumn STRING(MAX),
) PRIMARY KEY (tableName, column)

dynamodb_adapter_table_ddl sample data

dynamodb_adapter_config_manager

dynamodb_adapter_config_manager contains the Pub/Sub configuration used for DynamoDB Stream compatability. It is used to do some additional operation required on the change of data in tables. It can trigger New and Old data on given Pub/Sub topic.

CREATE TABLE
dynamodb_adapter_config_manager
(
 tableName      STRING(MAX),
 config         STRING(MAX),
 cronTime       STRING(MAX),
 enabledStream  STRING(MAX),
 pubsubTopic    STRING(MAX),
 uniqueValue    STRING(MAX),
) PRIMARY KEY (tableName)

config-files/{env}/config.json

config.json contains the basic settings for DynamoDB Adapter; GCP Project, Cloud Spanner Database and query record limit.

Key Description
GoogleProjectID Your Google Project ID
SpannerDb Your Spanner Database Name
QueryLimit Default limit for the number of records returned in query

For example:

{
    "GoogleProjectID"   : "first-project",
    "SpannerDb"         : "test-db",
    "QueryLimit"        : 5000
}

config-files/{env}/spanner.json

spanner.json is a key/value mapping file for table names with a Cloud Spanner instance ids. This enables the adapter to query data for a particular table on different Cloud Spanner instances.

For example:

{
    "dynamodb_adapter_table_ddl": "spanner-2 ",
    "dynamodb_adapter_config_manager": "spanner-2",
    "tableName1": "spanner-1",
    "tableName2": "spanner-1"
    ...
    ...
}

config-files/{env}/tables.json

tables.json contains the description of the tables as they appear in DynamoDB. This includes all table's primary key, columns and index information. This file supports the update and query operations by providing the primary key, sort key and any other indexes present.

Key Description
tableName Name of the table in DynamoDB
partitionKey Primary key of the table in DynamoDB
sortKey Sorting key of the table in DynamoDB
attributeTypes Key/Value list of column names and type
indices Collection of index objects that represent the indexes present in the DynamoDB table

For example:

{
    "tableName": {
        "partitionKey": "primary key or Partition key",
        "sortKey": "sorting key of dynamoDB adapter",
        "attributeTypes": {
            "column_a": "N",
            "column_b": "S",
            "column_of_bytes": "B",
            "my_boolean_column": "BOOL"
        },
        "indices": {
            "indexName1": {
                "sortKey": "sort key for indexName1",
                "partitionKey": "partition key for indexName1"
            },
            "another_index": {
                "sortKey": "sort key for another_index",
                "partitionKey": "partition key for another_index"
            }
        }
    },
    .....
    .....
}

Starting DynamoDB Adapter

Complete the steps described in Set up, which covers creating and setting a default Google Cloud project, enabling billing, enabling the Cloud Spanner API, and setting up OAuth 2.0 to get authentication credentials to use the Cloud Spanner API.

In particular, ensure that you run

gcloud auth application-default login

to set up your local development environment with authentication credentials.

Set the GCLOUD_PROJECT environment variable to your Google Cloud project ID:

gcloud config set project [MY_PROJECT NAME]
export ACTIVE_ENV=PRODUCTION
go run main.go

Internal Startup Stages

When DynamoDB Adapter starts up the following steps are performed:

  • Stage 1 - Configuration is loaded according the Environment Variable ACTIVE_ENV
  • Stage 2 - Connections to Cloud Spanner instances are initialized. Connections to all the instances are started it doesn't need to start the connection again and again for every request.
  • Stage 3 - dynamodb_adapter_table_ddl is parsed and will stored in ram for faster access of data.
  • Stage 4 - dynamodb_adapter_config_manager is loaded into ram. The adapter will check every 1 min if configuration has been changed, if data is changed it will be updated in memory.
  • Stage 5 - Start the API listener to accept DynamoDB operations.

Advanced

Embedding the Configuration

The rice-box package can be used to increase preformance by converting the configuration files into Golang source code and there by compiling them into the binary. If they are not found in the binary rice-box will look to the disk for the configuration files.

Install rice package

This package is required to load the config files. This is required in the first step of the running DynamoDB Adapter.

Follow the link.

run command for creating the file

This is required to increase the performance when any config file is changed so that configuration files can be loaded directly from go file.

rice embed-go

API Documentation

This is can be imported in Postman or can be used for Swagger UI. You can get open-api-spec file here here