A few days ago I learned, through German Viscuso's tweet, that the ASK SDK for NodeJS supports data persistence in S3. It is not yet supported by the ASK SDK for Python.
I thought it was a nice feature to have, so I checked the documentation of the ASK SDK for Python and did not find anything similar to what German showed. I decided to give it a try, and in this short article I am going to describe how a possible implementation of this feature can be done using, or reusing I should say, already existing SDK features.
Disclaimer: this is not officially supported. Use it at your own risk. The implementation can definitely be improved. I wrote a very short proposal as a feature request in ASK SDK for Python: Data Persistence in S3.
- A description of the module S3 Persistence for the ASK SDK for Python.
- Instructions to get this module working in your skill.
- Further comments
This module consists of 3 files. I like to think of it as a pluggable module, as just copying and pasting the files in the right place will get you up and running quickly.
This module uses the Boto 3 library, which already comes with the ASK SDK. The data is persisted as JSON.
I took some ideas from the SDK for NodeJS, where this is already implemented, and figured out how it works with the example German coded. For example, the structure of the module was taken from both SDKs (NodeJS and Python).
It works like the current DynamoDB persistence module. It consists of 2 files:
- The adapter, which I called `S3PersistenceAdapter`.
- The object key generator, which I called `ObjectKeyGenerators`.
If you check the DynamoDB one, you will notice it is (mostly) the same:

```
✗ ls -1
__init__.py
__version__.py
adapter.py
partition_keygen.py
```
I have placed both files in the `ask_sdk_s3_persistence` directory.
First of all, the class `S3PersistenceAdapter` derives from `AbstractPersistenceAdapter`, like the DynamoDB one.
The constructor, unlike the DynamoDB one, takes different parameters:

```python
def __init__(self, bucket_name, object_generator,
             s3_client=boto3.client('s3'), path_prefix=None):
```
`bucket_name`: the name of the bucket where you will store the JSON file that contains the data you want to persist. This bucket has to be created beforehand, and your execution role and your Lambda function need the right permissions to access S3.

`object_generator`: used to determine, later on, to whom each piece of data belongs. The file that is created to store the data is named using the information obtained from this generator. More on this later in the article.

`s3_client`: in this case straightforward, the S3 client.

`path_prefix`: used when there is some directory structure in your bucket, for example if you want to have a directory per customer or per skill.
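To illustrate how the `path_prefix` and the id returned by the generator combine into a final object key, here is a minimal sketch; the helper name `build_object_id` is mine, not the module's:

```python
def build_object_id(path_prefix, generated_id):
    # Join the optional prefix with the id produced by the object
    # generator, e.g. 'skills/my-skill' + 'amzn1.ask.skill.xyz'
    # -> 'skills/my-skill/amzn1.ask.skill.xyz'.
    if path_prefix:
        return '{}/{}'.format(path_prefix.rstrip('/'), generated_id)
    return generated_id
```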
Deriving from `AbstractPersistenceAdapter` means that we have to implement 2 methods that support the basic get and set features:
The `get_attributes` method takes the `request_envelope` as a parameter. A call to the `object_generator` function, passing that `request_envelope`, returns an id that, joined with the `path_prefix`, is used to create an `objectId`: basically the location and the name of the file where the data will be stored. The method then tries to get that object from S3 using `get_object`. If that is not possible, it creates a new one with the creation timestamp in a tag called `created`. In either case, it returns the object.
The `save_attributes` method takes the `request_envelope` and the `attributes` to be stored. Like `get_attributes`, it first creates the `objectId`, then takes the attributes dictionary, converts it to bytes, and finally creates the object in S3 using `put_object`.
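And a matching sketch for the save path (again a free function for illustration, not the module's actual code):

```python
import json


def save_attributes(s3_client, bucket_name, object_id, attributes):
    # Serialise the attributes dict to JSON, convert it to bytes, and
    # write it to S3, overwriting any previous version of the object.
    body = json.dumps(attributes).encode('utf-8')
    s3_client.put_object(Bucket=bucket_name, Key=object_id, Body=body)
```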
In both methods some exceptions and errors are handled.
This is basically a set of functions that extract the `user_id`, `device_id`, or `application_id` from the request envelope.
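Such generators could look like this (the attribute paths follow the `ask_sdk_model` request envelope; the function names here are illustrative, not necessarily the module's):

```python
def user_id_keygen(request_envelope):
    # One persistence object per Alexa user.
    return request_envelope.context.system.user.user_id


def device_id_keygen(request_envelope):
    # One persistence object per device.
    return request_envelope.context.system.device.device_id


def application_id_keygen(request_envelope):
    # One persistence object per skill.
    return request_envelope.context.system.application.application_id
```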
Is it really needed? Probably not; however, it keeps the code organised and easy to maintain.
As an additional comment: the example from German includes a function to extract the `application_id`; I wonder why it is not included in the `ObjectKeyGenerators` of the NodeJS SDK. It seems like a nice proposal.
The SDK comes with a class called `StandardSkillBuilder` that lives in `standard.py` and derives from `SkillBuilder`. After thinking a bit about this, I wanted the S3 persistence module to be, kind of, pluggable, which is why I decided to create `standard_s3.py`. It also derives from `SkillBuilder`, so it requires a `skill_configuration` that is created based on the parameters the constructor receives. It takes basically the same parameters as the S3 persistence adapter:
```python
def __init__(self, bucket_name=None, s3_client=None,
             object_generator=None, path_prefix=None):
```
- Grab a copy of the files from my GH repository here; you can also clone the repo and try the sample skill yourself. Thanks, German, for the idea. The sample skill is the same one proposed by German in his example.
In case you wonder which files:
- The `ask_sdk_s3_persistence` directory: place it in the root directory, right where the rest of the SDK modules are.
- The `standard_s3.py` file: place it in `ask_sdk`, right where `standard.py` is.
- In your main skill file (or Lambda) add this:

```python
from ask_sdk.standard_s3 import StandardSkillBuilder
from ask_sdk_s3_persistence.ObjectKeyGenerators import applicationId
```
Note: when it comes to the `ObjectKeyGenerators`, use any of the available ones; that is up to you. Remember that it will be the name of the file where the data is stored. I have used `applicationId` to be consistent with the sample skill written by German.
- Also add this:

```python
s3_client = boto3.client('s3')  # a
path_prefix = 'test_prefix'  # b
ssb = StandardSkillBuilder(bucket_name='<bucket_name>',
                           object_generator=applicationId,
                           s3_client=s3_client,
                           path_prefix=path_prefix)  # c
```
a) The S3 client. I know, self-explanatory. :)

b) The path prefix, in case you have some folder structure in your bucket; if you do not, simply do not pass it: it defaults to `None` and there is a verification.

c) Create a new `StandardSkillBuilder`, passing it the name of the bucket, the object key generator, the client, and the path prefix (if used).
- As you may notice in the sample skill file index.py, I have included 2 interceptors (request and response) that take care of loading and saving the data based on those events: the `Response` one saves the data and the `Request` one loads it. Both can be modified as per your particular needs. This is just a sample.
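The bodies of those two interceptors boil down to something like this (shown as plain functions for illustration; in the skill they would be the `process` methods of classes deriving from the SDK's abstract request/response interceptors):

```python
def load_persistent_attributes(handler_input):
    # Request interceptor: reading persistent_attributes makes the
    # attributes manager call the adapter's get_attributes and cache
    # the result for the handlers.
    _ = handler_input.attributes_manager.persistent_attributes


def save_persistent_attributes(handler_input, response):
    # Response interceptor: write the (possibly updated) attributes
    # back to S3 through the adapter's save_attributes.
    handler_input.attributes_manager.save_persistent_attributes()
```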
- Better error/exception handling is needed.
- There is an S3 Transfer module that comes with Boto 3; I am not sure whether it would be better to use the methods that come with it somehow.
- `standard.py` and `standard_s3.py` can be merged to have a generic `StandardSkillBuilder(*args)`.
- An improvement on the `get_attributes` method that avoids storing dummy data would be very nice.
- Include tests.
- I posted an official proposal in the User Voice portal. Let's see if people consider it useful.
- Be aware that even though the SDK for NodeJS supports S3 data persistence officially, they suggest using DynamoDB for a better experience. Check the note here. Thanks, @nikhilym, for pointing this out.