Traditional relational databases are still very prevalent today. Often, however, we need to search or slice and dice the data in many different ways, and RDBMSs are not great at that. As many do, we will use ElasticSearch to provide this functionality. Today we will explore how AWS Lambda can help us index our relational data into ElasticSearch with minimal effort and cost. We will be using Amazon Aurora (MySQL), AWS Lambda, and ElasticSearch.
We will set up database triggers to fire a lambda function when an INSERT, UPDATE, or DELETE occurs in Aurora. The lambda function will then be responsible for updating the ElasticSearch index.
The following tools and accounts are required to complete these instructions.
The following steps will walk you through the setup of a CloudWatch log group and the provided lambda function that will be used to parse the log stream.
- Sign in to Twitter and navigate to https://apps.twitter.com/
- Click on Create New App
- Fill in the name, description, and a placeholder URL, then create the Twitter application
- Navigate to the `Keys and Access Tokens` tab and click `Create my access token`
- Save the following values:
  - Consumer Key
  - Consumer Secret
  - Access Token
  - Access Token Secret
- Navigate to `/LogGenerator/credentials.json` and complete the fields
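The exact shape of `credentials.json` depends on the project, but based on the four values saved above it likely resembles the following sketch (the key names here are assumptions — match them to the fields already present in the file):

```json
{
  "ConsumerKey": "your-consumer-key",
  "ConsumerSecret": "your-consumer-secret",
  "AccessToken": "your-access-token",
  "AccessTokenSecret": "your-access-token-secret"
}
```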
The project uses the `lambdasharp` profile by default. Follow these steps to set up a new profile if needed.
- Create a `lambdasharp` profile: `aws configure --profile lambdasharp`
- Configure the profile with the AWS credentials and region you want to use
Set up the CloudWatch log group `/lambda-sharp/log-parser/dev` and a log stream `test-log-stream` via the AWS Console or by executing AWS CLI commands.
aws logs create-log-group --log-group-name '/lambda-sharp/log-parser/dev'
aws logs create-log-stream --log-group-name '/lambda-sharp/log-parser/dev' --log-stream-name test-log-stream
The lambda function requires an IAM role. You can create the `LambdaSharp-LogParserRole` role via the AWS Console or by executing AWS CLI commands.
aws iam create-role --profile lambdasharp --role-name LambdaSharp-LogParserRole --assume-role-policy-document file://assets/lambda-role-policy.json
aws iam attach-role-policy --profile lambdasharp --role-name LambdaSharp-LogParserRole --policy-arn arn:aws:iam::aws:policy/AWSLambdaFullAccess
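The role's trust policy ships with the repository at `assets/lambda-role-policy.json`. For reference, a trust policy that lets Lambda assume a role typically looks like the following (use the file from the repo; this is only shown so the `create-role` call above is understandable):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```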
- Navigate into the LogParser folder: `cd LogParser`
- Run: `dotnet restore`
- Edit `aws-lambda-tools-defaults.json` and make sure everything is set up correctly
- Run: `dotnet lambda deploy-function`
Using the included LogGenerator project, stream tweets directly into CloudWatch
The included LogGenerator project streams live tweets directly into the CloudWatch log created in the setup above.
TODO: How will twitter credentials be provided?
- Navigate into the LogGenerator folder: `cd LogGenerator`
- Run: `dotnet restore`
- Run: `dotnet run`
Data will begin streaming from Twitter into the CloudWatch log and will end after 30 seconds. The duration can be changed by modifying the `STREAM_DURATION` variable at the top of the `Program.cs` file.
Note: CloudWatch log events must be sent to a log stream in chronological order. You will run into issues if multiple instances of the LogGenerator are streaming to the same log stream at the same time.
- Navigate into the LogParser folder: `cd LogParser`
- Run: `dotnet restore`
- Run: `dotnet lambda deploy-function`
- From the AWS Console, navigate to the Lambda service console
- Find the deployed function and click into it to find the `Triggers` tab
- Add a trigger and select `CloudWatch Logs` as the trigger type
- Select the log group `/lambda-sharp/log-parser/dev` and add a filter name
Use the lambda function to transform the streamed data into an ElasticSearch-readable JSON format, and search it from S3.
Set up an ElasticSearch index called `tweets` and define its schema. The following is a minimal example; add additional fields for information found in the log streams.
TODO: Modify the below example for tweets!
CREATE DATABASE lambdasharp_logs;
CREATE EXTERNAL TABLE IF NOT EXISTS lambdasharp_logs.users (
`user_name` string,
`name` string,
`favorite` int,
`tweet_count` int,
`friends` int,
`follow` int,
`date_created` timestamp
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1'
) LOCATION 's3://<USERNAME>-lambda-sharp-s3-logs/users/'
TBLPROPERTIES ('has_encrypted_data'='false');
CREATE EXTERNAL TABLE IF NOT EXISTS lambdasharp_logs.tweet_info (
`user_name` string,
`retweeted` int,
`favorited` int,
`message` string,
`hashtags` array<string>,
`latitude` double,
`longitude` double,
`date_created` timestamp
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1'
) LOCATION 's3://<USERNAME>-lambda-sharp-s3-logs/tweet-info/'
TBLPROPERTIES ('has_encrypted_data'='false');
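On the ElasticSearch side, the `tweets` index and its schema are created with a `PUT` request to the index on your cluster endpoint. A minimal mapping sketch, borrowing field names from the tables above (the type names and field list are assumptions — adjust them to the actual log data and your ElasticSearch version):

```json
{
  "mappings": {
    "tweet": {
      "properties": {
        "user_name":    { "type": "keyword" },
        "message":      { "type": "text" },
        "hashtags":     { "type": "keyword" },
        "location":     { "type": "geo_point" },
        "date_created": { "type": "date" }
      }
    }
  }
}
```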
Extend the LogParser lambda function to transform CloudWatch log data into an ElasticSearch-readable JSON format.
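The project itself is C#, but the transformation can be sketched in Python to show the moving parts: CloudWatch Logs delivers records to a subscribed Lambda function as a base64-encoded, gzip-compressed payload, and each log event's message (assumed here to be a JSON tweet) can be re-emitted as an ElasticSearch bulk-API action/document pair. The function and field names below are illustrative, not taken from the project:

```python
import base64
import gzip
import json


def decode_cloudwatch_event(event):
    """Decode the base64-encoded, gzip-compressed payload that CloudWatch
    Logs delivers to a subscribed Lambda function."""
    payload = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(payload))


def to_bulk_ndjson(log_data, index="tweets"):
    """Turn each log event's message (assumed to be a JSON tweet) into an
    ElasticSearch bulk-API action/document line pair (NDJSON)."""
    lines = []
    for log_event in log_data["logEvents"]:
        doc = json.loads(log_event["message"])
        lines.append(json.dumps({"index": {"_index": index, "_type": "tweet"}}))
        lines.append(json.dumps(doc))
    # The bulk API requires a trailing newline.
    return "\n".join(lines) + "\n"
```

The resulting string can be sent to the cluster's `_bulk` endpoint; in the C# function the same two steps (decompress the event, reshape each message) apply.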
TBD
- Erik Birkfeld for organizing.
- MindTouch for hosting.
- Copyright (c) 2017 Juan Manuel Torres, Katherine Marino, Daniel Lee
- MIT License