This is a Singer tap written in TypeScript/JavaScript that runs in Node and produces JSON-formatted data following the Singer spec. Most of the spec is reflected in tap-types.ts.
This tap:
- Scans a local folder, treating the files it finds there as emails (MIME), parsing them into JSON with Nodemailer.Mailparser
- Outputs a schema along with the resulting JSON for each file
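As a rough illustration of that output, the sketch below builds one SCHEMA message and one RECORD message and serializes them one JSON object per line. The message field names follow the Singer spec; the stream name "emails", the single "subject" property, and the helper `toSingerLines` are assumptions made up for this example and are not taken from tap-types.ts.

```typescript
// Minimal Singer message shapes (the full definitions are in the Singer spec).
interface SchemaMessage {
  type: "SCHEMA";
  stream: string;
  schema: Record<string, unknown>; // a JSON Schema document
  key_properties: string[];
}

interface RecordMessage {
  type: "RECORD";
  stream: string;
  record: Record<string, unknown>;
}

type SingerMessage = SchemaMessage | RecordMessage;

// Hypothetical helper: serialize messages as one JSON object per line.
function toSingerLines(messages: SingerMessage[]): string {
  return messages.map((m) => JSON.stringify(m)).join("\n");
}

// Assumed stream name and fields, for illustration only.
const singerLines = toSingerLines([
  {
    type: "SCHEMA",
    stream: "emails",
    schema: { type: "object", properties: { subject: { type: "string" } } },
    key_properties: [],
  },
  { type: "RECORD", stream: "emails", record: { subject: "Hello" } },
]);

console.log(singerLines);
```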
This tap is also meant as a template to be forked for other uses. It separates the scanning of a resource collection (e.g. a folder) from the parsing of the individual resources (e.g. MIME files), putting each in its own module for easy drop-in replacement. Two modules are included: a scanner (scan-dir.ts, for scanning local folders) and a parser (parse-mime.ts, for parsing emails).
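The scanner/parser split can be pictured roughly as follows. This is a simplified sketch, not the actual contents of scan-dir.ts or parse-mime.ts: the names `ParseItem`, `scanCollection`, and `parseLineCount` are hypothetical, and the "folder" is an in-memory object so the example runs without touching the file system.

```typescript
// Assumed parser contract: raw text in, a JSON-friendly object out.
type ParseItem = (raw: string) => Promise<Record<string, unknown>>;

// Stand-in scanner: walks an in-memory "folder" and hands each item to the parser.
async function scanCollection(
  files: Record<string, string>,
  parseItem: ParseItem
): Promise<Record<string, unknown>[]> {
  const results: Record<string, unknown>[] = [];
  for (const name of Object.keys(files)) {
    results.push(await parseItem(files[name]));
  }
  return results;
}

// Stand-in parser: replace this function to support a different file type.
const parseLineCount: ParseItem = async (raw) => ({
  lines: raw.split("\n").length,
});

scanCollection({ "a.txt": "one\ntwo" }, parseLineCount).then((items) =>
  console.log(JSON.stringify(items))
);
```

Because the scanner only depends on the parser's signature, either side can be swapped without changing the other.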
This code path is documented here.
If you're used to JavaScript code, here are a few newer ES6/ES7/TypeScript features we use that might be new to you:
- Arrow functions are largely interchangeable with the more familiar `function` syntax: `let aFunction = () => {...` is roughly equal to `function aFunction() {...`
- Promises replace callbacks to clean up and clarify our code
- Async/await builds on promises to make asynchronous code almost as simple (in many cases) as synchronous.
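For readers new to these features, here is a small self-contained sketch of all three. The function names (`double`, `delayedValue`, `doubleLater`, and so on) are invented for this example and do not appear in the tap's source.

```typescript
// Classic function declaration and its arrow-function counterpart.
// (They differ in how `this` is bound, which is why they are only "roughly" equal.)
function double(n: number): number {
  return n * 2;
}
const doubleArrow = (n: number): number => n * 2;

// A callback-style API...
function delayedValue(value: number, callback: (result: number) => void): void {
  setTimeout(() => callback(value), 10);
}

// ...wrapped in a Promise so callers no longer pass a callback.
const delayedValueAsync = (value: number): Promise<number> =>
  new Promise((resolve) => delayedValue(value, resolve));

// async/await lets the asynchronous steps read top to bottom.
async function doubleLater(value: number): Promise<number> {
  const result = await delayedValueAsync(value);
  return doubleArrow(result);
}

doubleLater(4).then((v) => console.log(v));
```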
In addition to running as Singer taps, parsers can also be deployed as AWS Lambda functions. This allows you to take the exact same parser that your tap uses and deploy it to parse files one at a time, via a trigger that runs as each file is dropped into a bucket. This functionality is enabled out of the box; the deploy script will create the bucket, deploy the parser as a Lambda function, and add a trigger to call it when files are created in the bucket.
This code path is documented here.
- Dependencies:
- Clone: `git clone https://github.com/donpedro/tap-ts-starter.git`
- After cloning the repo, be sure to run `npm install` to install npm packages
- Debug: with VS Code, use `Open Folder` to open the project folder, then hit F5 to debug. This runs without compiling to JavaScript, using ts-node
- Test: `npm test` or `npm t`
- Compile documentation: `npm run build-docs-tap` and `npm run build-docs-aws`
- Compile to JavaScript: `npm run build`
- Deploy: `serverless deploy --aws-profile [profilename]` (depends on aws-cli named profiles)
- More options are included from TypeScript Library Starter and are documented here
- Run using included test data (be sure to build first): `node dist/tap-main.cjs.js --config testdata/emails.tap-config.json`
Note: This document is written in Markdown. We like to use Typora and Markdown Preview Plus for our Markdown work.
There is a TypeScript file called 'xml-parse.ts'. It exports a function 'parseItem', which returns a Promise. The Promise uses the npm module 'xml2js' to parse the given XML file to JSON. When parsing is complete, the Promise resolves if a valid result was parsed, or rejects if an error was encountered.
npm module: xml2js
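The resolve-or-reject behaviour described above might look roughly like the sketch below. It is not the actual contents of xml-parse.ts. The `ParseString` type is modeled on the callback form of xml2js's `parseString(xml, callback)`, while `makeParseItem` and `fakeParseString` are hypothetical names; the fake parser stands in for xml2js so the example runs without installing it.

```typescript
// Callback signature modeled on xml2js's parseString(xml, callback).
type ParseString = (
  xml: string,
  callback: (err: Error | null, result: unknown) => void
) => void;

// Hypothetical factory: wraps a callback-style parser in a Promise.
function makeParseItem(parseString: ParseString) {
  return (xml: string): Promise<unknown> =>
    new Promise((resolve, reject) => {
      parseString(xml, (err, result) => {
        if (err) {
          reject(err); // parsing failed
        } else {
          resolve(result); // parsing produced a result
        }
      });
    });
}

// Stand-in parser (an assumption for this sketch), used in place of xml2js.
const fakeParseString: ParseString = (xml, callback) => {
  if (xml.trim().charAt(0) === "<") {
    callback(null, { raw: xml });
  } else {
    callback(new Error("Not XML"), undefined);
  }
};

const parseItem = makeParseItem(fakeParseString);
parseItem("<note/>").then((json) => console.log(JSON.stringify(json)));
```

With xml2js installed, passing its `parseString` function to `makeParseItem` in place of the fake should give the same resolve/reject behaviour against a real parser.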
Here is a guide to deploying functions to AWS Lambda, using buckets with AWS S3, and working with the starter project.
I followed this guide for configuring the aws-cli
Download and Install "Windows x86-64 executable installer" for Python
- Add "C:\Python36" and "C:\Python36\Scripts" to PATH environment variable
- Add "C:\Users\[user]\AppData\Roaming\Python\Python36\Scripts" to PATH env var
- Run command "pip install awscli"
AWS stores credentials in two files in folder "{userprofile}\.aws"
- To add a named profile to the credentials folder, use the "aws configure --profile [a profile name]" command. You will then be prompted to enter an access key, secret key, region, and output type
- To get this information:
- Create your own AWS account (aws.amazon.com)
- Find security credentials in the dropdown menu when you click your profile name
- Go to users, add new user, select programmatic access
- Create a group and add the new user to the newly created group
- Finish creating the user and it will give you the access and secret keys. Copy those keys to the CLI after running "aws configure --profile [a profile name]". For region use "us-west-2", and for output press Enter and it will default to JSON
- Now you can look in the .aws/config and .aws/credentials files and see that the profile was added
You now have a named profile for running the "serverless deploy --aws-profile [profilename]" command, but it will still fail because you do not have permission to use the AWS services. To fix:
- Go to the "IAM Management Console" for AWS
- Go to policies, create policy, name it 'cloudformation', select JSON, and copy this into it:
{ "Version": "2012-10-17", "Statement": [ { "Sid": "Stmt1449904348000", "Effect": "Allow", "Action": [ "cloudformation:CreateStack", "cloudformation:CreateChangeSet", "cloudformation:ListStacks", "cloudformation:UpdateStack", "cloudformation:DescribeChangeSet", "cloudformation:ExecuteChangeSet" , "cloudformation:DescribeStacks", "cloudformation:DescribeStackResource", "cloudformation:ValidateTemplate" ], "Resource": [ "*" ] } ] }
- Create another policy, name it "FullAccess_APIGatewayRec", and copy this into it:
{ "Version": "2012-10-17", "Statement": [ { "Sid": "Stmt1467321765000", "Effect": "Allow", "Action": [ "apigateway:*" ], "Resource": [ "*" ] } ] }
- Go to the group that you created for your user, go to permissions, attach policy, and attach the following policies:
- "cloudformation" //manually created
- "FullAccess_APIGatewayRec" //manually created
- "AWSLambdaFullAccess"
- "IAMFullAccess"
- "AmazonS3FullAccess"
- "CloudWatchFullAccess"
- "CloudFrontFullAccess"
- "AWSCloudFormationReadOnlyAccess"
Deployment:
- In the file "serverless.yml", change the bucket name to something unique wherever you see "fdsa-trigger-bucket". Example: "fdsa-trigger-bucket12"
- In file "serverless.yml", under functions, for each function (filetrigger, hello), change handler to "dist/es/handler.[functionname]"
- Navigate to the project folder, run "npm run build", and then run the "serverless deploy --aws-profile [profilename]" command
- Go to the AWS Lambda service, change the region to Oregon (us-west-2), and select manage functions
- There should be two functions, which are also found in your "handler.ts" source file
- One of these functions listens for an event from AWS S3. This event happens when a file is dropped into a specific bucket in S3.
- Go to the AWS S3 service
- You should find a bucket with the name you set in "serverless.yml" ("fdsa-trigger-bucket" if you left it unchanged). This is where you can drop a file that will be processed by the Lambda function.
- Drop a file, go to AWS CloudWatch, go to logs, go to the correct function, and you should see a log of the file that was dropped.
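To make the S3-triggered function more concrete, here is a minimal sketch of a handler that logs each dropped file. It is not the project's handler.ts. The `S3EventLike` interface covers only the fields of the S3 event notification used here, and the body of `filetrigger` is an assumption about what such a function could do; in a real handler file it would be exported.

```typescript
// Minimal slice of the S3 event notification shape (only the fields used here).
interface S3EventLike {
  Records: { s3: { bucket: { name: string }; object: { key: string } } }[];
}

// Hypothetical handler in the spirit of the "filetrigger" function.
async function filetrigger(event: S3EventLike): Promise<string[]> {
  const dropped = event.Records.map((r) => {
    // Object keys arrive URL-encoded, with spaces encoded as "+".
    const key = decodeURIComponent(r.s3.object.key.replace(/\+/g, " "));
    return `${r.s3.bucket.name}/${key}`;
  });
  // console.log output from a Lambda function ends up in CloudWatch Logs.
  dropped.forEach((path) => console.log(`File dropped: ${path}`));
  return dropped;
}

// Local smoke test with a hand-built event (bucket name is an example value).
filetrigger({
  Records: [
    { s3: { bucket: { name: "fdsa-trigger-bucket12" }, object: { key: "my+file.eml" } } },
  ],
});
```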
We are using Jest for our unit testing.
So far `src/parse-mime.ts` is the only module that is being tested.
In the folder `test` there is a file `parse-mime.test.ts`, which is the Jest test case for testing the parser.
It works by first reading two files:
- One is the test data, in this case .eml files
- The second is a .json file that contains the result we expect from running `src/parse-mime.ts` on the test file
These two files are passed into a "matcher", a Jest function used to check that values meet a certain condition. We are checking whether the expected result file matches what we actually get when we run `src/parse-mime.ts`.
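The idea behind the comparison can be sketched without Jest, as below. Everything here is illustrative: `parseSubject` is a toy stand-in for the real MIME parser, and `matchesExpected` is a hypothetical helper. In a Jest test the final comparison would normally be written with a matcher such as `expect(actual).toEqual(expected)`; which matcher the project's test actually uses is not stated here.

```typescript
// Toy stand-in for the parser: pulls the Subject header out of raw email text.
async function parseSubject(raw: string): Promise<{ subject: string }> {
  const match = /^Subject:\s*(.*)$/m.exec(raw);
  return { subject: match ? match[1].trim() : "" };
}

// Hypothetical helper: parse the test data and compare it with the expected JSON text.
// Note that comparing via JSON.stringify is sensitive to property order.
async function matchesExpected(raw: string, expectedJson: string): Promise<boolean> {
  const actual = await parseSubject(raw);
  const expected = JSON.parse(expectedJson);
  return JSON.stringify(actual) === JSON.stringify(expected);
}

// Inline strings stand in for the .eml test file and the expected .json file.
const sampleEmail = "From: a@example.com\nSubject: Hi there\n\nBody text";
matchesExpected(sampleEmail, '{"subject":"Hi there"}').then((ok) =>
  console.log(ok ? "match" : "mismatch")
);
```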
- Add a .eml file to the `testdata/emails` folder
- Run the VS Code debugger with the configuration `Debug mime parser` while your new test file is open on the screen
- Copy the output from the debug console
- Run the copied result through this JSON-validator in order to check that the JSON is valid and to format it in a more readable way
- Add a .json file to the `testdata/testoutput` folder
- Take the newly formatted JSON and paste it into your test output file
- In "email.test-config.json" there is an array of JSON objects. Each object has two properties: `testdata` and `expectedresult`. Add a new object with your new file names.
- The tester will now run through all tests including the newly added test case
To run the tester: run the command npm test
As the tester runs it will print which files are being tested
Example: Tested data input: test.eml with expected output: Otest.json
If a test case fails, the files it failed with will be the last files printed.
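As a rough picture of the test configuration and the progress line, the sketch below models one config entry and formats the message for it. The property names `testdata` and `expectedresult` come from the description above; the `TestCase` interface, the `describeCase` helper, and the file names are assumptions for illustration.

```typescript
// Shape of one entry in the test config array.
interface TestCase {
  testdata: string;
  expectedresult: string;
}

// Hypothetical entries; real file names come from your own test data.
const testCases: TestCase[] = [
  { testdata: "test.eml", expectedresult: "test.json" },
  { testdata: "newsletter.eml", expectedresult: "newsletter.json" },
];

// Builds the progress line printed for each case.
function describeCase(c: TestCase): string {
  return `Tested data input: ${c.testdata} with expected output: ${c.expectedresult}`;
}

testCases.forEach((c) => console.log(describeCase(c)));
```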