emr-mailer

emr hive report mailer

Primary language: Ruby. License: Apache License 2.0.

Send a Hive report via email.

This script downloads the files under an S3 URL, concatenates them, zips the result, and sends it as an attachment to a specified email address.
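The concatenation step described above can be sketched as follows. This is a minimal illustration, not the code in send-report.rb: it assumes the report files have already been downloaded from S3 into a local directory, and it leaves out the zip step. The function name `concatenate_parts` and the file names in the usage note are hypothetical.

```ruby
# Concatenate every regular file directly inside +dir+ into +out_path+.
# Files are appended in name order so that part files keep a stable order.
# Subdirectories are skipped, in line with the "no nested directories"
# limitation. +out_path+ is assumed to live outside +dir+.
# Returns the number of files that were concatenated.
def concatenate_parts(dir, out_path)
  parts = Dir.children(dir).sort
             .map { |name| File.join(dir, name) }
             .select { |path| File.file?(path) }

  File.open(out_path, 'wb') do |out|
    parts.each do |part|
      # Stream each part so large reports are not read fully into memory.
      File.open(part, 'rb') { |input| IO.copy_stream(input, out) }
    end
  end

  parts.length
end
```

For example, `concatenate_parts('/tmp/report-parts', '/tmp/report.txt')` would merge the downloaded part files into a single file that can then be zipped and attached.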

This script assumes that you send mail through Gmail (a custom Google account domain also works). Before using it, set FROM_EMAIL and FROM_PASSWORD in the script to the proper credentials.
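To illustrate what sending an attachment through Gmail involves, here is a rough sketch using Ruby's Net::SMTP. It is not necessarily how send-report.rb does it. The function names `build_report_message` and `send_via_gmail` are hypothetical, and the sketch assumes Gmail's SMTP endpoint (smtp.gmail.com, port 587, STARTTLS) with plain authentication.

```ruby
# Build a multipart/mixed message with a plain-text body and one
# base64-encoded attachment. Returns the message as a single string.
def build_report_message(from:, to:, subject:, body:, filename:, data:)
  boundary = "report-boundary-#{rand(10**12)}"
  [
    "From: #{from}",
    "To: #{to}",
    "Subject: #{subject}",
    'MIME-Version: 1.0',
    "Content-Type: multipart/mixed; boundary=\"#{boundary}\"",
    '',
    "--#{boundary}",
    'Content-Type: text/plain; charset=UTF-8',
    '',
    body,
    "--#{boundary}",
    "Content-Type: application/zip; name=\"#{filename}\"",
    'Content-Transfer-Encoding: base64',
    "Content-Disposition: attachment; filename=\"#{filename}\"",
    '',
    [data].pack('m').chomp, # base64 with line breaks
    "--#{boundary}--",
    ''
  ].join("\n")
end

# Send a prepared message through Gmail. +from+ and +password+ play the
# role of FROM_EMAIL and FROM_PASSWORD (assumed to be Gmail credentials).
def send_via_gmail(message, from:, password:, to:)
  require 'net/smtp' # loaded lazily; only needed when actually sending
  smtp = Net::SMTP.new('smtp.gmail.com', 587)
  smtp.enable_starttls
  smtp.start('localhost', from, password, :plain) do |session|
    session.send_message(message, from, to)
  end
end
```

Keeping message construction separate from delivery makes it possible to inspect the generated message without contacting the mail server.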

This script does not currently handle:

  1. multiple email addresses (you probably want a mailing list anyway)
  2. multiple reports per email (just run the script multiple times)
  3. nested directories (don't partition your report)
  4. compressed files (don't think we're using this anyway for reports)

The intended usage is to run this as a job step with your Hive script, passing it the S3 location of the report results.

E.g. (you will need to change paths and arguments to match your setup)

    elastic-mapreduce --create --name "my awesome report ${MONTH}" \
      --num-instances 10 --instance-type c1.medium --hadoop-version 0.20 \
      --hive-script --arg s3://path/to/hive/script.sql \
      --args -d,MONTH=${MONTH} --args -d,START=${START} --args -d,END=${END} \
      --jar s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar \
      --args s3://path/to/emr-mailer/send-report.rb \
      --args -n,report_${MONTH} --args -s,"my awesome report ${MONTH}" \
      --args -e,awesome-reports@company.com \
      --args -r,s3://path/to/report/results
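
The mailer flags in the example (-n, -s, -e, -r) could be parsed along these lines. This is a sketch using Ruby's standard OptionParser, not the script's actual argument handling. The meaning given to each flag is inferred from the example command, and the function name `parse_report_args` and the option keys are hypothetical.

```ruby
require 'optparse'

# Parse the mailer's command-line flags into a hash.
# Raises OptionParser::MissingArgument if any flag is absent.
def parse_report_args(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: send-report.rb -n NAME -s SUBJECT -e EMAIL -r S3_URL'
    # Flag meanings below are assumptions based on the example command.
    opts.on('-n NAME', 'Name of the report attachment') { |v| options[:name] = v }
    opts.on('-s SUBJECT', 'Subject line of the email') { |v| options[:subject] = v }
    opts.on('-e EMAIL', 'Recipient email address') { |v| options[:email] = v }
    opts.on('-r S3_URL', 'S3 location of the report results') { |v| options[:results] = v }
  end
  parser.parse(argv)

  missing = %i[name subject email results].reject { |key| options.key?(key) }
  raise OptionParser::MissingArgument, missing.join(', ') unless missing.empty?

  options
end
```

Called with `%w[-n report_01 -s subject -e awesome-reports@company.com -r s3://path/to/report/results]`, this returns a hash with the keys `:name`, `:subject`, `:email`, and `:results`.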