
Create an AWS pipeline in seconds that reads email attachments (PDFs, images) and extracts and classifies their content as text. It scales horizontally automatically.

Primary language: Python

Email App Content Recognition

This AWS Serverless Application was originally created to automatically read any number of emails (.eml format) from an S3 bucket and save all of their attachments into another S3 bucket, so that the binary attachments (PDFs or images) are stored and remain accessible. The attachments are then processed and their content is extracted with the ML models of Amazon Textract (https://aws.amazon.com/textract/faqs/). Because the architecture is serverless, the application has low costs and scales automatically.
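The first step above, pulling PDF and image attachments out of an .eml file, can be sketched with Python's standard-library email package. This is a minimal sketch under assumptions, not the repository's code: the function name extract_attachments and the content-type filter are illustrative, and reading the .eml from the emails bucket and writing each attachment to the content bucket (for example with boto3's put_object) are left out.

```python
from email import policy
from email.parser import BytesParser
from typing import List, Tuple

# Hypothetical filter: the README says attachments are PDFs or images.
SUPPORTED_PREFIXES = ("application/pdf", "image/")


def extract_attachments(eml_bytes: bytes) -> List[Tuple[str, str, bytes]]:
    """Return (filename, content_type, payload) for each PDF/image attachment."""
    # policy.default yields an EmailMessage, which provides iter_attachments().
    message = BytesParser(policy=policy.default).parsebytes(eml_bytes)
    attachments = []
    for part in message.iter_attachments():
        content_type = part.get_content_type()
        if not content_type.startswith(SUPPORTED_PREFIXES):
            continue  # skip attachments that are neither PDFs nor images
        # decode=True undoes the transfer encoding (usually base64).
        payload = part.get_payload(decode=True) or b""
        filename = part.get_filename() or "attachment"
        attachments.append((filename, content_type, payload))
    return attachments
```

Each returned payload is raw bytes, ready to be uploaded unchanged to the content bucket for the extraction step.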

See Architecture Design here
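For the text-extraction step, one plausible approach is Textract's synchronous DetectDocumentText API applied to an attachment already stored in S3. The helper names below are hypothetical, and the client is passed in (in practice boto3.client("textract")) so the logic can be tried with a stub. The synchronous API handles images and single-page PDFs; multi-page PDFs would need the asynchronous StartDocumentTextDetection / GetDocumentTextDetection pair. The classification part of the pipeline is not shown.

```python
from typing import Any, Dict, List


def build_textract_request(bucket: str, key: str) -> Dict[str, Any]:
    """Keyword arguments for DetectDocumentText on an object stored in S3."""
    return {"Document": {"S3Object": {"Bucket": bucket, "Name": key}}}


def lines_from_response(response: Dict[str, Any]) -> List[str]:
    """Collect the text of every LINE block, in the order Textract returned them."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def extract_text(textract_client: Any, bucket: str, key: str) -> str:
    """Run synchronous text detection and join the detected lines.

    textract_client is assumed to be boto3.client("textract"); it is injected
    so the function can be exercised with a stub instead of a live AWS call.
    """
    response = textract_client.detect_document_text(**build_textract_request(bucket, key))
    return "\n".join(lines_from_response(response))
```

With AWS credentials configured, a call might look like extract_text(boto3.client("textract"), "my-content-bucket", "invoice.pdf"), where the bucket and key are placeholders. Keeping only LINE blocks avoids duplicating text, since Textract also returns the same words as individual WORD blocks.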

Dependencies

  • AWS CDK
  • Python
  • AWS CLI
  • Node.js and npm (used by npm run deploy)

Setup

virtualenv .env && source .env/bin/activate && \
    python -m pip install -r requirements.aws.txt && \
    python -m pip install -r requirements.app.development.txt && \
    python -m pip install -r requirements.app.txt

Deploy

export AWS_ACCOUNT_ID=<UPDATE>
export AWS_DEFAULT_REGION=<UPDATE>
export EMAILS_S3_BUCKET=<UPDATE>
export CONTENT_S3_BUCKET=<UPDATE>
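The README does not show how the deploy step consumes these variables. As a hypothetical sketch only, a Python CDK app could read and validate them up front, treating a value left as <UPDATE> as unset so a forgotten placeholder fails before anything is deployed. DeployConfig and load_config are illustrative names, not part of the project.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variable names taken from the Deploy section; the loader itself is hypothetical.
REQUIRED_VARS = ("AWS_ACCOUNT_ID", "AWS_DEFAULT_REGION", "EMAILS_S3_BUCKET", "CONTENT_S3_BUCKET")


@dataclass(frozen=True)
class DeployConfig:
    account_id: str
    region: str
    emails_bucket: str
    content_bucket: str


def load_config(env: Mapping[str, str] = os.environ) -> DeployConfig:
    """Read deploy settings, failing early if any is missing or still <UPDATE>."""
    missing = [
        name for name in REQUIRED_VARS
        if env.get(name, "").strip() in ("", "<UPDATE>")
    ]
    if missing:
        raise RuntimeError("Missing deploy settings: " + ", ".join(missing))
    return DeployConfig(
        account_id=env["AWS_ACCOUNT_ID"],
        region=env["AWS_DEFAULT_REGION"],
        emails_bucket=env["EMAILS_S3_BUCKET"],
        content_bucket=env["CONTENT_S3_BUCKET"],
    )
```

Passing the environment as a parameter (defaulting to os.environ) keeps the check easy to exercise without touching the real shell environment.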

npm run deploy