Reader

Reader reads documents and extracts important keywords in real time. Here is an example chart created by Reader, showing the 25 most important keywords from Jeff's blog. Reader computes keyword scores with AWS Lambda as soon as a document is uploaded to S3.

Example Reader Chart

Technologies

  • TF.IDF (Term Frequency times Inverse Document Frequency) *1
  • AWS Services
    • S3 notification
    • AWS Lambda
    • DynamoDB
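
To make the TF.IDF terms concrete, here is a minimal sketch of the two quantities in plain JavaScript. The function names (`tokenize`, `termFrequency`, `inverseDocumentFrequency`), the simple English-only tokenizer, and the choice to normalize TF by document length are assumptions for illustration, not necessarily what the sample code does.

```javascript
// Split text into lowercase word tokens.
// Assumption: a simple regex tokenizer for English text.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Term frequency: occurrences of each term divided by the token count.
// Assumption: normalizing by document length (other normalizations exist).
function termFrequency(tokens) {
  const tf = new Map();
  for (const token of tokens) {
    tf.set(token, (tf.get(token) || 0) + 1);
  }
  for (const [term, count] of tf) {
    tf.set(term, count / tokens.length);
  }
  return tf;
}

// Inverse document frequency: log2(N / n_i), where N is the number of
// documents and n_i is the number of documents containing the term.
function inverseDocumentFrequency(totalDocs, docsWithTerm) {
  return Math.log2(totalDocs / docsWithTerm);
}
```

A term's score is its TF multiplied by its IDF, so a term that appears in 2 of 8 documents gets an IDF of log2(4) = 2, while a term that appears in every document gets an IDF of 0 and drops out of the ranking.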

Architecture

The architecture relies on AWS managed services. No servers or EC2 instances are required to run the application.

  • Clients send text documents to S3
  • An S3 notification triggers a Lambda function called Reader
  • Reader gets the text from S3 and calculates TF
  • Reader gets IDF from DynamoDB
  • Reader updates DynamoDB with the new IDF
  • Reader extracts important keywords using TF.IDF
  • Reader stores the top 25 keywords in DynamoDB
  • Reader-dashboard gets the keywords from DynamoDB and draws the charts
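
Two of the steps above can be sketched without any AWS dependency: locating the uploaded object from the S3 notification, and selecting the top 25 keywords. The helper names `s3LocationFromEvent` and `topKeywords`, and the `idfOf` lookup function (which would be backed by DynamoDB in the architecture described here), are hypothetical. The event shape (`Records[0].s3.bucket.name` and `Records[0].s3.object.key`) follows the standard S3 notification format, in which object keys arrive URL-encoded.

```javascript
// Read the bucket name and object key from the first record of an
// S3 notification event. Keys are URL-encoded, with spaces sent as "+".
function s3LocationFromEvent(event) {
  const record = event.Records[0];
  return {
    bucket: record.s3.bucket.name,
    key: decodeURIComponent(record.s3.object.key.replace(/\+/g, " ")),
  };
}

// Multiply each term's TF by its IDF and keep the highest-scoring terms.
// tf:    Map of term -> term frequency
// idfOf: hypothetical lookup, term -> IDF (assumed to read from DynamoDB)
function topKeywords(tf, idfOf, limit = 25) {
  return [...tf]
    .map(([term, value]) => ({ term, score: value * idfOf(term) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

In a real Lambda handler these helpers would sit between the S3 read and the DynamoDB write. Keeping them free of SDK calls makes them easy to test locally.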

Reader Architecture

Code

Sample code on GitHub:

Sample AWS Lambda metrics: AWS Lambda Metrics

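*1 IDF_i = log2(N / n_i), where N is the total number of documents and n_i is the number of documents that contain term i. The IDF calculation therefore needs to know which terms exist in other documents; this is not implemented in this sample. The idea is to use DynamoDB to store that data.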
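
As a rough illustration of the bookkeeping the footnote describes, the sketch below tracks the total document count (N) and, for each term, the number of documents containing it (n_i). It uses an in-memory `Map` as a stand-in for the DynamoDB table. The class name `DocumentFrequencyStore` and its methods are hypothetical, and returning 0 for unseen terms is an arbitrary choice.

```javascript
// In-memory stand-in for the DynamoDB table suggested in the footnote.
class DocumentFrequencyStore {
  constructor() {
    this.totalDocs = 0;            // N: number of documents seen so far
    this.docsWithTerm = new Map(); // n_i: documents containing each term
  }

  // Record one document. Each distinct term counts once per document,
  // no matter how often it occurs in that document.
  addDocument(tokens) {
    this.totalDocs += 1;
    for (const term of new Set(tokens)) {
      this.docsWithTerm.set(term, (this.docsWithTerm.get(term) || 0) + 1);
    }
  }

  // IDF_i = log2(N / n_i). Assumption: unseen terms get an IDF of 0.
  idf(term) {
    const count = this.docsWithTerm.get(term);
    return count ? Math.log2(this.totalDocs / count) : 0;
  }
}
```

With DynamoDB, `addDocument` would presumably become one atomic counter update per distinct term, so that concurrent Lambda invocations do not overwrite each other's counts.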