A support aggregation hub for an Intuit hands-on interview.
Send a simple GET request to <ENDPOINT>/myAggregatedHub, where <ENDPOINT> is the deployment endpoint.
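For example, with curl (the hostname here is a placeholder for wherever you deployed):

```sh
# <YOUR_APP_NAME> is a placeholder for your deployment's hostname.
curl https://<YOUR_APP_NAME>.herokuapp.com/myAggregatedHub
```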
This project uses AWS cloud infrastructure. For a complete list of resources, see the Architecture section below.
- Java 11+
- Gradle 6.7.1+
- A Heroku account
- An AWS account with all the required permissions
- Heroku CLI
- Optional: AWS CLI
- Create a DynamoDB table named SupportAggregationHubCache
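If you prefer the AWS CLI, this step might look roughly like the following; note that the key schema shown here (a single string partition key named id) is an assumption, so adjust it to whatever the cache code expects:

```sh
# Key schema and billing mode below are assumptions, not taken from this project.
aws dynamodb create-table \
  --table-name SupportAggregationHubCache \
  --attribute-definitions AttributeName=id,AttributeType=S \
  --key-schema AttributeName=id,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
```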
- Start a Lambda function for each of the following packages:
  - cache - Make sure it has permission to access the DynamoDB table, and set the following environment variables:
    - CACHE_TTL - The cache time-to-live in ms (e.g. 900000)
    - AGGREGATION_INTERVAL - The minimum time between aggregations in ms (e.g. 1800000)
  - crm_connector
  - sterilizer - with the following environment variable:
    - SUPPORTED_PRODUCTS - The supported products, separated by commas (e.g. RED,GREEN,BLUE)
  - filter
  - mapper
  - reducer
- Optional: Update the function names in the config script (e.g. CACHE_LAMBDA_FUNCTION_NAME="<CACHE_FUNCTION_NAME>").
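As a rough illustration, the config script presumably holds one function name per package, along these lines (only CACHE_LAMBDA_FUNCTION_NAME appears in this README; the other variable names are assumed to follow the same pattern):

```sh
# Hypothetical config script contents; replace the placeholders with your real function names.
CACHE_LAMBDA_FUNCTION_NAME="<CACHE_FUNCTION_NAME>"
CRM_CONNECTOR_LAMBDA_FUNCTION_NAME="<CRM_CONNECTOR_FUNCTION_NAME>"
STERILIZER_LAMBDA_FUNCTION_NAME="<STERILIZER_FUNCTION_NAME>"
FILTER_LAMBDA_FUNCTION_NAME="<FILTER_FUNCTION_NAME>"
MAPPER_LAMBDA_FUNCTION_NAME="<MAPPER_FUNCTION_NAME>"
REDUCER_LAMBDA_FUNCTION_NAME="<REDUCER_FUNCTION_NAME>"
MAIN_LAMBDA_FUNCTION_NAME="<MAIN_FUNCTION_NAME>"
```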
- Create a Step Function using the definition in the step-function.json file. Make sure to replace the relevant Lambda ARNs (e.g. "Resource": "arn:aws:lambda:<REGION_NAME>:<ACCOUNT_ID>:function:<FUNCTION_NAME>").
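For orientation, a minimal sketch of what such a definition could look like is below. Only the ARN placeholders, the "Mapper" Map state, and its "MaxConcurrency" field are implied by this README; the state names, ordering, and wiring are assumptions:

```json
{
  "StartAt": "Sterilizer",
  "States": {
    "Sterilizer": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:<REGION_NAME>:<ACCOUNT_ID>:function:<STERILIZER_FUNCTION_NAME>",
      "Next": "Mapper"
    },
    "Mapper": {
      "Type": "Map",
      "MaxConcurrency": 0,
      "Iterator": {
        "StartAt": "MapItem",
        "States": {
          "MapItem": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:<REGION_NAME>:<ACCOUNT_ID>:function:<MAPPER_FUNCTION_NAME>",
            "End": true
          }
        }
      },
      "Next": "Reducer"
    },
    "Reducer": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:<REGION_NAME>:<ACCOUNT_ID>:function:<REDUCER_FUNCTION_NAME>",
      "End": true
    }
  }
}
```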
- Create a CloudWatch event that triggers the Step Function from the previous step at the desired interval (e.g. every 4 hours).
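With the AWS CLI this could be done roughly as follows (the rule name, state machine ARN, and IAM role are placeholders):

```sh
# Fire every 4 hours and target the Step Function; names and ARNs are placeholders.
aws events put-rule \
  --name <RULE_NAME> \
  --schedule-expression "rate(4 hours)"
aws events put-targets \
  --rule <RULE_NAME> \
  --targets "Id"="1","Arn"="arn:aws:states:<REGION_NAME>:<ACCOUNT_ID>:stateMachine:<STATE_MACHINE_NAME>","RoleArn"="arn:aws:iam::<ACCOUNT_ID>:role/<EVENTS_ROLE_NAME>"
```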
- Create an API Gateway with a resource named myAggregatedHub and a GET method. Point the method to invoke the main function, and make sure it uses Lambda proxy integration so that the function receives an AWS Proxy Event.
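Because the integration passes an AWS Proxy Event, the main handler receives an APIGatewayProxyRequestEvent. A minimal sketch of that handler shape (illustrative only, not this project's actual code):

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent;

// Illustrative handler shape for a GET method behind Lambda proxy integration.
public class MainHandler implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent> {
    @Override
    public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent event, Context context) {
        // The real main function would read the cached aggregation and render it as HTML.
        return new APIGatewayProxyResponseEvent()
                .withStatusCode(200)
                .withBody("<html><body>aggregated support data</body></html>");
    }
}
```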
- Set the following main environment variables:
  - SUPPORT_AGGREGATION_HUB_AWS_PROFILE - The AWS profile with permission to read the DynamoDB table.
  - SUPPORT_AGGREGATION_HUB_AWS_REGION - The AWS region in which the DynamoDB table is set up.
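For example, when main runs on Heroku, these can be set with the Heroku CLI (the values are placeholders):

```sh
# Placeholder values; use the profile and region that match your DynamoDB setup.
heroku config:set SUPPORT_AGGREGATION_HUB_AWS_PROFILE=<PROFILE_NAME>
heroku config:set SUPPORT_AGGREGATION_HUB_AWS_REGION=<REGION_NAME>
```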
For main:
- Run ./gradlew stage
- Run heroku login
- Run heroku create
- Run git push heroku master
For other packages:
- Run ./gradlew <FUNCTION_NAME>Zip (e.g. ./gradlew mainZip). Note that mainZip lives in the main module, while all other Zip tasks live in the data module.
- Upload the created .zip file available at <PATH_TO_MODULE>/build/distributions/<FUNCTION_NAME>-1.0.SNAPSHOT.zip to the corresponding Lambda function.
- Optional (all packages): If you updated the function names in the config script during setup, you can use the "deploy" helper script to perform both steps at once: ./deploy update <FUNCTION_NAME_1> <FUNCTION_NAME_2> ... (e.g. ./deploy update main cache).
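If you would rather upload manually with the AWS CLI instead of the helper script, the standard command is:

```sh
# Standard AWS CLI upload; replace the placeholders with your own values.
aws lambda update-function-code \
  --function-name <FUNCTION_NAME> \
  --zip-file fileb://<PATH_TO_MODULE>/build/distributions/<FUNCTION_NAME>-1.0.SNAPSHOT.zip
```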
- Cache time-to-live: Change the CACHE_TTL environment variable of the cache Lambda function (e.g. 900000).
- Data update interval: Change the interval of the CloudWatch event (e.g. 4 hours).
- Interval between aggregations: Change the AGGREGATION_INTERVAL environment variable of the cache Lambda function (e.g. 1800000).
- Maximum request concurrency: Change the "MaxConcurrency" value of the "Mapper" state in the Step Function; 0 means no limit (e.g. "MaxConcurrency": 0).
- To split products between functions, duplicate the sterilizer function and set each copy's SUPPORTED_PRODUCTS environment variable to the desired products (e.g. "RED,GREEN" and "BLUE").
- To give them separate aggregation intervals, duplicate the cache function and set each copy's AGGREGATION_INTERVAL environment variable to the desired value (e.g. 1800000 and 7200000).
- To limit their maximum concurrency separately, duplicate the Step Function and set the "MaxConcurrency" value of each "Mapper" state to the desired value (e.g. 10 and 20).
- For a refresh button, add a button to the HTML that triggers the Step Function.
- Since a Lambda function is limited to 15 minutes of runtime, every operation, including fetching a single page of data, must take less than 15 minutes. This assumption seems reasonable, especially considering the default 15-minute cache time-to-live.
- If a scheduled update fires shortly after an on-demand aggregation, the data will not be updated again. As a result, data may be up to aggregationInterval + updateInterval old (4.5 hours with the defaults).
The following list covers the system's services and their functions:
- Main - Requests data from the backend and displays the results as a simple HTML page.
- Cache - Retrieves data from the cache and returns it if the data is fresh; if not, it triggers an aggregation (see the sketch after this list).
- CRM Connector - Connects to a given CRM and fetches its data, supporting pagination where possible.
- Sterilizer - Gets raw data from the CRM and removes irrelevant data, e.g. unsupported products.
- Filter - Filters the sterilized data by a given parameter and operation. (Not implemented yet)
- Mapper - Converts the filtered data to a format that allows aggregation.
- Reducer - Aggregates the mapped or partially aggregated data.
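To make the cache behavior concrete, here is a hedged Java sketch of the freshness check described above. Only CACHE_TTL, AGGREGATION_INTERVAL, and the DynamoDB-backed cache come from this README; the class, field, and method names are assumptions:

```java
import java.time.Instant;

// Illustrative sketch of the Cache service's freshness logic, not the project's actual code.
public class CacheFreshness {
    // In the real Lambda these come from the environment variables described above.
    private final long cacheTtlMs = Long.parseLong(System.getenv("CACHE_TTL"));                       // e.g. 900000
    private final long aggregationIntervalMs = Long.parseLong(System.getenv("AGGREGATION_INTERVAL")); // e.g. 1800000

    // Hypothetical shape of a cached DynamoDB item.
    public static class CachedEntry {
        long lastAggregatedAtMs; // when the data was last aggregated
        String payload;          // the aggregated data itself
    }

    public String getData(CachedEntry entry) {
        long ageMs = Instant.now().toEpochMilli() - entry.lastAggregatedAtMs;
        if (ageMs >= cacheTtlMs && ageMs >= aggregationIntervalMs) {
            startAggregation(); // stale, and enough time has passed since the last aggregation
        }
        return entry.payload;   // serve the cached copy; a later request picks up the fresh data
    }

    private void startAggregation() {
        // e.g. call StartExecution on the aggregation Step Function.
    }
}
```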
DynamoDB is used for caching data.
CloudWatch events are used for scheduling.
A list of required AWS resources: