This Terraform module simplifies setting up log collection and aggregation with Vector on AWS EKS, using AWS S3 as an intermediate cache and AWS OpenSearch as the final destination.
Language: HCL. License: Apache-2.0.
Documentation
Description and Architecture
This module was created to simplify configuring log collection and aggregation using Vector, with an intermediate cache in AWS S3 and the final destination in AWS OpenSearch (formerly Elasticsearch).
In the above diagram, you can see the components and their relations.
Resources you need to provide (prerequisites)
OpenSearch cluster into which Vector Aggregator will ingest processed logs.
Secrets Manager secret with vector_username and vector_password, which will be used as Basic Auth credentials for the OpenSearch cluster. This is optional: if it is not provided, Vector Aggregator will use its IAM Role to put data into OpenSearch. If you access the OpenSearch cluster via a VPN Endpoint, make sure you provide the appropriate host name via the header. See the example in examples/iam-auth-opensearch.
EKS cluster where Vector Agent and Vector Aggregator will be deployed.
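The optional Basic Auth secret could be created along the following lines. This is a minimal sketch, assuming the secret is stored as a JSON document with the keys vector_username and vector_password; the resource names, the secret name, and the two input variables are hypothetical and not part of this module.

```hcl
# Hypothetical inputs holding the OpenSearch Basic Auth credentials.
variable "vector_username" {
  type = string
}

variable "vector_password" {
  type      = string
  sensitive = true
}

# Secret container; the name is an illustrative assumption.
resource "aws_secretsmanager_secret" "vector_opensearch" {
  name = "example/vector-opensearch-basic-auth"
}

# Secret value, assumed to be JSON with the two keys named in the prerequisites.
resource "aws_secretsmanager_secret_version" "vector_opensearch" {
  secret_id = aws_secretsmanager_secret.vector_opensearch.id
  secret_string = jsonencode({
    vector_username = var.vector_username
    vector_password = var.vector_password
  })
}
```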
Resources created by the module
Two components will be installed into the existing EKS cluster using the Helm chart:
Vector Agent, deployed as a DaemonSet, collects logs and metrics from the node (including application logs) and sends them to the S3 bucket using the ServiceAccount and its associated IAM Role. This decouples log collection from OpenSearch: the cluster may be under maintenance or overloaded, but we don't want to lose any logs. An S3 bucket is cheap and reliable storage, so we use it as a buffer.
Vector Aggregator, deployed as a StatefulSet, reads the SQS queue for new logs, gets them from the S3 bucket, processes and enriches them, and sends them to the OpenSearch cluster.
S3 bucket with a lifecycle policy that stores logs for a limited period of time (7 days by default). It has notifications configured on object-creation events, routed to the SQS queue. A resource policy restricts access to the bucket to the Vector Agent IAM Role and the Vector Aggregator IAM Role only.
SQS queue, the destination for S3 object-creation notifications, used by the Vector Aggregator to learn which log objects have to be processed.
Vector Agent IAM Role and Vector Aggregator IAM Role, created to provide granular access to the AWS resources.
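The S3-to-SQS wiring described above can be sketched with plain AWS provider resources. This is an illustration of the pattern under stated assumptions, not the module's actual code: the bucket and queue names are hypothetical, and the module's real resources, event filters, and policies may differ.

```hcl
# Buffer bucket for logs written by Vector Agent (name is hypothetical).
resource "aws_s3_bucket" "logs" {
  bucket = "example-vector-logs"
}

# Expire buffered logs after 7 days, matching the default described above.
resource "aws_s3_bucket_lifecycle_configuration" "logs" {
  bucket = aws_s3_bucket.logs.id

  rule {
    id     = "expire-buffered-logs"
    status = "Enabled"

    filter {} # empty filter: the rule applies to every object

    expiration {
      days = 7
    }
  }
}

# Queue that Vector Aggregator polls for new objects (name is hypothetical).
resource "aws_sqs_queue" "logs" {
  name = "example-vector-logs"
}

# Allow S3 to send notifications from this bucket to the queue.
data "aws_iam_policy_document" "allow_s3_notifications" {
  statement {
    effect    = "Allow"
    actions   = ["sqs:SendMessage"]
    resources = [aws_sqs_queue.logs.arn]

    principals {
      type        = "Service"
      identifiers = ["s3.amazonaws.com"]
    }

    condition {
      test     = "ArnEquals"
      variable = "aws:SourceArn"
      values   = [aws_s3_bucket.logs.arn]
    }
  }
}

resource "aws_sqs_queue_policy" "logs" {
  queue_url = aws_sqs_queue.logs.id
  policy    = data.aws_iam_policy_document.allow_s3_notifications.json
}

# Route object-creation events from the bucket to the queue.
resource "aws_s3_bucket_notification" "logs" {
  bucket = aws_s3_bucket.logs.id

  queue {
    queue_arn = aws_sqs_queue.logs.arn
    events    = ["s3:ObjectCreated:*"]
  }

  # The queue policy must exist before S3 validates the notification target.
  depends_on = [aws_sqs_queue_policy.logs]
}
```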
Explanation of some architectural decisions
The S3 bucket is used as temporary storage so that no logs are lost when the OpenSearch cluster is in "Red" health status. Messages wait in the SQS queue for later processing. Once the logs have been ingested into the OpenSearch cluster, the corresponding message is removed from the SQS queue.
Even with multiple AWS accounts (one per environment), we use a single OpenSearch cluster. We believe that the Vector Aggregator must be part of each cluster where logs are generated, so that changes to the Vector Aggregator configuration can be tested in the lower environments before they are promoted to production.
By default, the agent template has the following variables: region, bucket, and cluster_name. The module replaces them automatically. If you define additional variables in the template provided via agent_template_filename, you need to provide values for them here.
By default, the aggregator template has the following variables: queue_url, endpoint, region, and eks_cluster_name. The module replaces them automatically. If you define additional variables in the template provided via aggregator_template_filename, you need to provide values for them here.
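To illustrate how such template variables are typically substituted, here is a minimal sketch using Terraform's built-in templatefile function. It assumes the template uses Terraform-style `${name}` placeholders; the file path, all values, and the extra `environment` variable are hypothetical, and how the module renders its templates internally is not confirmed by this document.

```hcl
locals {
  # Render a custom agent template. The first three keys mirror the default
  # agent variables listed above; "environment" stands in for an additional
  # variable you might define yourself (hypothetical).
  rendered_agent_config = templatefile("${path.module}/templates/agent.yaml.tpl", {
    region       = "eu-west-1"
    bucket       = "example-vector-logs"
    cluster_name = "example-eks"
    environment  = "staging"
  })
}

# Expose the rendered configuration for inspection.
output "rendered_agent_config" {
  value = local.rendered_agent_config
}
```

Every placeholder referenced in the template must have a value in the map, otherwise templatefile fails at plan time, which is why additional variables need values supplied.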
By default, the module creates a least-privilege S3 bucket policy (access is limited to the Agent IAM Role and the Aggregator IAM Role). You can override it by providing an S3 bucket policy JSON document here.
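A custom policy document can be generated with the aws_iam_policy_document data source. The following is a sketch only: the account ID, role names, bucket name, and the choice of actions are assumptions for illustration and do not reproduce the module's default policy.

```hcl
# Hypothetical override policy: the agent role may write objects,
# the aggregator role may read them.
data "aws_iam_policy_document" "logs_bucket" {
  statement {
    sid       = "AgentWrite"
    effect    = "Allow"
    actions   = ["s3:PutObject"]
    resources = ["arn:aws:s3:::example-vector-logs/*"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::111122223333:role/example-vector-agent"]
    }
  }

  statement {
    sid       = "AggregatorRead"
    effect    = "Allow"
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::example-vector-logs/*"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::111122223333:role/example-vector-aggregator"]
    }
  }
}

# The JSON string to pass as the bucket policy override.
output "logs_bucket_policy_json" {
  value = data.aws_iam_policy_document.logs_bucket.json
}
```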