General
-
Cache external dependencies locally to reduce network transfer costs. For example, use pull-through Docker image registries.
-
Prefer spot instances over on-demand wherever the workload is stateless and spot capacity is an option.
-
Use automated scaling solutions to power off dev workloads during weekends if possible.
-
Filter your logs, metrics and traces before they reach your monitoring solution. In almost all solutions, SaaS or not, you're being charged for their storage or ingestion. Fl
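As one illustration of filtering at the source, here is a minimal sketch using Python's standard `logging` module. The `CostFilter` class, its parameter names and the sampling approach are assumptions made for this example, not something a particular monitoring vendor prescribes: it drops records below a minimum level, always keeps warnings and errors, and samples the rest.

```python
import logging
import random


class CostFilter(logging.Filter):
    """Drop or sample low-value records before a handler ships them.

    Hypothetical helper for illustration: tune the level and sample
    rate to whatever your monitoring bill can tolerate.
    """

    def __init__(self, min_level=logging.INFO, info_sample_rate=1.0, rng=None):
        super().__init__()
        self.min_level = min_level
        self.info_sample_rate = info_sample_rate
        self.rng = rng or random.Random()

    def filter(self, record):
        # Records below the minimum level are never forwarded.
        if record.levelno < self.min_level:
            return False
        # Warnings and errors are always kept.
        if record.levelno >= logging.WARNING:
            return True
        # Everything in between is sampled to cut ingestion volume.
        return self.rng.random() < self.info_sample_rate
```

Attach it to whichever handler forwards logs to your backend, for example `handler.addFilter(CostFilter(info_sample_rate=0.1))` to keep roughly one in ten informational records.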
GCP
Compute
-
Don't forget to specify `min_cpu_platform` for machine types that support it. You will pay the same price for faster instances (e.g. specify AMD Milan for n2d instances, or Intel Ice Lake for n2).
-
GCP's AMD Milan instances currently offer the best performance and price-performance in general. In most regions, t2d also has the same CPU reservation price as n2d. This is significant because t2d gives you twice the physical CPU cores: on t2d, 1 vCPU = 1 core (no hyper-threading), whereas on all other AMD/Intel types 1 vCPU = 1 hardware thread.
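The core-count arithmetic behind the t2d claim can be sketched as follows. The 8-vCPU size is an arbitrary example, and the helper name is made up for this illustration. The only inputs taken from the text above are the threads-per-core figures (1 for t2d, 2 for the hyper-threaded types).

```python
def physical_cores(vcpus: int, threads_per_core: int) -> float:
    """Physical cores backing a given vCPU count."""
    return vcpus / threads_per_core


# Same vCPU count (and, per the note above, the same per-vCPU price in
# most regions), but different numbers of physical cores behind it.
n2d_cores = physical_cores(vcpus=8, threads_per_core=2)  # hyper-threaded
t2d_cores = physical_cores(vcpus=8, threads_per_core=1)  # no hyper-threading
core_ratio = t2d_cores / n2d_cores
```

With these inputs `core_ratio` is 2.0, which is the "twice the cores" figure mentioned above. Real-world speedup depends on how well the workload uses hyper-threads.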
AWS
Compute
-
Use Reserved Instances and Savings Plans. Consider using "smart" automated RI SaaS solutions which are based on your existing workloads.
-
Prefer newer-generation EC2 instances, as they usually give you more performance for the same or a lower price. The same applies to other products, such as storage: prefer gp3 over gp2.
-
Graviton3-based instances offer the best multi-threaded performance per price across AWS's general-purpose and compute-optimized offerings.
Networking
-
Move away from Classic load balancers as they are deprecated for EC2-Classic networks and cost more*, use Network or Application load balancers instead.
- *CLBs do not incur costs for TLS negotiations or for established connections, so NLBs or ALBs are not always cheaper.
-
Most likely, move away from VPC peering to Transit Gateways (or Network Manager) and VPC Sharing. Peering is costlier when there are many VPCs. Take the time to calculate your usage and network traffic.
-
Use VPC Endpoints when your workloads access AWS services from within AWS. Check however if the cost of the endpoint isn't higher than the cost of your usage without it.
-
When using multiple private subnets that access the internet, make sure each of them has its own NAT gateway. Under heavy network load, it will usually cost more to send all the traffic through only one of them. Alternatively, set up a NAT on a small instance yourself, or use only public subnets and block all external access, whichever best fits your budget and use case.
-
Self Hosted NAT Gateway Tip
If you're on a shoestring budget and internet access from your private subnets doesn't absolutely require 100% uptime, you can use a `t3a.nano` as a NAT instance instead of using NAT gateways, which are quite expensive per-subnet-month.
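Several of the networking tips above come down to comparing an option with a fixed hourly fee against one that charges mostly per GB (for example, a VPC endpoint versus sending the same traffic through a NAT gateway). A minimal break-even sketch follows. The function names are hypothetical, and every price in the usage lines is a made-up placeholder rather than a real AWS rate, so substitute the current figures for your region.

```python
HOURS_PER_MONTH = 730  # common approximation of hours in a month


def monthly_cost(hourly_fee, per_gb_fee, gb_per_month, hours=HOURS_PER_MONTH):
    """Fixed hourly charge plus data-processing charge for one month."""
    return hourly_fee * hours + per_gb_fee * gb_per_month


def break_even_gb(hourly_a, per_gb_a, hourly_b, per_gb_b, hours=HOURS_PER_MONTH):
    """Monthly traffic (GB) at which options A and B cost the same.

    Returns None when the per-GB fees are equal or the crossover would
    fall at negative traffic, i.e. one option is always cheaper.
    """
    if per_gb_a == per_gb_b:
        return None
    gb = (hourly_b - hourly_a) * hours / (per_gb_a - per_gb_b)
    return gb if gb >= 0 else None


# Placeholder prices only (NOT real AWS pricing):
# option A has no fixed fee but a higher per-GB fee,
# option B has a fixed hourly fee and a lower per-GB fee.
crossover = break_even_gb(hourly_a=0.0, per_gb_a=0.05,
                          hourly_b=0.01, per_gb_b=0.01)
cost_a = monthly_cost(0.0, 0.05, gb_per_month=500)
cost_b = monthly_cost(0.01, 0.01, gb_per_month=500)
```

Below the crossover volume the option without the fixed fee is cheaper, and above it the fixed-fee option wins. That is why measuring your actual traffic comes first.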
S3
-
Use S3 storage classes to significantly reduce costs on less frequently accessed buckets.
-
If using S3 Glacier, bundle and compress your files into as few archives as possible before uploading, in order to save on request costs.
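One way to do the bundling step is with Python's standard `tarfile` module, as sketched below. The function name and arguments are hypothetical, and the upload itself is left out. The idea is that one archive means one upload request instead of one per file.

```python
import tarfile
from pathlib import Path


def bundle_directory(src_dir, archive_path):
    """Pack every file under src_dir into one gzip-compressed tar archive."""
    src = Path(src_dir)
    archive = Path(archive_path)
    with tarfile.open(str(archive), "w:gz") as tar:
        for path in sorted(src.rglob("*")):
            # Skip directories, and the archive itself if it lives in src_dir.
            if not path.is_file() or path.resolve() == archive.resolve():
                continue
            # Store paths relative to src_dir so the archive is relocatable.
            tar.add(str(path), arcname=str(path.relative_to(src)))
    return archive
```

For example, `bundle_directory("logs/2023", "logs-2023.tar.gz")` produces a single object to upload. The trade-off is that retrieving one small file later means restoring the whole archive.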
Kubernetes
-
Prefer exposing services through an ingress controller instead of giving each one its own LoadBalancer service. That way, all services share a single load balancer and are exposed only through the ingress.
-
Consolidate your pods on fewer nodes. Leave only as much headroom on your nodes as you actually intend to keep.
-
Don't over-commit resources. Pod requests must be tuned over time so that you don't over-provision.
-
If possible, prefer using only a single region to avoid network transfer costs between nodes. This is best suited to non-production environments.
-
Single Region Highly Available AWS EKS Karpenter Trick
This dual-provisioner configuration makes Karpenter prefer scheduling in a single AZ whenever that AZ is available. If it becomes unavailable, Karpenter falls back to another AZ until the preferred one works again.

    # Provisioner A
    - providerName: main-node
      weight: 100
      requirements:
        - key: "topology.kubernetes.io/zone"
          operator: In
          values: ["us-east-1a", "eu-west-1a"]
    ---
    # Provisioner B
    - providerName: backup-node
      weight: 0
      requirements:
        - key: "topology.kubernetes.io/zone"
          operator: NotIn
          values: ["us-east-1a", "eu-west-1a"]
Datadog
APM
- In order to reduce APM costs, consolidate identical workloads on the same nodes where possible.
-
K8s App Consolidation Trick
This configuration makes controllers prefer scheduling their pods on the same nodes at all times. It can reduce the number of APM hosts billed in Datadog.

    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values:
                    - myapp
            topologyKey: kubernetes.io/hostname
Contributing
Contributions of any kind welcome, just follow the guidelines!