AWS-SAA-Notes

                *Estimated reading time: 30 minutes*

Personal notes for preparing for the AWS Solutions Architect Associate certification

Global Infrastructure

  • Regions
    • us-east-1, eu-west-3
    • has min 2, max 6, usually 3 AZ
    • cluster of data centers
    • choose region based on Compliance, Proximity, Available services, Pricing
  • Availability Zones
    • e.g. ap-southeast-2a, ap-southeast-2b
    • one or more discrete data centers - redundant power, networking, and connectivity
    • isolated from disasters
    • connected with high bandwidth, ultra-low latency networking
  • Edge Locations
    • CloudFront uses these locations to cache copies of content for faster delivery to users at any location

IAM

  • Identity and Access Management - Global service
  • Root account created by default
  • Users - physical person
  • Groups - contains only users, not other groups
  • Roles - used when AWS service needs to perform activity on our behalf.
  • users may or may not belong to one or more groups
  • Policies
    • defines permissions in JSON doc
    • least privilege principle
    • can be applied to User or Group or Role
  • can set Password Policy - Strong policy - min pass length, upper case, numbers, non-alphanumeric - prevent pass reuse - expiration policy
  • MFA
  • We can access AWS using
    • AWS Management Console
    • AWS Command Line Interface (CLI)
    • AWS Software Development Kit (SDK)
    • Access Key ID, Secret Access Key for SDK and CLI
    • Username and Password for Console
  • IAM security tools
    • IAM Credential Report - lists all account users and the status of their credentials
    • IAM Access Advisor - shows permissions granted to a user and when those were last accessed.
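
As a rough illustration of the Policies bullets above, here is a minimal sketch of a least-privilege policy built as a JSON document. The bucket name is hypothetical and the chosen actions are only an example; the keys (Version, Statement, Effect, Action, Resource) follow the standard IAM policy grammar.

```python
import json

# Hypothetical bucket name, used only for illustration.
BUCKET = "example-notes-bucket"

# Least privilege: allow read-only access to a single bucket and nothing else.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",    # the bucket itself (for ListBucket)
                f"arn:aws:s3:::{BUCKET}/*",  # the objects inside it (for GetObject)
            ],
        }
    ],
}

# The JSON text is what would be attached to a User, Group or Role.
policy_json = json.dumps(read_only_policy, indent=2)
print(policy_json)
```

Anything not explicitly allowed by a statement like this stays denied, which is what makes it a least-privilege policy.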

EC2

  • Elastic Compute Cloud
  • configuration options -
    • OS
    • CPU
    • RAM
    • Storage - EBS, EFS, Instance Store
    • Firewall rules - using Security Groups
    • EC2 User Data
      • contains commands to run when an instance starts for the first time.
  • EC2 Instance Types
    • General Purpose - balanced Compute, memory and networking
    • Compute Optimized - high performance processors
    • Memory Optimized - process large data in memory
    • Storage Optimized - lots of read and write access to large data on local storage
  • Security Groups
    • Controls traffic into or out of EC2 Instances
    • contains only allow rules
    • regulate on the basis of :
      • Port
      • IP ranges
      • Other Security Groups
      • Inbound traffic - blocked by default
      • outbound traffic - authorized by default
    • Can be attached to multiple instances
    • locked to a region/VPC combination
    • Application times out in case of an SG issue
    • Application gives a connection refused error in case of an application error, or when the application is not launched.
  • Purchasing Options:
    • On-demand Instances
      • short and uninterrupted workload
      • per second billing for Linux, per hour billing for others
      • highest cost - no upfront payment - no longterm commitment
    • On-Demand Capacity Reservations
      • reserve compute capacity for EC2 instances in a specific AZ for any duration
      • No 1 or 3 year commitment
      • Billing starts as soon as the capacity is provisioned
      • Cancel the Capacity Reservation to stop incurring charges
      • Specify
        • AZ
        • Number of instances
        • attributes like instance type, platform/OS
      • No billing discount.
    • Reserved
      • long workloads - period 1 or 3 years
      • up to 72% discount compared to On-demand
      • no upfront or partial upfront or all upfront payment
      • Convertible - flexibility of changing instance types
      • Scheduled - launch within time window
    • Spot instances
      • short workload, cheap, less reliable
      • use for failure resilient tasks
      • Not suitable for critical jobs or databases
      • define max spot price and get instance when current price < max
      • if current price > max, instance is stopped or terminated - 2 min grace time
      • Spot Block - block instance during specific time frame
      • Spot request is defined with
        • max price
        • desired number of instances
        • type - One time or persistent
        • Valid from and to
      • only open, active, or disabled requests can be cancelled
      • First cancel Spot request and then terminate associated Spot instances
      • Spot Fleet
        • automatically request spot instances within cost and capacity
        • Strategies
          • lowest price - lowest price pool
          • diversified - across different pools
          • capacity optimized - pools with good capacity
    • Dedicated Hosts
      • book entire physical server
      • expensive
      • useful for compliance needs or complicated licences
    • Dedicated Instances
      • hardware is not shared with other customers
      • instances of the same account are on the same hardware
      • No control of which hardware will be used. Hardware may change when instance is stopped and started
  • Placement groups
    • Control over where the EC2 instances are placed
    • Cluster - low-latency group
    • Spread - high available - spread across different hardware - max 7 instance per group per AZ
    • Partition - instances are spread across partitions - all instances in a partition share hardware, different partitions do not - max 7 partitions per AZ
  • An EC2 instance has Public and Private IP
    • Public IP changes after every stop and start (not after a reboot)
    • Private IP stays constant
  • Elastic IP
    • fixed public IP
    • can attach it to one instance at a time
    • can be quickly remapped to another instance
    • the mapped instance will be accessible using the elastic IP
  • Elastic Network Interface
    • has Primary Private IP , one or more secondary IP
    • One Elastic IP
    • One Public IP
    • 1 or more security groups
    • A MAC address
    • Can be created independently and attached to EC2 instances
    • Bound to AZ
  • EC2 Hibernate
    • RAM is preserved - written to a file in the root EBS
    • boots faster
    • RAM size must be less than 150 GB
    • Root volume must be encrypted
    • For On-demand and Reserved Instances
    • cannot be hibernated more than 60 days
  • Nitro
    • better performance
    • better networking options
    • Higher speed EBS - 64000 EBS IOPS
    • better security
  • In EC2, we can change number of CPU cores (decrease licensing costs), or decrease number of threads
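
The Security Groups rules noted above (allow rules only, inbound blocked by default, matching on port and IP range) can be sketched as a small evaluator. This is a simplified model for study purposes, not how AWS implements it; `InboundRule` and `inbound_allowed` are hypothetical names, and the addresses come from documentation-only IP ranges.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class InboundRule:
    """One allow rule: a port plus the CIDR range allowed to reach it."""
    port: int
    cidr: str


def inbound_allowed(rules, source_ip, port):
    """Return True if any allow rule matches; with no match, traffic is blocked."""
    source = ip_address(source_ip)
    return any(
        rule.port == port and source in ip_network(rule.cidr)
        for rule in rules
    )


# SSH only from one office range, HTTPS from anywhere.
rules = [
    InboundRule(port=22, cidr="203.0.113.0/24"),
    InboundRule(port=443, cidr="0.0.0.0/0"),
]

print(inbound_allowed(rules, "203.0.113.10", 22))   # inside the allowed range
print(inbound_allowed(rules, "198.51.100.7", 22))   # outside it: blocked
```

There is no deny rule type in the model: a request is blocked simply because nothing allows it, which matches the timeout behaviour described for SG issues.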

EBS

  • Elastic Block Store - network drive - used by EC2 instance
  • allows instance to persist data after termination
  • bound to one AZ
    • To move, need to create Snapshot
  • might be a bit of latency
  • can be attached and detached easily
  • Provisioned capacity
    • can increase capacity over time
  • We can delete EBS on instance termination (Delete on Termination attribute)
  • Snapshot
    • to create backup
    • stopping EC2 before creating a snapshot is recommended, but not required
    • AMI
      • Amazon Machine Image
      • can add your own software, conf, operating system, monitoring
      • has faster boot/configuration time
      • Built for specific region
      • can be Public/Own AMI/Marketplace
      • create AMI from EBS snapshot
      • Volume types
        • gp2/gp3 - general purpose SSD - 1GB to 16TB - max IOPS 16000
        • io1/io2 - low-latency SSD - max IOPS - 32000 for other and 64000 for nitro, multi-attach
        • st1 - low cost HDD, frequently accessed, throughput-intensive, max IOPS - 500
        • sc1 - low cost HDD, infrequently accessed - max IOPS - 250
    • Encryption
      • encrypted at rest and in transit
      • transparent - Key from KMS
      • Snapshots of encrypted EBS are encrypted
      • To create an encrypted EBS from unencrypted
        • create a snapshot of the unencrypted volume
        • copy the snapshot with encryption enabled
        • create EBS vol from the encrypted snapshot
        • attach the EBS vol to the original instance
    • RAID Options
      • RAID 0
        • increases performances
        • combining multiple volume - getting total space and IO
        • not fault tolerant
      • RAID 1
        • increases fault tolerance
        • mirroring a volume to another
      • RAID 5, 6 not recommended for EBS
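
The RAID 0 and RAID 1 trade-off above comes down to simple arithmetic. The sketch below is an idealized model (it assumes perfect striping and ignores instance-level limits); the function names and the example volume figures are made up for illustration.

```python
def raid0(volumes):
    """Striping: usable size and IOPS add up, but losing one volume loses the array."""
    return {
        "size_gb": sum(v["size_gb"] for v in volumes),
        "iops": sum(v["iops"] for v in volumes),
        "fault_tolerant": False,
    }


def raid1(volumes):
    """Mirroring: every volume holds the same data, so usable size is one volume."""
    return {
        "size_gb": min(v["size_gb"] for v in volumes),
        "fault_tolerant": True,
    }


# Two identical example volumes.
volumes = [
    {"size_gb": 500, "iops": 4000},
    {"size_gb": 500, "iops": 4000},
]

print(raid0(volumes))  # total space and IO of both volumes
print(raid1(volumes))  # space of a single volume, but survives one failure
```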

Instance store

  • better I/O performance
  • ephemeral - data lost when instance stopped
  • Good for buffer/cache/ scratch data
  • Risk of data loss if hardware fails
  • Backups and Replication - our responsibility

EFS

  • Elastic file system
  • network file system
  • works in multiple AZs - EC2 in different AZ can access a single EFS
  • Highly available, scales automatically, expensive, pay per use
  • use case - content management
  • uses NFSv4.1 protocol
  • Compatible with Linux, not Windows
  • Encryption at rest using KMS
  • POSIX file system
  • Storage classes
    • Performance mode
      • general purpose
      • Max I/O
    • Throughput mode
      • Bursting (1 TB = 50MiB/s + burst of up to 100MiB/s)
      • Provisioned - set throughput regardless of size
    • Storage Tiers
      • Standard: for frequently accessed file
      • Infrequent access (EFS-IA): cost to retrieve files, lower price to store

Load Balancer

  • why?

    • Spread load across multiple downstream instances
    • Expose a single point of access (DNS) to your application
    • handle failures of downstream instances
    • healthchecks
    • SSL endpoint
    • enforce stickiness
      • Classic and Application load balancers
      • helps in maintaining session data - redirecting the same client to the same instance
      • uses cookie - cookie has expiration date
    • Separate public traffic from private
  • Cross Zone load Balancing

    • when enabled - distributes evenly across instances in all AZ
  • SSL Certs

    • needed for inflight encryption
    • can be managed using ACM (AWS Certificate Manager) Or upload own certs
    • can use Server Name Indication to handle multiple certs for multiple websites - client needs to indicate hostname
  • Connection Draining

    • Time to complete in-flight requests while the instance is deregistering or unhealthy
    • Stops sending new requests to the instance which is de-registering
    • 0 to 3600 secs, (default 300) (0 Disabled)

Classic Load Balancer

  • HTTP/HTTPS, TCP
  • Fixed hostname XXX.region.elb.amazonaws.com
  • Classic Load Balancer needed per application
  • Cross Zone load Balancing
    • through Console - Enabled by default
    • through CLI / API - Disabled by default
    • No charges for inter AZ data if enabled
  • Support only one SSL certificate
    • Must use multiple CLB for multiple hostname with multiple SSL certificates
  • Connection draining available

Application load balancer

  • HTTP/HTTPS (Layer 7)
  • Fixed hostname XXX.region.elb.amazonaws.com
  • load balancing - multiple machines or multiple apps on same machine
  • supports HTTP/2, WebSocket and redirects (like from HTTP to HTTPS)
  • Routing table - contains routing rules - rules can be based on
    • path in URL
    • Hostname in URL
    • Query String, Headers
  • Targets for Routing can be
    • EC2 instances
    • ECS tasks
    • Lambda Functions
    • IP Addresses
  • Health checks at target group level
  • Cross Zone load Balancing - Always On
  • SSL Cert - Supports multiple listeners with multiple SSL certificates
    • Uses Server Name Indication (SNI) to make it work
  • Connection draining is called Deregistration Delay
  • app can see the true details of client in header:
    Item    Header field
    IP      X-Forwarded-For
    Port    X-Forwarded-Port
    Proto   X-Forwarded-Proto
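
The routing rules described above (path in URL, hostname in URL) can be modelled as an ordered rule list with a default target. This is a toy sketch, not the ALB API: `Rule`, `route` and the target group names are hypothetical, and it assumes the first matching rule wins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """A routing rule; a condition left as None matches anything."""
    target_group: str
    host: Optional[str] = None
    path_prefix: Optional[str] = None


def route(rules, host, path, default="default-tg"):
    """Return the target group of the first rule whose conditions all match."""
    for rule in rules:
        if rule.host is not None and rule.host != host:
            continue
        if rule.path_prefix is not None and not path.startswith(rule.path_prefix):
            continue
        return rule.target_group
    return default


rules = [
    Rule(target_group="api-tg", path_prefix="/api/"),
    Rule(target_group="admin-tg", host="admin.example.com"),
]

print(route(rules, "www.example.com", "/api/users"))   # matched on path
print(route(rules, "admin.example.com", "/login"))     # matched on hostname
print(route(rules, "www.example.com", "/index.html"))  # falls through to default
```

This is why one ALB can sit in front of several applications, while a Classic Load Balancer is needed per application.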

Network Load Balancer

  • Forward TCP & UDP
  • millions of requests/sec
  • less latency
  • one static IP / AZ
  • extreme performance
  • Cross Zone load Balancing - Disabled by default - if enabled, charges for inter AZ
  • SSL Cert - Supports multiple listeners with multiple SSL certificates
    • Uses Server Name Indication (SNI) to make it work
  • Connection draining is called Deregistration Delay

ASG

  • Auto Scaling Groups

  • Scale out for increased load, Scale in for decreased load - num of instances is within min and max defined

  • Automatically registers new instances to a load balancer

  • Has attributes:

    • Defined in a launch configuration
      • AMI + Instance Type
      • EC2 User Data
      • EBS Volumes
      • Security Group
      • SSH Key Pair
    • Min/Max/initial Capacity
    • Nw + subnet info
    • Load Balancer info
    • Scaling policy
  • Scaling Metric can be

    • CloudWatch alarms can be used to scale
      • alarm to monitor metrics for all ASG instances like Avrg CPU
      • based on alarm - we can increase or decrease the # of instances
    • Scale using EC2 managed rules
      • Target Avg CPU Usage
      • #of requests on the ELB per instance
      • Avg Nw In/Out
    • Custom Metric
      • can create custom metric
      • send metric from app on EC2 to CloudWatch using the PutMetricData API
      • create Cloudwatch alarm to react to low/high values
      • use CW alarm as the scaling policy
    • Metric can be based on schedule
  • To update ASG, provide new launch configuration/launch template

  • IAM roles on ASG will be assigned to EC2

  • can terminate instances marked as unhealthy by LB

  • Scaling Policies:

    • Target Tracking - e.g. avg CPU to stay around 40%
    • Simple/step Scaling - e.g. When CW alarm is triggered, add 2 units or remove 1 units
    • Scheduled Actions - e.g increase the min cap to 10 at 5pm on Fridays
  • Scaling Cooldown

    • ensures ASG does not launch or terminate addn. instances before the prev scaling activity takes effect.
    • can have scaling specific cooldown period
  • ASG balances #of instances across AZ

    • Default termination Policy - find AZ with most number of instances - delete the one with the oldest launch configuration
  • Lifecycle Hooks

    • By Default, launched instance is in service
    • Can perform steps before the instance goes in service
    • Can perform steps before the instance is terminated

  • Launch Configuration vs Launch Template

    Launch Configuration            Launch Template
    must be re-created every time   multiple versions supported
    -                               create parameter subsets - partial configs for re-use or inheritance
    -                               provision using both On-Demand and Spot Instances
    -                               can use T2 unlimited burst feature
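
To make the Target Tracking idea above concrete, here is a simplified sketch of how a desired capacity could be derived from a metric and then kept within the min and max of the group. The proportional formula is an assumption made for illustration, not the exact algorithm AWS uses, and `desired_capacity` is a hypothetical helper.

```python
import math


def desired_capacity(current, metric, target, min_size, max_size):
    """Scale the instance count in proportion to metric/target, then clamp to min/max."""
    raw = math.ceil(current * metric / target)
    return max(min_size, min(max_size, raw))


# 4 instances at 80% average CPU with a 40% target: scale out.
print(desired_capacity(current=4, metric=80, target=40, min_size=1, max_size=10))

# Same load, but the group maximum caps the result.
print(desired_capacity(current=4, metric=80, target=40, min_size=1, max_size=6))
```

A cooldown period would sit around calls like this, so that a new calculation is not acted on before the previous scaling activity has taken effect.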

RDS

  • Relational Database Service
  • Supports:
    • Postgres
    • MySQL
    • MariaDB
    • Oracle
    • MS SQL Server
    • Aurora (AWS proprietary database)

WHY Use?

  • Automated provisioning, OS patching

  • supports continuous Backups - Point in time restore

    • automatically enabled
    • daily during maintenance window
    • transaction logs backed up every 5 minutes
    • 7 days retention (can be max 35 days)
  • DB snapshots

    • manually triggered
    • retention as long as you want
  • monitoring dashboards

  • read replicas for improved read performance

    • up to 5 replicas
    • within AZ, Cross AZ, or Cross Region
    • Async
    • Replica can be promoted to DB status
    • Applications must update connection string to leverage read replicas
    • For SELECT only
    • Network cost when data goes from one region to another - no fee for within region
  • Disaster recovery - multi AZ setup

    • SYNC replication
    • One DNS name - automatic failover to standby server
    • increase availability
    • no manual intervention
    • not for scaling
    • Multi-AZ free
    • Multi AZ keeps the same connection string regardless of which database is up. Read Replicas imply we need to reference them individually in our application as each read replica will have its own DNS name
  • Vertical and horizontal Scaling; storage auto scaling:

    • dynamically increased - automatically scaled
    • have to set Maximum Storage Threshold (maximum limit for DB storage)
    • useful for unpredictable workloads
  • gp2 or io1 EBS backed

  • Can convert Single-AZ RDS to Multi-AZ with zero downtime

    • a snapshot is taken
    • a new DB is restored from the snapshot in a new AZ
    • Synchronization is established between the two DBs
  • At rest Encryption

    • possible with AWS KMS - AES 256 encryption
    • has to be defined at launch time
    • if master is not encrypted, the read replicas cannot be encrypted
    • TDE for Oracle and SQL server
  • In flight encryption

    • SSL cert
  • Encrypt RDS backups - copy snapshot and enable encryption

  • Encrypt RDS DB

    • create snapshot
    • copy snapshot and enable encryption for the snapshot
    • restore DB from encrypted snapshot
    • Move apps to new DB - delete old DB
  • Security

    • Network Security
      • deployed in pvt subnet
      • leverages security groups
    • Access Management
      • IAM policies - control who manages AWS RDS
      • can have traditional Username/Password OR IAM-based authentication to login into DB (IAM auth works for MySQL and PostgreSQL)
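
The note above that applications must update their connection string to use read replicas can be sketched as a tiny endpoint chooser: SELECT statements rotate over the replica endpoints, everything else goes to the primary. `EndpointRouter` and the endpoint names are hypothetical, and the SELECT check is deliberately naive.

```python
from itertools import cycle


class EndpointRouter:
    """Choose a DB endpoint per statement: replicas for reads, primary for writes."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over replicas; with no replicas, reads fall back to the primary.
        self._replicas = cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = EndpointRouter(
    primary="primary.example.internal",
    replicas=["replica-1.example.internal", "replica-2.example.internal"],
)

print(router.endpoint_for("INSERT INTO notes VALUES (1)"))  # primary
print(router.endpoint_for("SELECT * FROM notes"))           # first replica
```

With Multi-AZ there is nothing like this to write, since the single DNS name keeps pointing at whichever instance is currently active.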

Aurora

  • Postgres and MySQL supported
  • can have 15 replicas - sub 10 ms replica lag
  • Automatic Failover
  • costs more than RDS - more efficient
  • pay as you go
  • 6 copies across 3 AZ
    • needs 4 for writes
    • needs 3 for reads
    • self healing with peer-to-peer replication
    • storage is striped across 100s of volumes
  • One master takes writes
  • Automated failover for master in less than 30 seconds - a read replica is promoted to be master
  • Master + up to 15 Aurora Read Replicas serve reads
  • Support for Cross Region Replication
    • useful for disaster recovery
  • Backup and Recovery
  • Isolation and security
    • Encryption at rest using KMS
    • Encryption in flight using SSL
    • can authenticate using IAM token
    • can't SSH
  • Industry compliance
  • Auto scaling
    • storage automatically grows in increments of 10GB, up to 64TB
  • Automated Patching with Zero Downtime
  • Advanced Monitoring
  • Routine Maintenance
  • Backtrack - restore data to any point of time without using backups
  • Types of endpoint:
    • Cluster endpoint - supports writing/reading
    • Reader endpoint - supports reading from one of the Read Replicas - acts as a load balancer
    • Custom endpoint - a set of DB instances that you choose - load balancer
  • Serverless
    • pay per second
    • Automated database instantiation and auto-scaling based on actual usage
    • for infrequent, unpredictable workloads
    • No capacity planning needed
  • Aurora Multi-master
    • for immediate failover
    • every node performs R/W
  • Global Database
    • 1 Primary Region (read/write)
    • up to 5 read-only regions
    • up to 16 Read Replicas per secondary region
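
The "6 copies across 3 AZ, needs 4 for writes, needs 3 for reads" rule above is a quorum check. The sketch below only restates that arithmetic, assuming 2 copies per AZ; the constant and function names are hypothetical.

```python
COPIES = 6          # 6 copies of the data
COPIES_PER_AZ = 2   # assumption: spread evenly over 3 AZs
WRITE_QUORUM = 4    # copies needed for writes
READ_QUORUM = 3     # copies needed for reads


def can_write(healthy_copies):
    return healthy_copies >= WRITE_QUORUM


def can_read(healthy_copies):
    return healthy_copies >= READ_QUORUM


# Losing a whole AZ removes 2 copies: writes and reads both still work.
after_az_loss = COPIES - COPIES_PER_AZ
print(can_write(after_az_loss), can_read(after_az_loss))

# Losing an AZ plus one more copy: reads still work, writes do not.
after_az_plus_one = after_az_loss - 1
print(can_write(after_az_plus_one), can_read(after_az_plus_one))
```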

Elasticache

  • Redis and Memcached
  • Caches - high performance, low latency
  • for read intensive workloads
  • AWS takes care of OS maintenance/patching, optimizations, setup, configuration, monitoring, failure recovery and backups
  • Apps queries ElastiCache - if not present, get from RDS and store in ElastiCache
  • Cache must have an invalidation strategy to make sure most current data is used
  • Redis vs Memcached
    Redis                               Memcached
    Read Replicas - High Availability   No High availability
    Persistence of data                 Non persistent
    Backup and restore                  No Backup and restore
    Uses single core                    Multi-threaded architecture
  • REDIS sorted sets - guarantees uniqueness and element ordering
    • each time a new element is added, it is ranked in real time, then added in correct order
  • Security
    • Does not support IAM authentication - any operation within cache is not using IAM
    • IAM used only for AWS API-level security - for deleting the cache
    • REDIS
      • can set password when creating cluster
      • in-flight encryption using SSL
    • Memcached
      • Supports SASL-based authentication
  • Caching Strategies
    • Lazy Loading - loads data into the cache only when necessary
    • Write Through - adds data or updates data in the cache whenever data is written to the database
    • TTL - specifies the number of seconds until the key expires
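
The three caching strategies above can be sketched with a small in-memory stand-in for the cache and a plain dict standing in for the database. `TTLCache`, `read_lazy` and `write_through` are hypothetical names; a real setup would talk to Redis or Memcached through a client library instead.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a cache (illustration only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}  # key -> (value, expiry time)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # TTL reached: treat as a cache miss
            return None
        return value

    def set(self, key, value):
        self._items[key] = (value, self._clock() + self._ttl)


def read_lazy(cache, database, key):
    """Lazy loading: query the cache first and fill it only on a miss."""
    value = cache.get(key)
    if value is None:
        value = database[key]
        cache.set(key, value)
    return value


def write_through(cache, database, key, value):
    """Write through: update the database and the cache together."""
    database[key] = value
    cache.set(key, value)


database = {"user:1": "alice"}
cache = TTLCache(ttl_seconds=60)
print(read_lazy(cache, database, "user:1"))  # miss, loaded from the database
print(cache.get("user:1"))                   # now served from the cache
```

Lazy loading can serve stale data until the TTL expires, while write through keeps the cache current at the cost of an extra write, so the two are often combined.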

Route 53

  • Domain Name System

  • Global service

  • helps client understand how to reach a server through URLs using collection of rules and records

    • A: hostname to IPV4
    • AAAA: hostname to IPV6
    • CNAME: hostname to hostname
    • Alias: hostname to AWS resource
  • can use public domain names you own or private domain names that can be resolved in your VPC

  • Load balancing

  • Health check

  • Routing policy

    • Simple Routing Policy
      • Maps a hostname to another hostname
      • Use when you need to redirect to a single resource
      • can't attach health checks
    • Weighted Routing Policy
      • Control the % of the requests that go to a specific endpoint
      • Can be associated with Health Checks
    • Latency Routing Policy
      • Redirect to the server that has the least latency close to us
    • Failover Routing Policy
      • route traffic to a resource when the resource is healthy or to a different resource when the first resource is unhealthy.
    • Geo Location Routing Policy
      • routing based on user location
      • should create a default policy
    • Geo Proximity Routing Policy
      • Route traffic to your resources based on the geographic location of users and resources
      • Can choose to route more traffic or less to a given resource by specifying a value, known as a bias
      • To use geoproximity routing, you must use Route 53 traffic flow
    • Multi-value Routing Policy
      • Use when routing traffic to multiple resources
      • Associate health checks with the records
      • up to 8 healthy records are returned
  • Can use Route 53 with domain bought from a 3rd party

    • Create a hosted zone in Route 53
    • Update NS Records on 3rd party website to use Route 53 name servers
  • CNAME vs ALIAS

    CNAME                                   ALIAS
    Points a hostname to another hostname   Points a hostname to another AWS resource
    Only for NON-ROOT domain                Works for ROOT and NON-ROOT
    Charged                                 Free of charge
    No native health check                  Native Health Check
  • Health Checks

    • After X health checks failed => unhealthy (default 3)
    • After X health checks passed => healthy (default 3)
    • Default Health Check Interval: 30s (can set to 10s – higher cost)
    • About 15 health checkers will check the endpoint health
      • one request every 2 seconds on average
    • Can have HTTP, TCP and HTTPS health checks (no SSL verification)
    • Possibility of integrating the health check with CloudWatch
    • Health checks can be linked to Route53 DNS queries!
    • In health check, the resource is pinged from health checkers in different locations.
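
For the Weighted Routing Policy above, the share of requests each record receives is its weight divided by the sum of all weights. The sketch below shows that arithmetic plus a random pick that follows the same proportions; the record names and helper functions are hypothetical.

```python
import random


def traffic_share(weights):
    """Fraction of requests each record is expected to receive."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def pick_record(weights, rng=random):
    """Pick one record at random, in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


weights = {"instance-a": 70, "instance-b": 20, "instance-c": 10}

print(traffic_share(weights))
print(pick_record(weights))  # one simulated DNS answer
```

Setting a weight to 0 stops traffic to that record, which is handy when shifting load gradually to a new version of an application.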

Elastic Beanstalk

S3

  • store objects(files) in buckets(like directories).

  • Buckets

    • globally unique name
      • no uppercase, or underscore, or IP
      • 3-63 characters long
      • start with a lowercase letter or number
    • defined at region level
  • Object have

    • Key - prefix+object name
    • value are the content of the Object body
      • max size 5TB
      • for uploading more than 5GB, use multi-part upload
    • Metadata
    • Tags - useful for security/lifecycle
    • Version ID
  • Versioning of files is enabled at bucket level

    • Any file that is not versioned prior to enabling versioning will have version “null”
    • Suspending versioning does not delete the prev versions
  • Objects can be encrypted by using:

    • SSE-S3 - encryption using keys handled and managed by AWS
      • Object encrypted server side
      • AES-256
      • Must set header "x-amz-server-side-encryption": "AES256"
    • SSE-KMS - using KMS to manage keys
      • user control + audit trail
      • encrypted server side
      • Must set header "x-amz-server-side-encryption": "aws:kms"
    • SSE-C - you manage keys
      • server side encryption
      • S3 does not store the encryption key you provide
      • HTTPS mandatory
      • Encryption key to be provided in headers
    • Client side encryption
      • Clients must encrypt data themselves before sending, and decrypt data themselves when retrieving
      • uses AmazonS3EncryptionClient
  • S3 exposes HTTP and HTTPS endpoints

  • S3 access can be defined using:

    • IAM policies
    • Bucket policies
      • JSON based policies
      • Allow/Deny a Principal on a Resource
    • Object Access Control List
    • Bucket Access Control List
  • Supports VPC Endpoints

  • Access Logs can be stored in other S3 bucket

  • API call can be logged in AWS CloudTrail

  • User Security

    • MFA Delete - can be required in versioned bucket to delete objects
    • Pre-Signed URLs: URLs that are valid only for a limited time
  • S3 can host static websites

    • <bucket-name>.s3-website-<AWS-region>.amazonaws.com OR <bucket-name>.s3-website.<AWS-region>.amazonaws.com
  • S3 CORS

    • Cross-Origin Resource Sharing
    • An origin is a scheme(protocol), host(domain) and port
    • Web Browser based mechanism to allow requests to other origins while visiting the main origin
    • The requests won't be fulfilled unless the other origin allows for the requests, using CORS Headers (Access-Control-Allow-Origin)
    • If a client does a cross-origin request on our S3 bucket, we need to enable the correct CORS headers
    • can allow for a specific origin or for * (all origins)
  • Consistency Model

    • read after write consistency
    • list consistency
  • MFA Delete

    • when versioning enabled
    • for permanently deleting object version
    • for suspending versioning
    • only bucket owner can enable/disable - enable from CLI only
  • Access Logs

    • can log all access to S3 buckets
    • will be logged into another S3 bucket
  • Replication

    • must enable versioning in source and destination
    • Cross Region Replication
    • Same Region Replication
    • Buckets can be in different accounts
    • Copying is asynchronous
    • Must give proper IAM permissions to S3
    • After activation only new objects are replicated
    • Delete operations not replicated - with version, a delete marker is added
    • No chaining of replications
  • Pre-Signed URLs

    • Can generate pre-signed URLs using SDK or CLI
    • Default 3600 seconds validity, can be set using the --expires-in argument (value in seconds)
    • with presigned URL, user inherits the permissions of the creator
  • S3 Storage classes

    • S3 Standard - General Purpose
      • High durability across multiple AZ
      • 99.99% Availability
      • Sustain 2 concurrent facility failures
    • S3 Standard - Infrequent Access
      • when data is less frequently accessed, but requires rapid access when needed
      • High durability across multiple AZ
      • 99.9% Availability
      • lower cost than S3 Standard
      • Sustain 2 concurrent facility failures
      • Retrieval fee per GB retrieved
    • S3 One-Zone - Infrequent Access
      • same as IA, but stored in single AZ
      • High durability - but data lost when AZ is destroyed
      • 99.5% Availability
      • Low latency and high throughput performance
      • Low cost compared to IA
      • Retrieval fee per GB retrieved
    • S3 Intelligent Tiering
      • low latency and high throughput same as S3 Standard
      • small monthly monitoring and auto-tiering fee
      • automatically moves objects between two access tiers based on changing access patterns
      • High durability
      • No retrieval fee
    • Glacier
      • low cost storage - archiving and backup
      • data retained for longer term
      • each item is called Archive
      • Archives are stored in Vaults
      • Minimum storage duration of 90 days
      • Retrieval fee per GB retrieved
      • Retrieval options
        • Expedited (1 to 5 minutes)
        • Standard (3 to 5 hours)
        • Bulk (5 to 12 hours)
    • Glacier Deep Archive
      • for long term storage - cheaper
      • min storage of 180 days
      • Retrieval options
        • Standard (12 Hours)
        • Bulk (48 hours)
      • Retrieval fee per GB retrieved
  • Moving objects between different storage tiers can be automated using lifecycle configuration

    • Transition actions - defines when objects are transitioned to another storage class
    • Expiration actions - configure objects to expire(delete) after some time
    • Rules can be created for certain prefix (s3://mybucket/mp3*) or object tags (Department:Finance)
  • S3 Analytics

    • Storage class analytics to help determine when to transition objects
    • does not work for One zone or Glacier
    • daily report
  • S3 - KMS limitation

    • All key upload and download activity is counted towards the KMS quota per second - quota can not be increased
  • Multi Part upload

    • for files > 100MB, must for > 5GB
    • helps parallelize uploads
  • Transfer Acceleration

    • Increase transfer speed by transferring file to an AWS edge location which will forward the data to the S3 bucket in the target region
    • Compatible with multi-part upload
  • Byte range fetches

    • Parallelize GETs by requesting specific byte ranges
    • better resilience in case of failures
    • can be used to retrieve only partial data
  • S3 Select & Glacier Select

    • Retrieve less data using SQL - server side filtering
    • less network transfer, less CPU cost client-side
  • In case of Requester Pays buckets, the requester instead of bucket owner pays the cost of the request

    • the requester must be authenticated in AWS
  • Glacier Vault Lock

    • Adopt a Write Once Read Many model
    • lock the policy for future edits
    • helpful for compliance and data retention
  • S3 object lock

    • versioning must be enabled
    • block an object version deletion for a specified amount of time
    • Object retention - Retention period(fixed period) or Legal Hold(no fixed period)
    • Modes
      • Governance mode - can't overwrite or delete object version or alter lock permissions unless they have special permissions
      • Compliance mode - object version cannot be overwritten or deleted by any user, including root user.
        • retention mode can't be changed - retention period can't be shortened
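
The bucket naming rules listed earlier in this section (3-63 characters, no uppercase or underscore, not an IP, start with a lowercase letter or number) can be checked with a short function. This is a partial sketch: `is_valid_bucket_name` is a hypothetical helper, it also requires the name to end with a letter or number, and it does not cover every rule in the official documentation.

```python
import re
from ipaddress import AddressValueError, IPv4Address

# 3-63 characters: lowercase letters, digits, dots and hyphens,
# beginning and ending with a lowercase letter or digit.
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name):
    """Check a bucket name against the basic naming rules from the notes."""
    if _NAME_RE.fullmatch(name) is None:
        return False
    try:
        IPv4Address(name)  # names formatted like an IP address are rejected
    except AddressValueError:
        return True
    return False


for candidate in ["my-notes-bucket", "My_Bucket", "ab", "192.168.5.4"]:
    print(candidate, is_valid_bucket_name(candidate))
```

Passing this check does not mean the name is available, since bucket names must also be globally unique.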

Athena

  • Serverless service to perform analysis directly against S3 files
  • Uses SQL language to query the files
  • JDBC/ODBC driver
  • charged per query and amount of data scanned
  • supports CSV, JSON, ORC, Avro, and Parquet

CloudFront

  • Content Delivery Network
  • Improves read performance - cached at edge
  • DDoS protection, integration with Shield and AWS Web Application Firewall (WAF)
  • can expose HTTPS and can talk to internal HTTPS backends
  • Cloudfront Origins
    • S3 Bucket
      • For distributing files and caching them at the edge
      • Enhanced security with CloudFront Origin Access Identity (OAI)
      • can be used as an ingress (to upload files to S3)
    • Custom Origin (HTTP)
      • Application Load Balancer
      • EC2 instance
      • S3 website
      • Any HTTP backend you want
    • Geo Restriction
      • Whitelist - Allow your users to access your content
      • Blacklist - Prevent users from accessing content
        • the country is determined using 3rd party Geo-IP database
        • Use case: Copyright laws to control access to content
    • Cloudfront vs S3 Cross Region Replication
      • Cloudfront
        • Global edge network
        • Files are cached for a TTL
        • Great for static content that must be available everywhere
      • S3 Cross Region Replication
        • Must be setup for each region you want replication to happen
        • Files are updated in near real-time
        • Read Only
        • Great for dynamic content that needs to be available at low-latency in few regions.
    • CloudFront Signed URL/Signed Cookies
      • distribute paid shared content to premium users over the world
      • attach a policy with:
        • URL expiration
        • IP ranges to access the data from
        • Trusted signers (which AWS accounts can create signed URLs)
      • signed urls = access to individual files
      • signed cookies = access to multiple files
    • CloudFront signed URL vs S3 Pre-Signed URL
      • Cloudfront
        • Allow access to a path, no matter the origin
        • Account wide key-pair, only the root can manage it
        • Can filter by IP, path, date, expiration
        • Can leverage caching features
      • S3
        • Issue a request as the person who pre-signed the URL
        • Uses the IAM key of the signing IAM principal
        • Limited Lifetime
    • Price Classes
      • Price Class All - all regions - best performance
      • Price Class 200 - most regions, but excludes the most expensive regions
      • Price Class 100 - only the least expensive regions
    • CloudFront - Multiple Origin
      • To route to different kind of origins based on the content type or path pattern
    • Origin Groups
      • One primary and one secondary
      • if the primary fails, then the secondary one is used
      • increases high-availability - enables failover
    • Field Level Encryption
      • Protect user sensitive information through application stack
      • Adds an additional layer of security along with HTTPS
      • Sensitive information is encrypted at edge - close to the user
      • Asymmetric encryption
      • The number of fields are specified in the header - max 10 fields
      • the public key is also placed in the header
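
The Geo Restriction whitelist and blacklist described earlier in this section reduce to a membership test on the viewer's country. The sketch below is only a model of that decision; `is_allowed` is a hypothetical helper, and the country code is assumed to have been resolved already (CloudFront does this with a Geo-IP database).

```python
def is_allowed(country, mode, countries):
    """Decide whether a viewer from `country` may access the content."""
    if mode == "whitelist":
        return country in countries       # only listed countries get access
    if mode == "blacklist":
        return country not in countries   # listed countries are blocked
    raise ValueError(f"unknown mode: {mode}")


print(is_allowed("FR", "whitelist", {"FR", "DE"}))  # listed: allowed
print(is_allowed("US", "whitelist", {"FR", "DE"}))  # not listed: blocked
print(is_allowed("US", "blacklist", {"FR", "DE"}))  # not blocked: allowed
```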

AWS Global Accelerator

  • Leverage the AWS internal network to route to your application
  • 2 Anycast IPs are created for your application - the Anycast IPs send traffic directly to Edge Locations - the Edge Locations send traffic to your application

Snow

Storage Gateway

Amazon FSx for Windows

Amazon FSx for Lustre

SQS

SNS

Kinesis

MQ

ECS

Fargate

ECR

EKS

Lambda

DynamoDB

API Gateway

Cognito

SAM
