AWS Certified Solutions Architect Associate Course by Stephane Maarek
Section 02: Getting Started with AWS
AWS is the biggest Cloud Provider
AWS Regions
cluster of data centers
Most AWS Services are region-scoped
How to select an AWS Region
Compliance
Proximity
Available services
Pricing
AWS Availability Zones
Minimum of 3 AZs within each AWS Region (usually 3, maximum 6)
Discrete data centers, redundant power, networking, and connectivity
Ultra-low latency networking
AWS Points of Presence (Edge Locations)
IAM Groups only contain IAM Users
IAM Policy ... JSON document
Define permission for users
Least Privilege
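Sketch (not from the course): what a least-privilege IAM policy JSON document can look like, built in Python. The bucket name and the helper name `read_only_bucket_policy` are made up for illustration; the `Version`, `Statement`, `Effect`, `Action`, and `Resource` keys are the standard policy fields.

```python
import json


def read_only_bucket_policy(bucket_name: str) -> dict:
    """Build an identity-based policy that only allows listing and reading one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                # Listing applies to the bucket itself
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # Reading applies to the objects inside the bucket
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


# "example-notes-bucket" is a hypothetical bucket name
policy = read_only_bucket_policy("example-notes-bucket")
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Nothing here grants write or delete access, which is the point of least privilege: start with the minimum and add actions only when needed.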
Best Practices -- Create an Admin IAM Group (AdministratorAccess) to replace root user for day-to-day activities
IAM Password Policy - enforce strong passwords (require periodic password rotation)
MFA
Virtual MFA device
U2F Security Key
Hardware Key Fob MFA Device
Hardware Key Fob MFA Device for AWS GovCloud
3 Ways to Access AWS:
AWS Management Console
AWS CLI
AWS SDK
IAM Roles used to assign permission to AWS Services
IAM Credentials Report
account-level, list all account users and credential status
IAM Access Advisor
user-level, shows service permissions on user and when last used (least-privilege)
Section 04: EC2 Fundamentals
AWS Budget - create alarms triggered when budget threshold metrics are exceeded
EC2 - Elastic Compute Cloud - Infrastructure as a Service
Storing data on virtual drives (EBS)
Distribute load using (ELB)
Scaling the services using auto-scaling group (ASG)
EC2 User Data
Script used to bootstrap EC2 instance, only run once, first start
Used to automate boot tasks:
Installing updates
Installing software
Downloading common files from internet
Anything really
Runs as root user
Security Group
set of firewall rules that control traffic for your instance
work on instance-level
contain only allow rules
Can reference by IP or by Security Group
SG regulate:
Access to Ports
Authorized IP ranges (IPv4 & IPv6)
Control inbound traffic to instance
Control outbound traffic from instance to other destinations
By default
all outbound traffic is allowed
all inbound traffic is blocked
Can be attached to multiple instances
Locked to Region/VPC combination
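Sketch (not from the course): a toy model of how allow-only inbound rules behave. The `InboundRule` class and `is_inbound_allowed` function are hypothetical names, and the IP ranges are documentation-only example addresses. Real Security Groups also support protocols, port ranges, and references to other Security Groups, which this ignores.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class InboundRule:
    port: int
    cidr: str  # authorized source range, e.g. "203.0.113.0/24"


def is_inbound_allowed(rules, source_ip: str, port: int) -> bool:
    """Allow-only semantics: traffic passes if ANY rule matches, otherwise it is blocked."""
    source = ip_address(source_ip)
    for rule in rules:
        network = ip_network(rule.cidr)
        if rule.port == port and source.version == network.version and source in network:
            return True
    return False  # default: all inbound traffic is blocked


rules = [
    InboundRule(port=22, cidr="203.0.113.0/24"),  # SSH from one office range only
    InboundRule(port=80, cidr="0.0.0.0/0"),       # HTTP from anywhere (IPv4)
]

print(is_inbound_allowed(rules, "203.0.113.10", 22))
print(is_inbound_allowed(rules, "198.51.100.7", 22))
```

There is no "deny" rule type in the model: a request that matches nothing is simply dropped, which is why a blocked request shows up as a time out.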
Request is intercepted before EC2 instance receives it
"time out" -> request not allowed by SG
"connection refused" -> application issue
Classic PORTS to know
22 -> SSH
21 -> FTP
22 -> SFTP
80 -> HTTP
443 -> HTTPS
3389 -> RDP (Remote Desktop Protocol)
Instance Types (7)
General Purpose
Compute Optimized
Memory Optimized
Accelerated Computing
Storage Optimized
Instance Features
Measuring Instance Performance
SSH
ssh -i KEY.pem ec2-user@PUBLIC_IP
chmod 0400 KEY.pem
EC2 Instance Connect
Browser-based terminal to connect to EC2 Instance via AWS Management Console
Attach IAM Roles to EC2 Instances
DON'T RUN "aws configure" within an instance terminal!!!
EC2 instances Purchasing Options
On-Demand
short-term, un-interrupted workloads
Reserved (1 & 3 years)
Reserved Instances (long workloads)
Instance Type, Region, Tenancy, OS
Reserved Instance Scope - Regional or Zonal
Convertible Reserved Instances - (long workloads with flexible instances)
Savings Plans (1 & 3 years) - commitments to an amount of usage, long workload
Commit to dollar/hour for 1 & 3 years
Locked to instance family and region
Spot Instances - short workload, cheap, can be interrupted
Jobs that are resilient to failure:
Batch jobs
Data analysis
Image processing
Distributed workloads
Define a max spot price; 2-minute grace period when AWS reclaims the instance
Dedicated Host - book physical host
compliance requirements, server-bound software licenses
Dedicated Instances - No other customer will share your hardware
No control over placement of hardware
Capacity Reservations - Reserve capacity in specific AZ for any duration
Reserve On-Demand in a specific AZ
No time commitment, no billing discount
Charge whether you use it or not
short-term, uninterrupted workloads in a specific AZ
Spot Fleet
lowest price - cost optimization, short workloads
diversified - great for availability, long workloads
capacityOptimized - optimal capacity
Section 05: EC2 Solutions Architect Level
Elastic IP
a Public IP (IPv4) that can be attached to an instance to retain a fixed IP address
limited to 5 Elastic IPs per account (can ask AWS to increase)
not a good architecture pattern
in theory, it allows for a failed instance to be remapped as a disaster recovery strategy
ELB is a better approach
EC2 Placement Groups
Cluster - low-latency group in single AZ
10 Gbps network, low-latency, same rack, same AZ, high risk
Spread - max 7 instances/group/AZ - critical applications
minimize failure risk (all instances on different hardware)
span multiple AZ
maximize high availability
Partition - 100s EC2 instances/group (allows Hadoop, Kafka, Cassandra)
Up to 7 partitions per AZ
Multiple AZs in same Region
100s of EC2 instances
Partitions on separate racks
Big Data Applications (HDFS, HBase, Cassandra, Kafka)
Elastic Network Interfaces (ENI)
Virtual Network Card
Can have:
1 Primary (eth0) private IPv4, one or more secondary (eth1) IPv4
1 Public IPv4
1 Elastic IP per private IPv4
1 or more SG
MAC address
Bound to a specific AZ
Create ENI independently of EC2 instance and attachable on the fly (use case failover)
EC2 Hibernate
Stop, Terminate, Hibernate
EBS root volume must be encrypted and larger than the RAM size; hibernation limited to 60 days
Section 06: EC2 Instance Storage
EBS (Elastic Block Store) Volume
network drive - can persist data after termination
multi-attach feature to mount some EBS volumes onto multiple EC2 instances
bound to specific AZ
EBS snapshots
backup of EBS volume
can copy snapshots onto other AZ/Regions
EBS Snapshot Archive
24 - 72 hours to restore, 75% cheaper
Recycle Bin for EBS Snapshots
To recover EBS Snapshots after accidental deletion
1 day to 1 year
Fast Snapshot Restore (FSR)
expensive but quick; useful for big volumes
AMI (Amazon Machine Image)
customization of an EC2 instance
allows for faster boot (pre-packaged software packages/setup)
Can come from three sources:
Public AMI
Custom AMI (you maintain it)
AWS Marketplace AMI
EC2 Instance Store
High-performance hardware disk
Storage is ephemeral
buffer/cache/scratch data/temporary content
backups and replication are your responsibility
i3
EBS Volume Types (6 Types)
gp2/gp3 (SSD) - General purpose, balances price/performance - BOOT
io1/io2 (SSD) - Provisioned IOPS (PIOPS SSD), highest performance, mission-critical, low-latency, high-throughput - BOOT - for more than 16,000 IOPS, great for databases - 4 GiB to 16 TiB
MAX 64,000 IOPS for Nitro EC2, otherwise 32,000
io2 has more durability and more IOPS per GiB than io1
io2 Block Express - sub-millisecond latency; MAX 256,000 IOPS with an IOPS:GiB ratio of 1,000:1
supports EBS multi-attach!!!
st1 (HDD) - Low cost volume, frequent access and high throughput
Max throughput 500 MiB/s, max IOPS 500
sc1 (HDD) - Low cost, less frequent access
For gp3, IOPS and throughput can be set independently of volume size; for gp2 they are linked to size (3 IOPS per GB)
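Sketch (not from the course): the gp2 size-to-IOPS link as arithmetic. The 3 IOPS per GiB ratio is from the notes above; the floor of 100 IOPS and the ceiling of 16,000 IOPS are assumptions taken from the published gp2 limits, and `gp2_baseline_iops` is a made-up helper name.

```python
GP2_IOPS_PER_GIB = 3      # from the notes: 3 IOPS per GB
GP2_MIN_IOPS = 100        # assumption: documented gp2 floor
GP2_MAX_IOPS = 16_000     # assumption: documented gp2 ceiling


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS of a gp2 volume, which depends only on its size."""
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, GP2_IOPS_PER_GIB * size_gib))


for size in (10, 1_000, 5_334, 16_384):
    print(size, "GiB ->", gp2_baseline_iops(size), "IOPS")
```

Under these assumptions a gp2 volume reaches the ceiling at 5,334 GiB, so growing it further adds capacity but no IOPS. With gp3 the two are provisioned separately.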
EBS Multi-Attach (io1/io2 family)
Attach multiple EC2 instances in the same AZ
Up to 16 EC2 instances at a time
Must use a File System that is cluster-aware
higher application availability
EBS Encryption
Data at rest is encrypted
In-flight data between instance and volume is encrypted
Snapshots are encrypted
Leverages keys from KMS (AES-256)
Amazon EFS - Elastic File System
Managed NFS that can be mounted on many EC2 instances
Works across multi-AZ
Highly available, scalable, expensive (3x gp2), pay per use
uses NFSv4.1 protocol
Compatible with Linux based AMI (not Windows)
POSIX file system
SCALE
1000s of concurrent NFS clients, 10 GB+/s throughput
Petabyte scale, automatically
PERFORMANCE
General Purpose (low-latency) or Max I/O (higher latency)
THROUGHPUT
Bursting (1 TB = 50 MiB/s + burst of up to 100 MiB/s)
Provisioned - throughput independent of size
Storage Tiers - Standard and Infrequent access (EFS Standard - EFS IA)
Availability - Standard: Multi-AZ or One Zone (EFS One Zone-IA)
EC2 Instance Metadata
http://169.254.169.254/latest/meta-data
can retrieve meta-data and user-data
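Sketch (not from the course): reading instance metadata from Python with only the standard library. It assumes the token-based flow (IMDSv2), where a session token is requested first and then sent as a header. `metadata_url` and `fetch_metadata` are hypothetical helper names, and `fetch_metadata` only works when run on an EC2 instance.

```python
import urllib.request

METADATA_BASE = "http://169.254.169.254/latest"


def metadata_url(path: str) -> str:
    """Build the URL for one metadata key, e.g. 'instance-id'."""
    return f"{METADATA_BASE}/meta-data/{path.lstrip('/')}"


def fetch_metadata(path: str, timeout: float = 2.0) -> str:
    """Fetch one metadata value using a short-lived session token (IMDSv2)."""
    token_request = urllib.request.Request(
        f"{METADATA_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as response:
        token = response.read().decode()

    data_request = urllib.request.Request(
        metadata_url(path),
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(data_request, timeout=timeout) as response:
        return response.read().decode()


print(metadata_url("instance-id"))
```

On an instance, `fetch_metadata("instance-id")` would return the instance ID. The link-local address is not reachable from anywhere else, so the call times out off-instance.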
Section 07: High Availability and Scalability
Scalability app/system can adapt to increases/decreases in load
Vertical => more power
Horizontal (elasticity) => more servers
Elastic Load Balancer (ELB) - managed load balancer
AWS guarantees it will work, upgrades, maintenance, high availability
Integrated with EC2, EC2 Auto Scaling Groups, Amazon ECS, AWS Certificate Manager, CloudWatch, Route53, AWS WAF, AWS Global Accelerator
Health Checks
port: 4567 and /health endpoint
4 Types of Load Balancers
Classic Load Balancer (CLB) (DEPRECATED)
HTTP, HTTPS, TCP, SSL (secure TCP)
Application Load Balancer (ALB)
Works on request level
Layer 7 (HTTP/HTTPS)
HTTP, HTTPS, WebSocket
Routing Tables
Can route based on path, hostname, or query strings/headers
Great for micro-services and container-based applications
Port Mapping feature to redirect to dynamic port in ECS
Fixed hostname (xxx.region.elb.amazonaws.com)
IP of client found in X-Forwarded-For header; port and protocol in X-Forwarded-Port and X-Forwarded-Proto
Target Groups
EC2 Instances, ECS tasks, Lambda functions (HTTP request to JSON event), IP addresses (private IP), can route to multiple TG, Health Checks at TG level
Network Load Balancer (NLB)
TCP, TLS (secure TCP), UDP
Works on connection level
Layer 4
NLB has one static IP per AZ, supports Elastic IP
millions of requests per second, lower latency: ~100 ms (vs 400 ms for ALB)
Can redirect to
EC2 instances, private IP addresses (on-premise machines), an ALB
Health Checks on:
TCP, HTTP, HTTPS Protocols
Gateway Load Balancer (GWLB)
Layer 3, IP Protocol
Uses GENEVE protocol on port 6081
Deploy, scale, and manage a fleet of 3rd party network virtual apps
Firewalls, Intrusion Detection, Deep Packet Inspection, payload manipulation
Works at network level - IP Packets
Combines Transparent Network Gateway + Load Balancer
Targets: EC2 Instances and private IPs
Sticky Sessions
Successive requests from a user are routed to the same instance by the ELB
Can be enabled for ALB and CLB
Use case: retain session data (login for example)
Two Types of Cookies
Application-based Cookies
Custom cookie
Generated by target (application)
Custom attributes, cookie name per TG
Forbidden names: AWSALB, AWSALBAPP, AWSALBTG
Application Cookie
Generated by load balancer
AWSALBAPP
Duration-based Cookies
Generated by load balancer
AWSALB (for ALB) or AWSELB (for CLB)
Cross-Zone Load Balancing
Enabled: All instances share the burden equally regardless of AZ
Enabled by default for ALB (no extra charge for cross AZ data)
Can be disabled at TG level
Disabled by default in NLB and GWLB (charge if you enable)
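Sketch (not from the course): how cross-zone load balancing changes the share of traffic each instance receives. It assumes clients are spread evenly across one load balancer node per AZ and that every AZ has at least one instance. The function name and AZ labels are made up.

```python
def traffic_share_per_instance(instances_per_az: dict, cross_zone: bool) -> dict:
    """Return the fraction of total traffic that ONE instance in each AZ receives."""
    total_instances = sum(instances_per_az.values())
    az_count = len(instances_per_az)
    shares = {}
    for az, count in instances_per_az.items():
        if cross_zone:
            # Every instance gets an equal slice, regardless of AZ
            shares[az] = 1 / total_instances
        else:
            # Each AZ gets an equal slice, split among its own instances only
            shares[az] = (1 / az_count) / count
    return shares


fleet = {"az-a": 2, "az-b": 8}
print(traffic_share_per_instance(fleet, cross_zone=True))
print(traffic_share_per_instance(fleet, cross_zone=False))
```

With 2 instances in one AZ and 8 in the other, disabling cross-zone sends 25% of traffic to each instance in the small AZ and 6.25% to each instance in the large one; enabling it gives every instance 10%.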
SSL Certificate
Secure Socket Layer
TLS = Transport Layer Security
in-flight encryption
Certificate Authorities
Comodo, Symantec, GoDaddy, GlobalSign, Digicert, Letsencrypt
Have an expiration date (you set) and must be renewed
LB uses X.509 certificate, managed by ACM (AWS Certificate Manager)
can also create, upload your own certs
HTTPS Listener:
Must specify default cert
optional list of certs to support multiple domains
Client can use SNI (Server Name Indication) to specify the hostname
Can specify Security Policy to support older SSL/TLS
SNI - Server Name Indication
Multiple SSL certificates on one web server
Connection Draining
CLB - Connection Draining
ALB & NLB - Deregistration Delay
Set to 1 - 3600 seconds (default: 300)
Set to 0 to disable
Auto Scaling Group (ASG)
ASG is free (you only pay for the underlying EC2 instances)
scale-out => add instances
scale-in => remove instances
must create a Launch Template
ASG can be triggered by CloudWatch Alarms (auto-scaling)
Auto Scaling Group Scaling Policies
Dynamic Scaling Policies (3 Types)
Target Tracking Scaling
Simple / Step Scaling
Scheduled Actions
Predictive Scaling
continuously forecast load and schedule scaling
Good metrics to scale on:
CPUUtilization
RequestCountPerTarget
Average Network In/Out (network bound)
Custom Metric (pushed to CloudWatch)
Cooldown period (default 300 seconds)
Section 08: AWS Fundamentals: RDS + Aurora + ElastiCache
RDS - Relational Database Service
Managed DB service that uses SQL as a query language
Postgres
MySQL
MariaDB
Oracle
Microsoft SQL Server
Aurora (AWS Proprietary DB)
What you get:
automatic provisioning, os patching
continuous backups (Point in Time Restore)
Monitoring, Read replicas, DR with MultiAZ, Maintenance windows, Scalability, storage on gp2 or io1
CANNOT SSH into instance
Storage Auto Scales
Set a Maximum Storage Threshold
Triggers when: free storage is below 10% of allocated, low storage lasts at least 5 minutes, and 6 hours have passed since the last modification
RDS Read Replicas
Up to 5
Within AZ, Cross AZ, Cross Region
ASYNC Replication
Can be promoted to independent DB
Application must update connection string to make use of read replicas
No network cost for replication across AZs in the same Region; cross-Region replication is charged
RDS Multi AZ <> Disaster Recovery (SYNC Replication)
One DNS name - automatic failover to standby
Increase availability
Read replicas can also be setup as MultiAZ for DR
SingleAZ to MultiAZ
zero downtime
'modify' SYNC Replication to Standby DB
RDS Custom
Oracle and Microsoft SQL Server
Access to underlying instances
config settings, patches, enable native features, SSH or SSM Session Manager into EC2
DEACTIVATE Automation Mode while tweaking
Amazon Aurora
Compatible with Postgres or MySQL
5x performance over MySQL and 3x over Postgres (on RDS)
10 GB to 128 TB (grows in increments)
Up to 15 read replicas (sub-10 ms replica lag)
Failover is instantaneous, HA!!!
6 copies across 3 AZ
4 out of 6 copies needed for writes
3 out of 6 copies needed for reads
self-healing with peer-to-peer replication
storage stored across 100s of volumes
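Sketch (not from the course): the 6-copy quorum rule as a tiny check. The numbers come from the notes above; `storage_status` is a made-up helper name and this models only the counting, not how Aurora actually replicates.

```python
TOTAL_COPIES = 6   # 6 copies across 3 AZ (2 per AZ)
WRITE_QUORUM = 4   # 4 out of 6 needed for writes
READ_QUORUM = 3    # 3 out of 6 needed for reads


def storage_status(healthy_copies: int) -> dict:
    """Report whether reads and writes are still possible with this many healthy copies."""
    if not 0 <= healthy_copies <= TOTAL_COPIES:
        raise ValueError("healthy_copies must be between 0 and 6")
    return {
        "writes": healthy_copies >= WRITE_QUORUM,
        "reads": healthy_copies >= READ_QUORUM,
    }


print(storage_status(4))  # one whole AZ lost (2 copies gone)
print(storage_status(3))  # one AZ plus one more copy lost
```

Losing an entire AZ leaves 4 copies, so writes continue; losing one more copy on top of that leaves the data readable but not writable.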
1 Master that handles WRITES (failover in under 30 secs)
Up to 15 RR (any can be upgraded to master) with AUTO-Scaling!!!
Support cross region replication
Writer Endpoint and Reader Endpoint
Features:
Automatic fail-over
Backup and Recovery
Isolation and Security
Industry compliance
Push-button scaling
Automatic Patching with Zero-Downtime
Advanced Monitoring
Routine maintenance
Backtrack (without backup)
Custom Endpoints
Bigger instances for analytics - no longer linked to reader endpoint
Aurora Serverless
Automatic DB instantiation and auto-scaling based on usage. No capacity planning. Pay/sec
client talks to Proxy Fleet managed by Aurora
Aurora MultiMaster - IMMEDIATE failover for the writer node
Global Aurora
Two Flavors
Aurora Cross Region Read Replicas
Aurora Global Database (recommended)
1 Primary Region
Up to 5 Secondary regions (read-only), replication lag less than 1 second
Up to 16 RR per secondary region - decrease lag, HA
DR: promoting another region takes less than 1 minute
Typical cross-region replication takes LESS THAN 1 SECOND
Aurora Machine Learning
Integrates with AWS ML services
Amazon SageMaker
Amazon Comprehend (sentiment analysis)
Use CASE: fraud detection, ads targeting, product recommendations, sentiment analysis
RDS Backups
Automated Backups - can be disabled
daily backups
Transaction logs are backed up by RDS every 5 minutes
(can restore to any point in time from the oldest backup to 5 minutes ago)
retention of 1 to 35 days, 0 to disable
Manual DB Snapshots
triggered by user
CAN BE STORED FOREVER
Aurora Backups
Automated backups - CANNOT BE DISABLED - Point-In-Time Recovery
Restoring a RDS backup/snapshot CREATES A NEW DB
Can restore a MySQL RDS database from S3
backup on premise DB -> store in S3, -> restore to MySQL RDS
Can restore to MySQL Aurora Cluster from S3
backup on premise DB using Percona XtraBackup -> store in S3 -> restore to MySQL Aurora cluster
Aurora Database Cloning
Faster than snapshot and restore - great for staging and testing, fast and cost-effective
DOES NOT impact production database
RDS & Aurora Security
At-Rest: AWS KMS encryption (configured on creation)
Must encrypt master for RR encryption
In-Flight: TLS-ready by default, use the AWS TLS root certificates client-side
Supports IAM Authentication (IAM Roles)
Control network access via Security Groups
NO SSH except for Custom RDS
Audit Logs can be enabled (limited retention time) - send to CloudWatch for long-term storage
Amazon RDS Proxy - also works with Aurora
Allows apps to pool and share DB connections
Improve efficiency by reducing stress on DB and minimize open connections
Serverless, auto-scaling, HA (multi-az)
Reduce Failover time by 66%
Supports RDS(MySQL, Postgres, MariaDB) and Aurora
No code changes just update endpoints
Enforce IAM Authentication for DB, securely store credentials in AWS Secrets Manager
NEVER Publicly accessible; must connect within VPC
Hella useful for Lambda function access to RDS/Aurora
Amazon ElastiCache
managed Redis or Memcached service
in-memory databases with high performance and low latency
Help reduce load on DB for common read queries
helps make app stateless
AWS manages OS maintenance, optimization, setup, config, monitoring, DR, backups
REQUIRES heavy APP changes
Must have cache invalidation strategy to ensure cache is fresh
USE CASE:
session store: on login, write session data to cache; a new app instance looks up session data in cache to keep user logged in (achieve stateless app)
gaming leaderboard
Redis sorted sets - guarantee uniqueness and element ordering
REDIS vs MEMCACHED
REDIS
Multi-AZ with Auto-Failover
Read Replicas scale reads and HighAvailability
Data durability with AOF persistence
Backup and restore
MEMCACHED
multi-node partitioning of data (sharding)
No HA (no replication)
No persistence
No backup, no restore
Multi-threaded architecture
DO NOT SUPPORT IAM authentication
USE:
Redis AUTH (password/token)
Extra level of security on top of Security Groups
Supports SSL encryption
Memcached
Supports SASL-based authentication
PATTERNS:
Lazy Loading
all read data is written to cache; BEWARE of stale data
Write Through
Add/Update cache on write to DB (no stale data)
Session Store
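Sketch (not from the course): Lazy Loading and Write Through with plain dictionaries standing in for the database and the cache. `CachedStore` and its method names are made up; a real setup would talk to Redis/Memcached and a real DB, and would add TTLs for invalidation.

```python
class CachedStore:
    """Toy model: self.db is the database, self.cache is ElastiCache."""

    def __init__(self):
        self.db = {}
        self.cache = {}
        self.db_reads = 0  # counts how often the DB is actually queried

    def read_lazy(self, key):
        """Lazy Loading: check the cache first, fill it from the DB on a miss."""
        if key in self.cache:
            return self.cache[key]          # cache hit
        self.db_reads += 1                   # cache miss -> query the DB
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write_db_only(self, key, value):
        """Write that bypasses the cache (this is what makes cached data stale)."""
        self.db[key] = value

    def write_through(self, key, value):
        """Write Through: update the DB and the cache together."""
        self.db[key] = value
        self.cache[key] = value


store = CachedStore()
store.write_db_only("user:1", "alice")
first = store.read_lazy("user:1")    # miss, loaded from DB
second = store.read_lazy("user:1")   # hit, DB not touched
store.write_db_only("user:1", "bob")
stale = store.read_lazy("user:1")    # still the old value: stale data
store.write_through("user:1", "carol")
fresh = store.read_lazy("user:1")    # cache was updated on write
print(first, second, stale, fresh, store.db_reads)
```

The `stale` read is the "BEWARE of stale data" case: the DB changed but the cache was never told. Write Through avoids it at the cost of an extra cache write on every DB write.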
DNS = Domain Name System
translates human friendly hostnames into IP addresses
Domain Registrar => Where you buy a domain name
DNS Records
A => IPv4
AAAA => IPv6
CNAME => hostname to hostname
NS => Name Servers => resolve DNS queries (Top-Level Domain TLD: .com; Second-Level Domain SLD: amazon.com)
Root DNS Server (ICANN), TLD DNS Server (IANA), SLD DNS Server (managed by Domain Registrar)
Authoritative vs Non-Authoritative
FQDN = Fully Qualified Domain Name
Amazon Route 53
HA, Scalable, fully managed, authoritative DNS
Also a Domain Registrar
Ability to Health Check routes
100% availability SLA
Domain Name Record
Domain Name, Record Type, Value, Routing Policy, TTL (default 300 seconds)
Record Types:
A
AAAA
CNAME (NOT able to create for SLD - Zone Apex)
NS
Hosted Zone
container for records (Public and Private)
50 cents/month per Hosted Zone
TTL = Time To Live (60 sec to 24 hours) (Mandatory except for Alias Records)
Alias record can point to SLD (Zone Apex) and comes with built-in health check
Maps a hostname to an AWS Resource
Automatically recognizes changes to resource's IP address
Can point to Zone Apex
Always of type A/AAAA
Can't set TTL
Targets:
ELB, CloudFront Distributions, API Gateway, Elastic Beanstalk, S3 Websites, VPC Interface Endpoints, Global Accelerator, Route 53 records in same HZ
CAN NOT set an ALIAS for an EC2 DNS name
Routing Policies
Simple (no health checks)
Weighted
DNS records must have same name and type
Failover
Latency based
Geolocation
Geoproximity (Route 53 Traffic Flow feature)
Multi-Value Answer
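Sketch (not from the course): how Weighted routing turns record weights into traffic shares. Each record receives its weight divided by the sum of all weights. Treating "all weights are 0" as an even split is an assumption based on the documented Route 53 behavior, and the record names are made up.

```python
def weighted_traffic_share(weights: dict) -> dict:
    """Map each record name to the fraction of DNS responses it should receive."""
    if not weights:
        return {}
    total = sum(weights.values())
    if total == 0:
        # Assumption: with every weight at 0, all records are returned equally
        return {name: 1 / len(weights) for name in weights}
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical records that share the same name and type
records = {"instance-a": 70, "instance-b": 20, "instance-c": 10}
print(weighted_traffic_share(records))
```

Weights do not need to sum to 100; `{"a": 7, "b": 2, "c": 1}` gives the same split. Setting a single record's weight to 0 stops sending traffic to it.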
Health Checks
only for Public Resources
Three Types:
Health Checks that monitor an endpoint
Calculated Health Checks
Health Checks that monitor a CloudWatch Alarm (can be used to monitor private resources)
Integrated with CW metrics
Section 10: Classic Solutions Architecture Discussions
whatisthetime.com
Route 53 + Elastic IP + EC2 Instance (stateless)
to
Route 53 (Alias) + MultiAZ ELB + MultiAZ ASG + EC2 instances + Reserved Instances
Well Architected Framework (COST, PERFORMANCE, RELIABILITY, SECURITY, OPERATIONAL EXCELLENCE)
myclothes.com
stateful web app
shopping cart
Session Affinity (ELB Setting)
Browser Cookies (User)
stateless
Heavy HTTP requests
Security Risk (cookies are mutable)
Must validate cookies, 4KB limit
ElastiCache - sub-millisecond (via sessionId) or DynamoDB
Scale reads with RDS RR (up to 5) or implement write-through via ElastiCache (needs cache invalidation)
Multi-AZ for Disaster Recovery
Example of 3-Tier Architecture
mywordpress.com
display/upload images
Route 53 - Multi AZ ELB - MultiAZ EC2 within ASG - ENI (Elastic Network Interface) <=> EFS (Elastic File System)
Instantiating Application Quickly
EC2 Instances => use Golden AMI - very common pattern
Dynamic configuration => Bootstrapping with User Data
Hybrid: Golden AMI + User Data (Elastic Beanstalk)
RDS => restore from snapshot
EBS/EFS => restore from snapshot
Elastic Beanstalk (like Netlify for AWS - Platform as a Service)
Web App 3-Tier
PUBLIC SUBNET (CLIENT facing)
PRIVATE SUBNET (APPLICATION layer)
DATA SUBNET (database/cache layer)
Components:
Application
Application Version
Environment
Tiers: (Web Server Environment Tier AND Worker Environment Tier)
Section 11: Amazon S3 Introduction
S3 = Simple Storage Service
use cases:
Backup and storage
Disaster Recovery
Archive
Hybrid Cloud storage
Application hosting
Media hosting
Data Lakes & Big Data Analytics
Software Delivery
Static Website
Stores objects (files) in buckets (directories)
Bucket names must be globally unique
Buckets are defined at the REGION level
Naming rules: no uppercase, no underscore, 3-63 characters, start with number or lowercase letter, must not start with xn--, must not end with -s3alias
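Sketch (not from the course): the naming rules above as a validation function. It checks only the rules listed in these notes, plus one assumption: the allowed characters are lowercase letters, digits, dots, and hyphens. The real service enforces a few more rules (for example, a name cannot be formatted like an IP address), which this skips. `is_valid_bucket_name` is a made-up helper.

```python
import re

# Starts with a lowercase letter or digit; 3-63 characters total.
# Assumption: only lowercase letters, digits, dots and hyphens are allowed.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{2,62}")


def is_valid_bucket_name(name: str) -> bool:
    """Check a bucket name against the rules listed in the notes."""
    if not _BUCKET_NAME.fullmatch(name):
        return False  # wrong length, uppercase, underscore, or bad first character
    if name.startswith("xn--"):
        return False
    if name.endswith("-s3alias"):
        return False
    return True


for candidate in ("my-notes-bucket", "My-Bucket", "my_bucket", "ab", "xn--bucket"):
    print(candidate, is_valid_bucket_name(candidate))
```

Passing this check does not mean the name is available: bucket names are globally unique, so creation can still fail if someone else already owns the name.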
Objects (files) have a key = prefix + object name
Max object size: 5TB
If greater than 5GB must use "multi-part upload"
Can have Metadata, Tags, VersionID
Security
User-Based - IAM Policies
Resource-Based
Bucket Policies - bucket-wide rules - allow cross-account access
Object Access Control List - finer grained (can be disabled)
Bucket Access Control List - less common (can be disabled)
Can access if (IAM permission allows it OR resource policy allows it) AND there is no explicit deny
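Sketch (not from the course): the access rule above written as a boolean expression, to make the operator grouping explicit. This is the simplified same-account rule from the notes, not the full AWS policy evaluation logic. `s3_access_allowed` is a made-up name.

```python
def s3_access_allowed(iam_allows: bool, resource_allows: bool, explicit_deny: bool) -> bool:
    """(IAM allows OR resource policy allows) AND no explicit deny."""
    return (iam_allows or resource_allows) and not explicit_deny


# Bucket policy grants access even though the user's IAM policy says nothing
print(s3_access_allowed(iam_allows=False, resource_allows=True, explicit_deny=False))
# An explicit deny always wins
print(s3_access_allowed(iam_allows=True, resource_allows=True, explicit_deny=True))
```

The parentheses matter: an explicit deny overrides any number of allows, and with no allow at all the request is denied by default.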
static site:
http://BUCKET-NAME.s3-website-AWS-REGION.amazonaws.com
Amazon S3 - Versioning
Enabled at bucket level
version "null" for objects that existed prior to enabling versioning
S3 Replication
CRR - Cross Region Replication
SRR - Same Region Replication
Must enable versioning in SOURCE and DESTINATION
Async operation
Only new objects are replicated. Use S3 BATCH REPLICATION to replicate existing objects
No transitive "chain" replication across buckets
Can replicate delete markers, but deletions with version ID are not replicated (NO MALICIOUS DELETES)
S3 Storage Classes
Amazon S3 Standard
Durability: 11 9s (99.999999999%) - same across all storage classes
Availability - varies by storage class; 99.99% for Standard
Amazon S3 Standard-Infrequent Access
Amazon S3 One Zone-Infrequent Access
Amazon S3 Glacier Instant Retrieval (storage and retrieval cost)
millisecond retrieval, 90 day minimum
Amazon S3 Glacier Flexible Retrieval
Expedited (1-5min), Standard (3-5 hours), Bulk (5-12 hours) free; 90 day minimum storage
Amazon S3 Glacier Deep Archive
Standard (12 hours), Bulk (48 hours)
180 day minimum storage
Amazon S3 Intelligent Tiering
small monthly monitoring and auto-tiering fee, no retrieval charges
Section 12: Advanced Amazon S3
Moving between Storage Classes
automated using Lifecycle Rules
Transition Actions
Expiration Actions
S3 Analytics
gives recommendations for Standard and Standard-IA to find the optimum lifecycle configuration
Report is updated daily; processing may take 24-48 hours
S3 Requester Pays
By default, bucket owners pay for storage and data transfer costs
With Requester Pays, the requester pays the request and data download costs; the requester must be authenticated with AWS
S3 Event Notifications
use case: generate thumbnails of images uploaded to S3
Can be processed by:
SNS
SQS
Lambda Function
Amazon EventBridge
Advanced Filtering options, Multiple Destinations, EventBridge Capabilities
S3 Performance
100-200ms first byte
3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD request/second/per-prefix
Multi-Part uploads: recommended >100MB, required for >5GB
S3 Transfer Acceleration -> transfer to edge location for speed
S3 Byte-Range Fetches
Parallelize GETs by requesting specific byte ranges
use case:
speed up downloads, better resilience in case of failures
retrieve only partial data (like the header of a file)
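Sketch (not from the course): splitting an object into byte ranges that could be fetched in parallel. It only computes the HTTP `Range` header values (inclusive start and end offsets); the actual GET requests are left out. `byte_ranges` is a made-up helper.

```python
def byte_ranges(object_size: int, part_size: int) -> list:
    """Return Range header values that together cover an object of object_size bytes."""
    if object_size < 0 or part_size <= 0:
        raise ValueError("object_size must be >= 0 and part_size must be > 0")
    ranges = []
    for start in range(0, object_size, part_size):
        end = min(start + part_size, object_size) - 1  # Range end is inclusive
        ranges.append(f"bytes={start}-{end}")
    return ranges


print(byte_ranges(10, 4))
```

Each range can be requested by a separate worker and retried on its own if it fails. Requesting just the first range is the "header of a file" use case.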
S3 Select and S3 Glacier Select
retrieve less data using SQL to perform server-side filtering (on CSV files)
S3 Batch Operations
Perform bulk operations on existing S3 objects
modify object metadata
copy objects between S3 buckets
Encrypt un-encrypted objects
Job:
List of objects
Action to perform
Optional parameters
Manages retries, tracks progress, sends completion notifications, generate reports
use S3 Inventory (list objects) + S3 Select (filter them) + S3 Batch Operations (process them)
Section 13: Amazon S3 Security
Object Encryption
4 Methods
Server-Side Encryption (SSE)
Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)
keys handled, managed, and owned by AWS
Encrypted with AES-256
Must set Header: x-amz-server-side-encryption: AES256
Enabled by default for new buckets and new objects
Server-Side Encryption with KMS Keys stored in AWS KMS (SSE-KMS)
Leverage AWS Key Management Service (AWS KMS) to manage encryption keys
Advantages: user control + audit key usage using CloudTrail
Header: x-amz-server-side-encryption: aws:kms
Upload => GenerateDataKey KMS API
Download => Decrypt KMS API
There is a quota on requests. Service Quotas Console to request increase
Server-Side Encryption with Customer-Provided Keys (SSE-C)
When you want to manage your own encryption keys
S3 does NOT store key; key must be uploaded with HTTP headers using HTTPS
Client-Side Encryption
User responsible for encrypting data before sending
Can use: Amazon S3 Client-Side Encryption Library
Bucket Policies are handled before DEFAULT ENCRYPTION!!!
Cross-Origin Resource Sharing (CORS)
Origin = scheme + host + port
By default, Web Browsers deny cross-origin requests
The destination server must allow the request using CORS headers: Access-Control-Allow-Origin, Access-Control-Allow-Methods
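Sketch (not from the course): computing an origin (scheme + host + port) and checking whether a request is cross-origin. It fills in the default ports 80 and 443 when the URL has none. The function names and example URLs are made up, and browsers apply more rules than this.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> tuple:
    """Return (scheme, host, port) for a URL, using the default port if none is given."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, (parts.hostname or "").lower(), port)


def is_cross_origin(page_url: str, resource_url: str) -> bool:
    """True if the resource lives on a different origin than the page."""
    return origin_of(page_url) != origin_of(resource_url)


print(origin_of("https://www.example.com/app"))
print(is_cross_origin("https://www.example.com/index.html", "https://other.example.com/img.png"))
```

For S3 this is the case where a page served from one bucket's website endpoint loads files from another bucket: the second bucket needs a CORS configuration that allows the first origin.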
Amazon S3 - MFA Delete
Required to: permanently delete an object version, or suspend Versioning. Only the bucket owner (root account) can enable/disable MFA Delete
aws configure --profile NAME-OF-PROFILE
aws s3api put-bucket-versioning --bucket NAME-OF-BUCKET --versioning-configuration Status=Enabled,MFADelete=Enabled --mfa "ARN-OF-MFA-DEVICE MFA-CODE" --profile NAME-OF-PROFILE
S3 Access Logs
Can be analyzed using Amazon Athena
DO NOT set the logging bucket to be the same as the monitored bucket!!!
Leads to a logging loop!!!
S3 Pre-Signed URLs
URL Expiration (1min to 720 min in Console; 7 days max in AWS CLI/SDK)
S3 Glacier Vault Lock
it's like a bucket where nothing can be deleted or changed (WORM - Write Once Read Many)
Vault Lock Policy
Helpful for compliance
S3 Object Lock - must enable versioning; blocks version deletion
Retention Modes
Compliance => no one can overwrite or delete object versions, not even root
Governance => only admins/root with special permissions can delete
Retention Period - Protect object for a fixed amount of time; can be extended
Legal Hold
s3:PutObjectLegalHold IAM Permission
protect object indefinitely; independent from retention period
S3 Access Points
each AP has its own DNS name (Internet Origin or VPC Origin)
AP Policy - 1:1 Manage Security at Scale
S3 Object Lambda
allows AWS Lambda Functions to change object before retrieved by caller
Section 14: Cloudfront & AWS Global Accelerator
AWS CloudFront
Content Delivery Network - CDN
Improves read performance, content is cached at the edge
216 Points of Presence
DDoS protection, Integration with Shield, AWS Web Application Firewall
Secured with Origin Access Control (OAC), the replacement for Origin Access Identity (OAI)
CloudFront with ALB or EC2 as Origin
ALB/EC2 Instances must be Public
Allow Public IP of Edge Locations
CloudFront Geo Restriction
Can restrict access to distribution
Allowlist or Blocklist - approved/banned countries
Pricing
Price Classes
Price Class All: all regions - best performance
Price Class 200: most regions, excluding the most expensive ones
Price Class 100: NA/Europe/Israel
Cache Invalidation
AWS Global Accelerator
uses Anycast IP instead of Unicast IP - client routed to nearest server
route clients to closest Edge location via the internal AWS private network
Uses 2 Anycast IP
Works with:
Elastic IP
EC2 Instances
ALB, NLB (public or private)
Health Checks built in (less than 1 min failover)
CloudFront vs Global Accelerator
Improves TCP/UDP app performance via proxying packets
Good for HTTP use cases that require static IP or fast failover
Section 15: AWS Storage Extras
AWS Snow Family
Data Migration
Snowcone
8TB Storage, up to 24TB
Can use AWS DataSync to send data over the internet
Snowball Edge
Storage Optimized
80TB of HDD
Can cluster ... up to 15 snowballs
Compute Optimized
Snowmobile
Transfers exabytes of data (1 EB = 1,000 PB)
Each Snowmobile has 100 PB of capacity
Better than Snowball if >10PB
Edge Computing
All can run EC2 Instances, AWS Lambda via AWS IoT Greengrass
Rule of thumb: use snowball devices if >1 week to transfer
AWS OpsHub
GUI for controlling snow family devices
Snowball -> S3 -> Lifecycle policy -> S3 Glacier (Snowball cannot import into Glacier directly)
Amazon FSx
Launch 3rd party high-performance file systems on AWS
Fully managed
FSx for Windows File Server
fully managed Windows FS shared drive
Supports SMB protocol and Windows NTFS
Integration with Microsoft AD, ACLs, and user quotas
Can be mounted on Linux EC2 instances
Supports Microsoft's Distributed File System (DFS) Namespaces - link on-premise Windows FS to Cloud
Storage Options:
Can access Windows FSx from on-premise with VPN or Direct Connect
Can be MultiAZ; Data backed up daily to S3 for DR
FSx for Lustre (Lustre = Linux + cluster)
Parallel distributed FS for large-scale computing
Machine Learning + HPC (High Performance Computing)
Seamless integration with S3: can read S3 as a file system through FSx and write output back to S3
Can be used from on-premise via (VPN or Direct Connect)
Scratch File System
Persistent File System (replication within single AZ)
FSx for NetApp ONTAP
NFS, SMB, iSCSI
Broad compatibility (Workspaces, VMware Cloud on AWS, AppStream 2.0)
Storage auto-shrinks or grows
snapshots
replication
data compression and de-duplication
Point-in-time instantaneous cloning (helpful for testing new workloads)
FSx for OpenZFS
compatible with NFS
Broad compatibility
point-in-time instantaneous cloning
Up to 1 million iops, sub ms latency
Snapshots and compression, low cost
AWS Storage Gateway
- bridge between on-premise and cloud data
Block Storage
File Storage
Object Storage
Use Cases:
Types:
S3 File Gateway
NFS or SMB ... behind the scenes uses HTTPS
Most recent used files cached in file gateway
SMB allows for AD for user auth
FSx File Gateway
Native access to Amazon FSx for Windows File Server
Advantage is the local cache
Good for group file shares and home dirs
Volume Gateway
block storage uses iSCSI backed by S3
Backed by EBS snapshots to restore on-premises volumes
Cached volumes - low latency
Stored volumes - entire dataset is on premise, for backup
mainly for backup and restore
Tape Gateway
for physical tape backups in the cloud
S3 or Glacier
also can use iSCSI
Storage Gateway - Hardware Appliance
Use when you cannot run the gateway on on-premise virtualization
AWS Transfer Family
uses FTP, FTPS, SFTP
can transfer to S3 or EFS
Can use Microsoft AD, LDAP, Okta, Amazon Cognito for authentication
AWS DataSync
Move large amount of data to/from
On-premise to cloud and vice versa ... needs an agent
AWS to AWS (no agent)
Replication task is not continuous; it is scheduled: hourly, daily, weekly...
File permissions and metadata are preserved (NFS POSIX, SMB)
Can sync with ALL S3 storage classes (including Glacier), EFS or FSx
Summary:
EC2 Instance storage: physical storage with high IOPS!!!
Section 16: Decoupling Applications: SQS, SNS, Kinesis, Active MQ
Sync communications vs Async/Event-Based communication between services
SQS - Simple Queue Service - Queue Model
queue model: queue/messages/poll/long polling/Producers/Consumers
used to decouple applications
retention time: default 4 days, max 14 days
low latency (<10ms)
256KB message limit
At least once delivery, "best effort ordering" by default
SendMessage API: message persisted until a Consumer deletes it
unlimited throughput (in standard configuration)
Polling up to 10 messages at a time with the ReceiveMessage API; delete with the DeleteMessage API
Scale Consumers using ASG and the CloudWatch Metric (ApproximateNumberOfMessages) -> setup a CloudWatch Alarm
Security:
in-flight with HTTPS API, at rest with KMS keys, or client-side encryption
Access Controls via IAM policies or SQS Access Policies (cross-account)
Messages become invisible to other consumers once polled - set message visibility timeout - default 30 seconds - must be processed and deleted by Consumer or message will return to the queue
ChangeMessageVisibility API can give a Consumer more time to process
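Sketch (not from the course): an in-memory toy queue that mimics the visibility timeout. Time is passed in explicitly as `now` (seconds) so the behavior is easy to follow. `VisibilityQueue` is a made-up class; its method names mirror the SQS API names but this is not the SQS client, and real SQS uses receipt handles rather than message IDs for delete and visibility changes.

```python
class VisibilityQueue:
    def __init__(self, visibility_timeout: float = 30.0):
        self.visibility_timeout = visibility_timeout
        self._messages = {}   # message_id -> [body, invisible_until]
        self._next_id = 0

    def send_message(self, body: str) -> str:
        self._next_id += 1
        message_id = str(self._next_id)
        self._messages[message_id] = [body, 0.0]  # visible immediately
        return message_id

    def receive_message(self, now: float):
        """Return the first visible message and hide it for the timeout window."""
        for message_id, entry in self._messages.items():
            body, invisible_until = entry
            if now >= invisible_until:
                entry[1] = now + self.visibility_timeout
                return message_id, body
        return None  # nothing visible right now

    def change_message_visibility(self, message_id: str, now: float, timeout: float) -> None:
        """Give the current consumer more (or less) time before redelivery."""
        self._messages[message_id][1] = now + timeout

    def delete_message(self, message_id: str) -> None:
        del self._messages[message_id]


queue = VisibilityQueue(visibility_timeout=30)
message_id = queue.send_message("order-1")
print(queue.receive_message(now=0))    # delivered, now invisible until t=30
print(queue.receive_message(now=10))   # None: still invisible
print(queue.receive_message(now=30))   # redelivered: it was never deleted
```

The redelivery at t=30 is why consumers need to delete messages after processing, and why a slow consumer should extend the timeout instead of letting another consumer pick up the same message.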
SNS - Simple Notification Service - Pub/Sub model
Kinesis - Real-Time Streaming model