AWS Solutions Certified Architect Associate Course by Stephane Maarek

Section 01: Introduction

No notes

Section 02: Getting Started with AWS

AWS is the biggest Cloud Provider
AWS Regions
- cluster of data centers
- Most AWS Services are region-scoped
How to select an AWS Region
- Compliance
- Proximity
- Available services
- Pricing
AWS Availability Zones
- Minimum of 3 AZs within each AWS Region (usually 3, max 6)
- Discrete data centers, redundant power, networking, and connectivity
- Ultra-low latency networking
AWS Points of Presence (Edge Locations)
- 216+ PoP

Section 03: IAM

IAM Groups only contain IAM Users
IAM Policy ... JSON document
- Define permission for users
- Least Privilege
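To make the "JSON document" idea concrete, here is a minimal sketch of a least-privilege policy built in Python. The bucket name `example-notes-bucket` and the helper `read_only_bucket_policy` are hypothetical and only for illustration; the policy grants a single read action on a single bucket and nothing else.

```python
import json


def read_only_bucket_policy(bucket_name: str) -> dict:
    """Build an IAM policy document that only allows reading objects
    from one bucket (hypothetical helper, not part of any AWS SDK)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadObjectsOnly",
                "Effect": "Allow",
                # Least privilege: one action, one resource.
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


# "example-notes-bucket" is a made-up bucket name.
policy = read_only_bucket_policy("example-notes-bucket")
print(json.dumps(policy, indent=2))
```

The printed JSON is the kind of document that gets attached to a user, group, or role.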
Best Practices -- Create an Admin IAM Group (AdministratorAccess) to replace root user for day-to-day activities
IAM Password Policy - enforce strong passwords and require rotation (password expiration)
MFA
- Virtual MFA device
- U2F Security Key
- Hardware Key Fob MFA Device
- Hardware Key Fob MFA Device for AWS GovCloud
3 Ways to Access AWS:
- AWS Management Console
- AWS CLI
- AWS SDK
IAM Roles used to assign permission to AWS Services
IAM Credentials Report
- account-level, list all account users and credential status
IAM Access Advisor
- user-level, shows service permissions on user and when last used (least-privilege)

Section 04: EC2 Fundamentals

AWS Budget - create alarms triggered when budget threshold metrics are exceeded
EC2 - Elastic Compute Cloud - Infrastructure as a Service
- Storing data on virtual drives (EBS)
- Distribute load using (ELB)
- Scaling the services using auto-scaling group (ASG)
EC2 User Data
- Script used to bootstrap EC2 instance, only run once, first start
- Used to automate boot tasks:
  - Installing updates
  - Installing software
  - Downloading common files from internet
  - Anything really
- Runs as root user
Security Group
- set of firewall rules that control traffic for your instance
- work on instance-level
- contain only allow rules
- Can reference by IP or by Security Group
- SG regulate:
  - Access to Ports
  - Authorized IP ranges (IPv4 & IPv6)
  - Control inbound traffic to instance
  - Control outbound traffic from instance to "other"s
    - By default
      - all outbound traffic is allowed
      - all inbound traffic is blocked
- Can be attached to multiple instances
- Locked to Region/VPC combination
- Request is intercepted before EC2 instance receives it
- "time out" -> request not allowed by SG
- "connection refused" -> application issue
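The "time out" versus "connection refused" distinction above can be checked from a client machine. The sketch below is one way to do it with the standard library only; `probe_port` is a hypothetical helper, and the interpretation of each outcome follows the two rules of thumb in these notes rather than any guarantee.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connection and classify the outcome.

    "timeout" -> nothing answered (per the notes: likely a security group issue)
    "refused" -> the host answered but nothing listens (application issue)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        # DNS failure, unreachable network, and so on.
        return "error"
```

For example, `probe_port("PUBLIC_IP", 22)` returning "timeout" suggests checking the inbound rule for port 22 first.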
Classic PORTS to know
- 22 -> SSH
- 21 -> FTP
- 22 -> SFTP
- 80 -> HTTP
- 443 -> HTTPS
- 3389 -> RDP (Remote Desktop Protocol)
Instance Types (7)
- General Purpose
- Compute Optimized
- Memory Optimized
- Accelerated Computing
- Storage Optimized
- Instance Features
- Measuring Instance Performance
SSH
- ssh -i KEY.pem ec2-user@PUBLIC_IP
- chmod 0400 KEY.pem
EC2 Instance Connect
- Browser-based terminal to connect to EC2 Instance via AWS Management Console
- Attach IAM Roles to EC2 Instances
  - DON'T RUN "aws configure" within an instance terminal!!!
EC2 instances Purchasing Options
- On-Demand
  - short-term, un-interrupted workloads
- Reserved (1 & 3 years)
  - Reserved Instances (long workloads)
    - Instance Type, Region, Tenancy, OS
    - Reserved Instance Scope - Regional or Zonal
  - Convertible Reserved Instances - (long workloads with flexible instances)
    - Can change specs
- Savings Plans (1 & 3 years) - commitments to an amount of usage, long workload
  - Commit to dollar/hour for 1 & 3 years
  - Locked to instance family and region
- Spot Instances - short workload, cheap, can be interrupted
  - Jobs that are resilient to failure:
    - Batch jobs
    - Data analysis
    - Image processing
    - Distributed workloads
  - define a max spot price; 2-minute grace period when the spot price exceeds it
- Dedicated Host - book physical host
  - compliance requirements, server-bound software licenses
- Dedicated Instances - No other customer will share your hardware
  - No control over placement of hardware
- Capacity Reservations - Reserve capacity in specific AZ for any duration
  - Reserve On-Demand in a specific AZ
  - No time commitment, no billing discount
  - Charge whether you use it or not
  - short-term, uninterrupted workloads in a specific AZ
Spot Fleet
- lowest price - cost optimization, short workloads
- diversified - great for availability, long workloads
- capacityOptimized - optimal capacity

Section 05: EC2 Solutions Architect Level

Elastic IP
- a Public IP (IPv4) that can be attached to an instance to retain a fixed IP address
- limited to 5 Elastic IPs per account (can ask AWS to increase)
- not a good architecture pattern
- in theory, it allows for a failed instance to be remapped as a disaster recovery strategy
- ELB is a better approach
EC2 Placement Groups
- Cluster - low-latency group in single AZ
  - 10 Gbps network, low-latency, same rack, same AZ, high risk
- Spread - max 7 instances/group/AZ - critical applications
  - minimize failure risk (all instances on different hardware)
  - span multiple AZ
  - maximize high availability
- Partition - 100s EC2 instances/group (allows Hadoop, Kafka, Cassandra)
  - Up to 7 partitions per AZ
  - Multiple AZs in same Region
  - 100s of EC2 instances
  - Partitions on separate racks
  - Big Data Applications (HDFS, HBase, Cassandra, Kafka)
Elastic Network Interfaces (ENI)
- Virtual Network Card
- Can have:
  - 1 Primary (eth0) private IPv4, one or more secondary (eth1) IPv4
  - 1 Public IPv4
  - 1 Elastic IP (IPv4) per private IPv4
  - 1 or more SG
  - MAC address
- Bound to a specific AZ
- Create ENI independently of EC2 instance and attachable on the fly (use case failover)
EC2 Hibernate
- Stop, Terminate, Hibernate
- Root EBS volume must be encrypted and large enough to hold the RAM contents (EBS volume > RAM size); cannot hibernate for more than 60 days

Section 06: EC2 Instance Storage

EBS (Elastic Block Store) Volume
- network drive - can persist data after termination
- multi-attached to mount EBS onto multiple EC2 instances
- bound to specific AZ
EBS snapshots
- backup of EBS volume
- can copy snapshots onto other AZ/Regions
EBS Snapshot Archive
- 24 - 72 hours to restore, 75% cheaper
Recycle Bin for EBS Snapshots
- To recover EBS Snapshots after accidental deletion
- 1 day to 1 year
Fast Snapshot Restore (FSR)
- expensive but quick; useful for big volumes
AMI (Amazon Machine Image)
- customization of an EC2 instance
- allows for faster boot (pre-packaged software packages/setup)
- Can come from three sources:
  - Public AMI
  - Custom AMI (you maintain it)
  - AWS Marketplace AMI
EC2 Instance Store
- High-performance hardware disk
- Storage is ephemeral
- buffer/cache/scratch data/temporary content
- backups and replication are your responsibility
- i3
EBS Volume Types (6 Types)
- gp2/gp3 (SSD) - General purpose, balances price/performance - BOOT
- io1/io2 (SSD) - Provisioned IOPS (PIOPS SSD) Highest performance, mission-critical low-lat, high-thru - BOOT - More than 16,000 IOPS, Great for databases - 4GB - 16 TB
  - MAX 64,000 IOPS for Nitro EC2, otherwise 32,000
  - io2 more durability and more IOPS per GB wrt io1
  - io2 Block Express - sub-millisecond latency; MAX 256,000 IOPS with an IOPS:GiB ratio of 1,000:1
  - supports EBS multi-attach!!!
- st1 (HDD) - Low cost volume, frequent access and high throughput
  - Max throughput 500 MiB/s, max IOPS 500
- sc1 (HDD) - Low cost, less frequent access
- For gp3, IOPS and volume size are independent; for gp2 they are linked (3 IOPS per GB)
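A quick worked example of the gp2 rule (3 IOPS per GB, linked to size). The sketch assumes the usual gp2 bounds of a 100 IOPS floor and a 16,000 IOPS ceiling, which are not stated in these notes; `gp2_baseline_iops` is a hypothetical helper.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS of a gp2 volume: 3 IOPS per GiB,
    clamped to an assumed range of 100..16,000."""
    if size_gib <= 0:
        raise ValueError("volume size must be positive")
    return max(100, min(3 * size_gib, 16_000))


for size in (10, 100, 1000, 5334):
    print(f"{size} GiB -> {gp2_baseline_iops(size)} IOPS")
```

Under these assumptions a gp2 volume stops gaining IOPS at about 5,334 GiB, which is one reason to prefer gp3 (or io1/io2) when IOPS needs do not track volume size.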
EBS Multi-Attach (io1/io2 family)
- Attach multiple EC2 instances in the same AZ
- Up to 16 EC2 instances at a time
- Must use a File System that is cluster-aware
- higher application availability
EBS Encryption
- Data at rest is encrypted
- In-flight data between instance and volume is encrypted
- Snapshots are encrypted
- Leverages keys from KMS (AES-256)
Amazon EFS - Elastic File System
- Managed NFS (network file system) that can be mounted on many EC2 instances
- Works across multi-AZ
- Highly available, scalable, expensive (3x gp2), pay per use
- uses NFSv4.1 protocol
- Compatible with Linux based AMI (not Windows)
- POSIX file system
- SCALE
  - 1000s of concurrent NFS clients, 10GB+/s throughput
  - Petabyte scale, automatically
- PERFORMANCE
  - General Purpose (low-latency) or Max I/O (higher latency)
- THROUGHPUT
  - Bursting (1 TB = 50 MiB/s + burst of up to 100 MiB/s)
  - Provisioned - throughput independent of size
- Storage Tiers - Standard and Infrequent access (EFS Standard - EFS IA)
- Availability - Standard: Multi-AZ or One Zone (EFS One Zone-IA)
EC2 Instance Metadata
- http://169.254.169.254/latest/meta-data
  - can retrieve meta-data and user-data
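A minimal sketch of reading instance metadata from inside an instance, using only the standard library. It assumes the token-based flow (IMDSv2), which these notes do not cover: first a PUT to obtain a session token, then a GET that sends the token. The helper names are hypothetical, and the calls only succeed when run on an EC2 instance.

```python
import urllib.request

METADATA_BASE = "http://169.254.169.254/latest"


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Build the PUT request that asks for a session token (IMDSv2)."""
    return urllib.request.Request(
        f"{METADATA_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path: str, token: str) -> urllib.request.Request:
    """Build the GET request for one metadata path, e.g. "instance-id"."""
    return urllib.request.Request(
        f"{METADATA_BASE}/meta-data/{path.lstrip('/')}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_metadata(path: str, timeout: float = 2.0) -> str:
    """Fetch one metadata value; only works from inside an instance."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(metadata_request(path, token), timeout=timeout) as resp:
        return resp.read().decode()


# On an instance: print(fetch_metadata("instance-id"))
```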

Section 07: High Availability and Scalability

Scalability app/system can adapt to increases/decreases in load
Vertical => more power
Horizontal (elasticity) => more servers
Elastic Load Balancer (ELB) - managed load balancer
- AWS guarantees it will work, upgrades, maintenance, high availability
- Integrated with EC2, EC2 Auto Scaling Groups, Amazon ECS, AWS Certificate Manager, CloudWatch, Route53, AWS WAF, AWS Global Accelerator
- Health Checks
  - port: 4567 and /health endpoint
- 4 Types of Load Balancers
  - Classic Load Balancer (CLB) (DEPRECATED)
    - HTTP, HTTPS, TCP, SSL (secure TCP)
  - Application Load Balancer (ALB)
    - Works on request level
    - Layer 7 (HTTP/HTTPS)
    - HTTP, HTTPS, WebSocket
    - Routing Tables
      - Can route based on path, hostname, or query strings/headers
      - Great for micro-services and container-based applications
    - Port Mapping feature to redirect to dynamic port in ECS
    - Fixed hostname (xxx.region.elb.amazonaws.com)
    - IP of client found in X-Forwarded-For, X-Forwarded-Port, X-Forwarded-Proto
  - Target Groups
    - EC2 Instances, ECS tasks, Lambda functions (HTTP request to JSON event), IP addresses (private IP), can route to multiple TG, Health Checks at TG level
  - Network Load Balancer (NLB)
    - TCP, TLS (secure TCP), UDP
    - Works on connection level
    - Layer 4
    - NLB has one static IP per AZ, supports Elastic IP
    - millions of requests per second, lower latency ~100 ms (vs 400 ms for ALB)
    - Can Redirect to
      - EC2 instances, private IP addresses (on-premise machines), other ALB
    - Health Checks on:
      - TCP, HTTP, HTTPS Protocols
  - Gateway Load Balancer (GWLB)
    - Layer 3, IP Protocol
    - Uses GENEVE protocol on port 6081
    - Deploy, scale, and manage a fleet of 3rd party network virtual apps
    - Firewalls, Intrusion Detection, Deep Packet Inspection, payload manipulation
    - Works at network level - IP Packets
    - Combines Transparent Network Gateway + Load Balancer
    - Targets: EC2 Instances and private IPs
- Sticky Sessions
  - Successive requests from the same client are routed to the same instance behind the ELB
  - Can be enabled for ALB and CLB
  - Use case: retain session data (login for example)
  - Two Types of Cookies
    - Application-based Cookies
      - Custom cookie
        - Generated by target (application)
        - Custom attributes, cookie name per TG
        - Forbidden names: AWSALB, AWSALBAPP, AWSALBTG
      - Application Cookie
        - Generated by load balancer
        - Cookie name: AWSALBAPP
    - Duration-based Cookies
      - Generated by load balancer
      - AWSALB or AWSELB
- Cross-Zone Load Balancing
  - Enabled: All instances share the burden equally regardless of AZ
  - Enabled by default for ALB (no extra charge for cross AZ data)
  - Can be disabled at TG level
  - Disabled by default in NLB and GWLB (charge if you enable)
- SSL Certificate
  - Secure Socket Layer
  - TLS = Transport Layer Security
  - in-flight encryption
  - Certificate Authorities
    - Comodo, Symantec, GoDaddy, GlobalSign, Digicert, Letsencrypt
    - Have an expiration date (you set) and must be renewed
  - LB uses X.509 certificate, managed by ACM (AWS Certificate Manager)
    - can also create, upload your own certs
  - HTTPS Listener:
    - Must specify default cert
    - optional list of certs to support multiple domains
    - Client can use SNI (Server Name Indication) to specify the hostname
    - Can specify Security Policy to support older SSL/TLS
  - SNI - Server Name Indication
    - Loads multiple SSL certificates onto one web server (to serve multiple websites)
- Connection Draining
  - CLB - Connection Draining
  - ALB & NLB - Deregistration Delay
  - Set to 1 - 3600 seconds (default: 300)
  - Set to 0 to disable
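The Cross-Zone Load Balancing bullet above ("all instances share the burden equally regardless of AZ") is easier to see with numbers. This sketch assumes the load balancer receives an equal share of traffic in each AZ and that every AZ has at least one instance; `traffic_share_per_instance` is a hypothetical helper, not an AWS API.

```python
def traffic_share_per_instance(
    instances_per_az: dict[str, int], cross_zone: bool
) -> dict[str, float]:
    """Return the fraction of total traffic each instance receives, per AZ."""
    total_instances = sum(instances_per_az.values())
    az_count = len(instances_per_az)
    shares = {}
    for az, count in instances_per_az.items():
        if cross_zone:
            # Every instance gets the same share, whatever its AZ.
            shares[az] = 1 / total_instances
        else:
            # Each AZ keeps its own slice and splits it among its instances.
            shares[az] = (1 / az_count) / count
    return shares


fleet = {"az-a": 2, "az-b": 8}
print(traffic_share_per_instance(fleet, cross_zone=True))
print(traffic_share_per_instance(fleet, cross_zone=False))
```

With 2 instances in one AZ and 8 in the other, disabling cross-zone balancing leaves the two instances in the small AZ handling 25% of the traffic each.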
Auto Scaling Group (ASG)
- free
- scale-out => add instances
- scale-in => remove instances
- must create a Launch Template
  - ASG can be triggered by CloudWatch Alarms (auto-scaling)
Auto Scaling Group Scaling Policies
- Dynamic Scaling Policies (3 Types)
  - Target Tracking Scaling
    - Average CPU
  - Simple / Step Scaling
  - Scheduled Actions
- Predictive Scaling
  - continuously forecast load and schedule scaling
- Good metrics to scale on:
  - CPUUtilization
  - RequestCountPerTarget
  - Average Network In/Out (network bound)
  - Custom Metric (pushed to CloudWatch)
- Cooldown period (default 300 seconds)

Section 08: AWS Fundamentals: RDS + Aurora + ElastiCache

RDS - Relational Database Service
- Managed DB service that uses SQL as a query language
  - Postgres
  - MySQL
  - MariaDB
  - Oracle
  - Microsoft SQL Server
  - Aurora (AWS Proprietary DB)
- What you get:
  - automatic provisioning, os patching
  - continuous backups (Point in Time Restore)
  - Monitoring, Read replicas, DR with MultiAZ, Maintenance windows, Scalability, storage on gp2 or io1
- CANNOT SSH into instance
- Storage Auto Scales
- Set a Maximum Storage Threshold
  - Triggers on: 10% space remaining for 5 minutes, and 6 hour cooldown
- RDS Read Replicas
  - Up to 5
  - Within AZ, Cross AZ, Cross Region
  - ASYNC Replication
  - Can be promoted to independent DB
  - Application must update connection string to make use of read replicas
  - No network cost for replication within the same Region (cross-AZ), but cross-Region replication is charged
- RDS Multi AZ <> Disaster Recovery (SYNC Replication)
  - One DNS name - automatic failover to standby
  - Increase availability
  - Read replicas can also be setup as MultiAZ for DR
- SingleAZ to MultiAZ
  - zero downtime
  - 'modify' SYNC Replication to Standby DB
- RDS Custom
  - Oracle and Microsoft SQL Server
  - Access to underlying instances
    - config settings, patches, enable native features, SSH or SSM Session Manager into EC2
    - DEACTIVATE Automation Mode while tweaking
Amazon Aurora
- Compatible with Postgres or MySQL
- 5x performance MySQL and 3x over Postgres
- Storage grows automatically in 10GB increments, up to 128TB
- Up to 15 read replicas (sub 10ms replica lag)
- Failover is instantaneous, HA!!!
- 6 copies across 3 AZ
  - 4 out 6 for writes
  - 3 out 6 for reads
  - self-healing with peer-to-peer replication
  - storage stored across 100s of volumes
- 1 Master that handles WRITES (failover in under 30 secs)
- Up to 15 RR (any can be upgraded to master) with AUTO-Scaling!!!
- Support cross region replication
- Writer Endpoint and Reader Endpoint
  - Features:
    - Automatic fail-over
    - Backup and Recovery
    - Isolation and Security
    - Industry compliance
    - Push-button scaling
    - Automatic Patching with Zero-Downtime
    - Advanced Monitoring
    - Routine maintenance
    - Backtrack (without backup)
- Custom Endpoints
  - Bigger instances for analytics - no longer linked to reader endpoint
- Aurora Serverless
  - Automatic DB instantiation and auto-scaling based on usage. No capacity planning. Pay/sec
    - client talks to Proxy Fleet managed by Aurora
- Aurora MultiMaster <> IMMEDIATE failover
  - All nodes are RW
- Global Aurora
  - Two Flavors
    - Aurora Cross Region Read Replicas
    - Aurora Global Database (recommended)
      - 1 Primary Region
      - 5 Secondary regions (read-only), replication lag less than 1 second
      - Up to 16 RR per secondary - decrease lag, HA
      - DR less than 1 minute
      - Typical cross-region replication takes LESS THAN 1 SECOND
- Aurora Machine Learning
  - Integrates with AWS ML services
    - Amazon SageMaker
    - Amazon Comprehend (sentiment analysis)
    - Use CASE: fraud detection, ads targeting, product recommendations, sentiment analysis
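The "6 copies across 3 AZ, 4 out of 6 for writes, 3 out of 6 for reads" rule from the Aurora storage notes above can be expressed as a small quorum check. This is only a sketch of the counting rule as written in these notes, not of Aurora's actual storage protocol; `aurora_storage_status` is a hypothetical helper.

```python
TOTAL_COPIES = 6   # 2 copies in each of 3 AZs
WRITE_QUORUM = 4   # copies needed for writes
READ_QUORUM = 3    # copies needed for reads


def aurora_storage_status(failed_copies: int) -> dict[str, bool]:
    """Say whether reads and writes are still possible after some copies fail."""
    if not 0 <= failed_copies <= TOTAL_COPIES:
        raise ValueError("failed_copies must be between 0 and 6")
    healthy = TOTAL_COPIES - failed_copies
    return {
        "writes_available": healthy >= WRITE_QUORUM,
        "reads_available": healthy >= READ_QUORUM,
    }


# Losing a whole AZ removes 2 copies: writes and reads both survive.
print(aurora_storage_status(2))
# Losing an AZ plus one more copy: reads survive, writes do not.
print(aurora_storage_status(3))
```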
RDS Backups
- Automated Backups - can be disabled
  - daily backups
  - Transaction logs are backed up by RDS every 5 minutes (can restore to any point from the oldest backup up to 5 minutes ago)
  - 1 to 35 days, 0 to disable
- Manual DB Snapshots
  - triggered by user
  - CAN BE STORED FOREVER
Aurora Backups
- Automated backups - CANNOT BE DISABLED - Point-In-Time Recovery
- Restoring a RDS backup/snapshot CREATES A NEW DB
- Can restore a MySQL RDS database from S3
  - backup on premise DB -> store in S3, -> restore to MySQL RDS
- Can restore to MySQL Aurora Cluster from S3
  - backup on premise DB using Percona XtraBackup -> store in S3 -> restore to MySQL Aurora cluster
Aurora Database Cloning
- Faster than snapshot and restore - great for staging and testing, fast and cost-effective
- DOES NOT impact production database
RDS & Aurora Security
- At-Rest: AWS KMS encryption (configured on creation)
  - Must encrypt master for RR encryption
- In-Flight: TLS-ready by default, use the AWS TLS root certificates client-side
- Supports IAM Authentication (IAM Roles)
- Control network access via Security Groups
- NO SSH except for RDS Custom
- Audit Logs can be enabled (limited retention time) - send to CloudWatch for long-term storage
Amazon RDS Proxy - also works with Aurora
- Allow apps to pool and share DB connections
- Improve efficiency by reducing stress on DB and minimize open connections
- Serverless, auto-scaling, HA (multi-az)
- Reduce Failover time by 66%
- Supports RDS(MySQL, Postgres, MariaDB) and Aurora
- No code changes just update endpoints
- Enforce IAM Authentication for DB, securely store credentials in AWS Secrets Manager
- NEVER Publicly accessible; must connect within VPC
- Hella useful for Lambda function access to RDS/Aurora
Amazon ElastiCache
- managed Redis or Memcached service
- in-memory databases with high performance and low latency
- Help reduce load on DB for common read queries
- helps make app stateless
- AWS manages OS maintenance, optimization, setup, config, monitoring, DR, backups
- REQUIRES heavy APP changes
- Must have cache invalidation strategy to ensure cache is fresh
- USE CASE:
  - session store, login, write session data to cache, new app looks up session data in cache to keep user logged in (achieve stateless app)
  - gaming leaderboard
    - Redis sorted sets - guarantees uniqueness and element ordering
- REDIS vs MEMCACHED
  - REDIS
    - Multi-AZ with Auto-Failover
    - Read Replicas scale reads and HighAvailability
    - Data durability with AOF persistence
    - Backup and restore
  - MEMCACHED
    - multi-node partitioning of data (sharding)
    - No HA (no replication)
    - No persistence
    - No backup, no restore
    - Multi-threaded (via sharding)
- DO NOT SUPPORT IAM authentication
  - USE:
    - Redis AUTH (password/token)
    - Extra level of security on top of Security Groups
    - Supports SSL encryption
  - Memcached
    - Supports SASL-based authentication
  - PATTERNS:
    - Lazy Loading
      - all read data is cached on first read (cache miss -> read from DB -> write to cache); BEWARE of stale data
    - Write Through
      - Add/Update cache on write to DB (no stale data)
    - Session Store
      - Expire with TTL
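The caching patterns above (Lazy Loading, Write Through, TTL expiry) can be sketched with plain in-memory stand-ins. `TTLCache` and `FakeDatabase` are hypothetical helpers that imitate a cache and a database; nothing here talks to Redis, Memcached, or RDS.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a cache with optional expiry."""

    def __init__(self):
        self._items = {}  # key -> (value, expires_at or None)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._items[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else time.monotonic() + ttl_seconds
        self._items[key] = (value, expires_at)


class FakeDatabase:
    """Stand-in database that counts reads so cache hits are visible."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.rows.get(key)

    def write(self, key, value):
        self.rows[key] = value


def read_lazy(key, cache, db, ttl_seconds=60):
    """Lazy Loading: only hit the database on a cache miss."""
    value = cache.get(key)
    if value is None:
        value = db.read(key)
        if value is not None:
            cache.set(key, value, ttl_seconds)
    return value


def write_through(key, value, cache, db, ttl_seconds=60):
    """Write Through: update the cache whenever the database is written."""
    db.write(key, value)
    cache.set(key, value, ttl_seconds)
```

Lazy Loading alone can serve stale data until the TTL expires; combining it with Write Through keeps the cached value in step with the database for keys written through this path.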

Section 09: Route 53

DNS = Domain Name System
- translates human friendly hostnames into IP addresses
Domain Registrar => Where you buy a domain name
DNS Records
- A => IPv4
- AAAA => IPv6
- CNAME => hostname to hostname
- NS => Name Server => Resolves DNS queries (Top-Level Domain TLD .com, Second-Level Domain SLD - amazon.com)
- Root DNS Server (ICANN), TLD DNS Server (IANA), SLD DNS Server (managed by Domain Registrar)
  - Authoritative vs Non-Authoritative
- FQDN = Fully Qualified Domain Name
Amazon Route 53
- HA, Scalable, fully managed, authoritative DNS
- Also a Domain Registrar
- Ability to Health Check routes
- 100% SLA
Domain Name Record
- Domain Name, Record Type, Value, Routing Policy, TTL (default 300 seconds)
- Record Types:
  - A
  - AAAA
  - CNAME (NOT able to create for SLD - Zone Apex)
  - NS
- Hosted Zone
  - container for records (Public and Private)
  - 50 cents/month per Hosted Zone
- TTL = Time To Live (60 sec to 24 hours) (Mandatory except for Alias Records)
- Alias record can point to SLD and comes with built-in health check
  - Maps a hostname to an AWS Resource
  - Automatically recognizes changes to resource's IP address
  - Can point to Zone Apex
  - Always of type A/AAAA
  - Can't set TTL
  - Targets:
    - ELB, CloudFront Distributions, API Gateway, Elastic Beanstalk, S3 Websites, VPC Interface Endpoints, Global Accelerator, Route 53 records in same HZ
    - CAN NOT set an ALIAS for an EC2 DNS name
- Routing Policies
  - Simple (no health checks)
  - Weighted
    - DNS records must have same name and type
  - Failover
  - Latency based
  - Geolocation
  - Geoproximity (Route 53 Traffic Flow feature)
  - Multi-Value Answer
- Health Checks
  - only for Public Resources
  - Three Types:
    - Health Checks that monitor an endpoint
    - Calculated Health Checks
    - Health Check that monitor a CloudWatch Alarm (can be used to monitor private resources)
  - Integrated with CW metrics
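For the Weighted routing policy listed above, the share of traffic sent to a record is its weight divided by the sum of all weights. The sketch below assumes that rule plus the behaviour that all-zero weights mean an equal split; the record names are made up and `traffic_shares` / `pick_record` are hypothetical helpers.

```python
import random


def traffic_shares(weights: dict[str, int]) -> dict[str, float]:
    """Fraction of DNS answers each record should receive."""
    if not weights or any(w < 0 for w in weights.values()):
        raise ValueError("need at least one record and non-negative weights")
    total = sum(weights.values())
    if total == 0:
        # Assumption: all weights at 0 means every record is returned equally.
        return {name: 1 / len(weights) for name in weights}
    return {name: w / total for name, w in weights.items()}


def pick_record(weights: dict[str, int], rng=random) -> str:
    """Simulate one DNS answer chosen according to the weights."""
    shares = traffic_shares(weights)
    names = list(shares)
    return rng.choices(names, weights=[shares[n] for n in names], k=1)[0]


# Hypothetical records, all with the same name and type as the notes require.
records = {"ec2-a": 70, "ec2-b": 20, "ec2-c": 10}
print(traffic_shares(records))
print(pick_record(records))
```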

Section 10: Classic Solutions Architecture Discussions

whatisthetime.com
- Route 53 + Elastic IP + EC2 Instance (stateless)
- to
- Route 53 (Alias) + MultiAZ ELB + MultiAZ ASG + EC2 instances + Reserved Instances
- Well Architected Framework (COST, PERFORMANCE, RELIABILITY, SECURITY, OPERATIONAL EXCELLENCE)
myclothes.com
- stateful web app
- shopping cart
  - Session Affinity (ELB Setting)
  - Browser Cookies (User)
    - stateless
    - Heavier HTTP requests
    - Security Risk (cookies are mutable)
    - Must validate cookies, 4KB limit
  - ElastiCache - sub-millisecond (via sessionId) or DynamoDB
- Scale reads with RDS RR (up to 5) or implement write-through via ElastiCache (cache validation)
- Multi-AZ for Disaster Recovery
- Example of 3-Tier Architecture
mywordpress.com
- display/upload images
- Route 53 - Multi AZ ELB - MultiAZ EC2 within ASG - ENI (Elastic Network Interface) <=> EFS (Elastic File System)
Instantiating Application Quickly
- EC2 Instances => use Golden AMI - very common pattern
- Dynamic configuration => Bootstrapping with User Data
- Hybrid: Golden AMI + User Data (Elastic Beanstalk)
- RDS => restore from snapshot
- EBS/EFS => restore from snapshot
Elastic Beanstalk (like Netlify for AWS - Platform as a Service)
- Web App 3-Tier
  - PUBLIC SUBNET (CLIENT facing)
  - PRIVATE SUBNET (APPLICATION layer)
  - DATA SUBNET (database/cache layer)
- Components:
  - Application
  - Application Version
  - Environment
    - Tiers: (Web Server Environment Tier AND Worker Environment Tier)

Section 11: Amazon S3 Introduction

S3 = Simple Storage Service
- use cases:
  - Backup and storage
  - Disaster Recovery
  - Archive
  - Hybrid Cloud storage
  - Application hosting
  - Media hosting
  - Data Lakes & Big Data Analytics
  - Software Delivery
  - Static Website
- Stores objects (files) in buckets (directories)
- Bucket names must be globally unique
  - no uppercase, no underscore, 3-63 characters, start with a number or lowercase letter, must not start with xn--, must not end with -s3alias
- Buckets are defined at the REGION level
- object files have a key (prefix + object name)
- Max object size: 5TB
- If greater than 5GB must use "multi-part upload"
- Can have Metadata, Tags, VersionID
- Security
  - User-Based - IAM Policies
  - Resource-Based
    - Bucket Policies - bucket wide rules - Allows Cross Account access
    - Object Access Control List - finer grained (can be disabled)
    - Bucket Access control List - less common (can be disabled)
  - Can access if (IAM permission allows it OR resource policy allows it) AND there is no explicit deny
- static site:
  - http://BUCKET-NAME.s3-website-AWS-REGION.amazonaws.com
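The bucket naming rules in this section can be turned into a quick check. The sketch covers only the rules written in these notes (plus the assumption that the allowed characters are lowercase letters, digits, dots, and hyphens); it is not a complete implementation of the official naming rules, and `is_valid_bucket_name` is a hypothetical helper.

```python
import re

# First character: lowercase letter or digit; 3-63 characters overall.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]{2,62}")


def is_valid_bucket_name(name: str) -> bool:
    """Check a bucket name against the rules listed in the notes."""
    if not _NAME_PATTERN.fullmatch(name):
        return False  # uppercase, underscore, bad length, or bad first character
    if name.startswith("xn--"):
        return False
    if name.endswith("-s3alias"):
        return False
    return True


for candidate in ("my-bucket-123", "My-Bucket", "my_bucket", "ab", "xn--bucket"):
    print(candidate, is_valid_bucket_name(candidate))
```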
Amazon S3 - Versioning
- Enabled at bucket level
- version "null" for objects that existed prior to enabling versioning
S3 Replication
- CRR - Cross Region Replication
- SRR - Same Region Replication
- Must enable versioning in SOURCE and DESTINATION
- Async operation
- Only new objects are replicated. Use S3 BATCH REPLICATION to replicate existing objects
- No transitive "chain" replication across buckets
- Can replicate delete markers, but deletions with version ID are not replicated (NO MALICIOUS DELETES)
S3 Storage Classes
- Amazon S3 Standard
  - Durability 11 9s (99.999999999%) - same across all storage classes
  - Availability - varies based on storage class, 99.99%
- Amazon S3 Standard-Infrequent Access
- Amazon S3 One Zone-Infrequent Access
- Amazon S3 Glacier Instant Retrieval (storage and retrieval cost)
  - millisecond retrieval, 90 day minimum
- Amazon S3 Glacier Flexible Retrieval
  - Expedited (1-5min), Standard (3-5 hours), Bulk (5-12 hours, free); 90 day minimum storage
- Amazon S3 Glacier Deep Archive
  - Standard (12 hours), Bulk (48 hours)
  - 180 day minimum storage
- Amazon S3 Intelligent Tiering
  - small monthly monitoring and auto-tiering fee, no retrieval charges

Section 12: Advanced Amazon S3

Moving between Storage Classes
- automated using Lifecycle Rules
  - Transition Actions
  - Expiration Actions
S3 Analytics gives recommendations for Standard and Standard-IA optimum config
- Report is updated daily; processing may take 24-48 hours
S3 Requester Pays
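- By default, bucket owners pay for all storage and data transfer costs
- With Requester Pays, the requester pays for the request and the data download; the owner still pays for storage
- The requester must be authenticated with AWS (cannot be anonymous)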
S3 Event Notifications
- use case: generate thumbnails of images uploaded to S3
- Can be processed by:
  - SNS
  - SQS
  - Lambda Function
  - Amazon EventBridge
    - Advanced Filtering options, Multiple Destinations, EventBridge Capabilities
S3 Performance
- 100-200ms first byte
- 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD request/second/per-prefix
- Multi-Part uploads, recommended >100MB, required for >5GB
- S3-Transfer Acceleration -> transfer to edge for speed
- S3 Byte-Range Fetches
  - Parallelize GETs by requesting specific byte ranges
  - use case:
    - speed up downloads, better resilience in case of failures
    - retrieve only partial data (like the header of a file)
- S3 Select and S3 Glacier Select
  - retrieve less data using SQL to perform server-side filtering (on CSV files)
- S3 Batch Operations
  - Perform bulk operations on existing S3 objects
    - modify object metadata
    - copy objects between S3 buckets
    - Encrypt un-encrypted objects
  - Job:
    - List of objects
    - Action to perform
    - Optional parameters
  - Manages retries, tracks progress, sends completion notifications, generate reports
    - use S3 Inventory + S3 Select + S3 Batch Operations
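For the Byte-Range Fetches idea above (parallelize GETs by requesting specific byte ranges), the main piece of work is splitting an object into ranges. A minimal sketch, assuming the standard HTTP `Range` header format with inclusive end offsets; `byte_ranges` is a hypothetical helper and the actual download calls are left out.

```python
def byte_ranges(object_size: int, chunk_size: int) -> list[str]:
    """Split an object of object_size bytes into Range header values."""
    if object_size < 0 or chunk_size <= 0:
        raise ValueError("object_size must be >= 0 and chunk_size > 0")
    ranges = []
    for start in range(0, object_size, chunk_size):
        # HTTP ranges are inclusive, hence the "- 1".
        end = min(start + chunk_size, object_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


# A 10-byte object fetched in 4-byte chunks.
print(byte_ranges(10, 4))
# Only the first 100 bytes, e.g. to read a file header.
print(byte_ranges(100, 100))
```

Each range can be requested by a separate worker, and a failed range can be retried without restarting the whole download.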

Section 13: Amazon S3 Security

Object Encryption
- 4 Methods
  - Server-Side Encryption (SSE)
    - Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)
      - keys handled, managed, and owned by AWS
      - Encrypted with AES-256
      - Must set Header: x-amz-server-side-encryption: AES256
      - Enabled by default for new buckets and new objects
    - Server-Side Encryption with KMS Keys stored in AWS KMS (SSE-KMS)
      - Leverage AWS Key Management Service (AWS KMS) to manage encryption keys
      - Advantages: user control + audit key usage using CloudTrail
      - Header: x-amz-server-side-encryption: aws:kms
      - Upload => GenerateDataKey KMS API
      - Download => Decrypt KMS API
      - There is a quota on requests. Service Quotas Console to request increase
    - Server-Side Encryption with Customer-Provided Keys (SSE-C)
      - When you want to manage your own encryption keys
      - S3 does NOT store key; key must be uploaded with HTTP headers using HTTPS
  - Client-Side Encryption
    - User responsible for encrypting data before sending
      - Can use: Amazon S3 Client-Side Encryption Library
- Bucket Policies are evaluated before DEFAULT ENCRYPTION!!!
Cross-Origin Resource Sharing (CORS)
- Origin = scheme + host + port
- By default, Web Browsers deny cross-origin requests
- The destination (other origin) must allow the request using CORS headers, e.g. Access-Control-Allow-Origin and Access-Control-Allow-Methods
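Since an origin is scheme + host + port, deciding whether a request is cross-origin is a comparison of those three parts. A minimal sketch with the standard library, assuming default ports of 80 for http and 443 for https when the URL does not state one; the URLs are examples and `origin_of` / `is_cross_origin` are hypothetical helpers.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> tuple[str, str, int]:
    """Return (scheme, host, port) for a URL, filling in the default port."""
    parts = urlsplit(url)
    scheme = parts.scheme  # urlsplit lowercases the scheme
    host = parts.hostname  # and the hostname
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    if not scheme or host is None or port is None:
        raise ValueError(f"cannot determine origin of {url!r}")
    return (scheme, host, port)


def is_cross_origin(page_url: str, request_url: str) -> bool:
    """True when the request goes to a different origin than the page."""
    return origin_of(page_url) != origin_of(request_url)


print(is_cross_origin("https://www.example.com/index.html",
                      "https://www.example.com/api"))
print(is_cross_origin("https://www.example.com/index.html",
                      "https://other.example.com/image.png"))
```

When the second call applies, the browser only completes the request if the other origin answers with the matching CORS headers.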
Amazon S3 - MFA Delete
- MFA required to: permanently delete an object version or suspend Versioning. Only the bucket owner (root account) can enable/disable MFA Delete
- aws configure --profile NAME-OF-PROFILE
- aws s3api put-bucket-versioning --bucket NAME-OF-BUCKET --versioning-configuration Status=Enabled,MFADelete=Enabled --mfa "ARN-OF-MFA-DEVICE MFA-CODE" --profile NAME-OF-PROFILE
S3 Access Logs
- Can be analyzed using Amazon Athena
- DO NOT set the logging bucket to be the same as the monitored bucket!!!
  - Leads to a logging loop!!!
S3 Pre-Signed URLs
- URL Expiration (1min to 720 min in Console; 7 days max in AWS CLI/SDK)
S3 Glacier Vault Lock
- like a bucket where objects can never be deleted or modified once the lock is in place (WORM - Write Once Read Many)
- Vault Lock Policy
- Helpful for compliance
S3 Object Lock - must enable versioning; blocks version deletion
- Retention Modes
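  - Compliance => no one (not even root) can overwrite or delete object versions; retention cannot be shortened
  - Governance => most users cannot overwrite or delete; admins with special IAM permissions can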
- Retention Period - Protect object for a fixed amount of time; can be extended
- Legal Hold
  - s3:PutObjectLegalHold IAM Permission
  - protect object indefinitely; independent from retention period
S3 Access Points
- each AP has its own DNS name (Internet Origin or VPC Origin)
- Each AP has its own access point policy (similar to a bucket policy) - manage security at scale
S3 Object Lambda
- allows AWS Lambda Functions to change object before retrieved by caller

Section 14: Cloudfront & AWS Global Accelerator

AWS CloudFront
- Content Delivery Network - CDN
- Improves read performance, content is cached at the edge
- 216 Points of Presence
- DDoS protection, Integration with Shield, AWS Web Application Firewall
- S3 origin secured with Origin Access Control (OAC), which replaces the legacy Origin Access Identity (OAI)
- CloudFront with ALB or EC2 as Origin
  - ALB/EC2 Instances must be Public
    - Allow Public IP of Edge Locations
- CloudFront Geo Restriction
  - Can restrict access to distribution
  - Allowlist or Blocklist - approved/banned countries
- Pricing
  - Cost varies by location
- Price Classes
  - Price Class All: all regions - best performance
  - Price Class 200: most regions, except pricey
  - Price Class 100: NA/Europe/Israel
- Cache Invalidation
AWS Global Accelerator
- uses Anycast IP instead of Unicast IP - client routed to nearest server
- route clients to closest Edge location via the internal AWS private network
- Uses 2 Anycast IP
- Works with:
  - Elastic IP
  - EC2 Instances
  - ALB, NLB (public or private)
- Health Checks built in (less than 1 min failover)
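CloudFront vs Global Accelerator
- CloudFront: improves performance for cacheable content (images, videos) and dynamic content; content is served at the edge
- Global Accelerator: improves TCP/UDP app performance by proxying packets at the edge to applications running in one or more Regions
- Global Accelerator is also good for HTTP use cases that require static IP addresses or fast failover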

Section 15: AWS Storage Extras

AWS Snow Family
- Data Migration
  - Snowcone
    - 8TB Storage, up to 24TB
    - Can use AWS DataSync to send data via www
  - Snowball Edge
    - Storage Optimized
      - 80TB of HDD
      - Can cluster ... up to 15 snowballs
    - Compute Optimized
      - 42TB of HDD
  - Snowmobile
    - Transfer exabytes of data (1 EB = 1,000 PB)
    - Each mobile has 100PB
    - Better than Snowball if >10PB
- Edge Computing
  - Snowcone
  - Snowball Edge
All can run EC2 Instances, AWS Lambda via AWS IoT Greengrass
Rule of thumb: use snowball devices if >1 week to transfer
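The one-week rule of thumb can be checked with simple arithmetic: data size in bits divided by usable bandwidth. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/s) and ignores protocol overhead unless a lower `utilization` is passed; both helpers are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_terabytes: float, link_mbps: float,
                  utilization: float = 1.0) -> float:
    """Days needed to push data_terabytes over a link of link_mbps."""
    if data_terabytes < 0 or link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("invalid size, bandwidth, or utilization")
    bits = data_terabytes * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / SECONDS_PER_DAY


def should_use_snow_device(data_terabytes: float, link_mbps: float) -> bool:
    """Apply the rule of thumb: more than one week over the network."""
    return transfer_days(data_terabytes, link_mbps) > 7


# 100 TB over a 100 Mbps line takes about 92.6 days at full utilization.
print(round(transfer_days(100, 100), 1))
print(should_use_snow_device(100, 100))
```

Real transfers are slower than this ideal figure, so the estimate is a lower bound on the time needed.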
AWS OpsHub - GUI for controlling Snow Family devices
Snowball into Glacier: Snowball -> S3 -> Lifecycle policy -> S3 Glacier
Amazon FSx
- Launch 3rd party high-performance file systems on AWS
- Fully managed
  - FSx for Windows File Server
    - fully managed Windows FS shared drive
    - Supports SMB protocol and Windows NTFS
    - Integration with Microsoft AD, ACLs, and user quotas
    - Can be mounted on Linux EC2 instances
    - Supports Microsoft's Distributed File System (DFS) Namespaces - link on-premise Windows FS to Cloud
    - Storage Options:
      - SSD
      - HDD
    - Can access Windows FSx from on-premise with VPN or Direct Connect
    - Can be MultiAZ; Data backed up daily to S3 for DR
  - FSx for Lustre = Linux + cluster
    - Parallel distributed FS for large-scale computing
    - Machine Learning + HPC (High Performance Computing)
    - Seamless integration with S3: can read S3 as a file system through FSx and write results back to S3
    - Can be used from on-premise via VPN or Direct Connect
    - Scratch File System
    - Persistent File System (data replicated within a single AZ)
  - FSx for NetApp ONTAP
    - NFS, SMB, iSCSI
    - Broad compatibility (WorkSpaces, VMware Cloud on AWS, AppStream 2.0)
    - Storage auto-shrinks or grows
    - snapshots
    - replication
    - data compression and de-duplication
    - Point-in-time instantaneous cloning (helpful for testing new workloads)
  - FSx for OpenZFS
    - compatible with NFS
    - Broad compatibility
    - point-in-time instantaneous cloning
    - Up to 1 million IOPS, sub-ms latency
    - Snapshots and compression, low cost
AWS Storage Gateway - bridge between on-premise and cloud data
- Block Storage
  - EBS
  - EC2 Instance Store
- File Storage
  - EFS
  - FSx
- Object Storage
  - S3
  - Amazon Glacier
- Use Cases:
  - DR
  - backup & restore
- Types:
  - S3 File Gateway
    - NFS or SMB ... behind the scenes uses HTTPS
    - Most recent used files cached in file gateway
    - SMB allows for AD for user auth
  - FSx File Gateway
    - Native access to Amazon FSx for Windows File Server
    - Advantage is the local cache
    - Good for group file shares and home dirs
  - Volume Gateway
    - block storage uses iSCSI backed by S3
    - Backed up as EBS snapshots, which can help restore on-premises volumes
    - Cached volumes: low-latency access to the most recently used data
    - Stored volumes: entire dataset is on premises, with scheduled backups to S3
    - mainly for backup and restore
  - Tape Gateway
    - for physical tape backups in the cloud
    - S3 or Glacier
    - also can use iSCSI
  - Storage Gateway - Hardware Appliance
    - For when you cannot virtualize the gateway on-premises
AWS Transfer Family
- uses FTP, FTPS, SFTP
- can transfer to S3 or EFS
- Can use Microsoft AD, LDAP, Okta, Amazon Cognito for authentication
AWS DataSync
- Move large amount of data to/from
- On-premise to cloud and vice versa ... needs an agent
- AWS to AWS (no agent)
- Replication tasks are not continuous; they run on a schedule: hourly, daily, weekly...
- File permissions and metadata are preserved (NFS POSIX, SMB)
- Can sync with ALL S3 storage classes (including Glacier), EFS, or FSx
Summary:
- EC2 Instance storage: physical storage with high IOPS!!!

Section 16: Decoupling Applications: SQS, SNS, Kinesis, Active MQ

Sync communications vs Async/Event-Based communication between services
SQS - Simple Queue Service - Queue Model
- key concepts: queue, messages, polling, long polling, Producers, Consumers
- used to decouple applications
- retention time: default 4 days, max 14 days
- low latency (<10ms)
- 256KB message limit
- At least once delivery, "best effort ordering" by default
  - SendMessage API, message persisted until Consumer deletes message
- unlimited throughput (in standard configuration)
- Poll up to 10 messages at a time with the ReceiveMessage API; delete processed messages with the DeleteMessage API
- Scale Consumers using ASG and the CloudWatch Metric for queue length (ApproximateNumberOfMessagesVisible) -> setup a CloudWatch Alarm
- Security:
  - in-flight with HTTPS API, at rest with KMS keys, or client-side encryption
  - Access Controls via IAM policies or SQS Access Policies (cross-account)
- Message visibility timeout (default 30 seconds): once polled, a message is invisible to other consumers; the Consumer must process and delete it within the timeout or the message returns to the queue
  - ChangeMessageVisibility API can give a Consumer more time to process
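
The visibility timeout behaviour described above can be sketched with a small in-memory model. This is a toy simulation, not the SQS API: the `ToyQueue` class and its methods are hypothetical names that only loosely mirror SendMessage, ReceiveMessage, ChangeMessageVisibility, and DeleteMessage, and the clock is passed in explicitly as `now` (seconds) so the behaviour is deterministic.

```python
class ToyQueue:
    """In-memory stand-in for a queue with a visibility timeout (not real SQS)."""

    def __init__(self, visibility_timeout: float = 30.0):
        self.visibility_timeout = visibility_timeout
        self._messages = {}   # message id -> (body, time at which it is visible again)
        self._next_id = 0

    def send(self, body: str) -> int:
        """Store a message; it stays in the queue until explicitly deleted."""
        msg_id = self._next_id
        self._next_id += 1
        self._messages[msg_id] = (body, 0.0)
        return msg_id

    def receive(self, now: float, max_messages: int = 10) -> list:
        """Return up to max_messages visible messages and hide them for the timeout."""
        if not 1 <= max_messages <= 10:
            raise ValueError("max_messages must be between 1 and 10")
        batch = []
        for msg_id, (body, visible_at) in list(self._messages.items()):
            if visible_at <= now and len(batch) < max_messages:
                self._messages[msg_id] = (body, now + self.visibility_timeout)
                batch.append((msg_id, body))
        return batch

    def change_visibility(self, msg_id: int, now: float, timeout: float) -> None:
        """Give the current consumer more (or less) time before the message reappears."""
        body, _ = self._messages[msg_id]
        self._messages[msg_id] = (body, now + timeout)

    def delete(self, msg_id: int) -> None:
        """Remove a message once it has been processed."""
        self._messages.pop(msg_id, None)


if __name__ == "__main__":
    q = ToyQueue(visibility_timeout=30)
    job = q.send("resize-image")
    print(q.receive(now=0))    # [(0, 'resize-image')]
    print(q.receive(now=10))   # [] because the message is still invisible
    print(q.receive(now=30))   # [(0, 'resize-image')] since it was never deleted
    q.delete(job)
    print(q.receive(now=100))  # []
```

The second consumer poll at `now=30` receiving the same message again is the "at least once delivery" case: a consumer that is slow and does not extend the timeout or delete the message should expect duplicates.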
SNS - Simple Notification Service - Pub/Sub model
- pub/sub model
Kinesis - Real-Time Streaming model
