Solutions Architect Associate - Study Notes
Exam Tips
Associate everything learned to a WAF pillar
If a solution seems feasible but highly complicated, it's probably wrong
Don't overthink it
50% off the next exam if you pass
General Architecture
Regions and AZs
Region us-east-1
AZ us-east-1a-f
Consoles are region scoped (aside IAM, S3, and Route 53)
Global Infrastructure to see AZ # and definitions
Service Models
IaaS, PaaS, SaaS, FaaS (Function as a Service)
The service stack responsibility differs depending on the service model (Data centre, Network and Storage, Host/Servers, Virtualization, OS, Runtime, Application, Data)
High Availability (HA) vs. Fault Tolerance
HA - Hardware, software, and configuration allowing a system to recover quickly in the event of a failure (minimize downtime, not prevent the failure in the first place)
Fault Tolerance - System designed to operate through a failure with no user impact.
RPO vs. RTO
RPO - How much data a business can tolerate losing, expressed as the time between the last backup and the failure.
RTO - Maximum time a system can be down; the time to recover.
Scaling
Vertical Scaling - Increase the size of the server; constrained by maximum machine size (technically or cost-wise)
Horizontal Scaling - Additional machines into a pool of resources, requires application support.
Tiered Application Design
Presentation - interacts with customer
Logic - delivers application functionality
Data - data storage and retrieval
Monolithic applications require vertical scaling
Misc.
Cost efficient or cost effective - Implementing for as little initial and ongoing cost
Application Session State - represents what a customer is doing, have chosen, or configured.
Undifferentiated Heavy Lifting - A part of an application, system, or platform that is not specific to your business.
Accounts
Budgets and Cost
Solution Architecture
Instantiating instances quickly
Golden AMI: Apps, dependencies, etc. done beforehand
User Data: For dynamic configuration (retrieving un/pw or something)
Hybrid: mix Golden and User Data (Elastic Beanstalk)
RDS: Restore from snapshot, DB will have schemas and data ready
EBS Volumes: restore from snapshot, will already be formatted and have data
Elastic Beanstalk
Single Instance deployment: Good for dev
LB + ASG: good for prod, pre-prod
ASG only: Good for non-web apps in production (workers etc.)
Three components
Application
Application version
Environment name
Can promote versions to next env
Rollback feature to previous version
Full control over lifecycle of envs
Support for most platforms (can write own custom platform too)
Well-Architected Framework (WAF)
Read WAF whitepaper
Re-read WAF notes from internal training
When going through course align everything with a WAF pillar
Pillars, Design Principles, Questions
Security
IAM
Global across all Regions
Account Aliases must be globally unique
Authentication and Authorization
Principal - Person or application that can make an authenticated or anonymous request to perform an action on a system
Authentication - Process of authenticating a principal against an identity
Identity - Objects that require authentication and are authorized to access resources
Authorization - Process of checking and allowing or denying access to a resource for an identity
Users
One user per physical person
chmod 0400 on .pem key file
(Windows 10 SSH) Properties - > Security - > (make self owner) - > remove Inheritance - > remove all other users - > ensure Full Control
Groups
Roles
Internal use, machine use only?
One role per application, no sharing
Policies
Written in JSON
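A minimal boto3 sketch of creating a customer-managed policy (names and ARNs are hypothetical):

import json
import boto3

iam = boto3.client("iam")

# example read-only policy document; bucket name is hypothetical
policy_doc = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": ["arn:aws:s3:::my-example-bucket",
                     "arn:aws:s3:::my-example-bucket/*"],
    }],
}

iam.create_policy(PolicyName="ReadMyExampleBucket",
                  PolicyDocument=json.dumps(policy_doc))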
Compute
EC2
Exam Tips
Billed by the second
Windows 10 can use SSH
SG can have IPs as rules, but also reference other SG for rules
Instance
Has public IP by default, likely change on restart
User Data
Commands automatically run with sudo
Runs as root
Runs only on the first boot of the instance
Gets base64 encoded and passed
AMI
Region specific (but can copy)
Cross account AMI copy
You can share an AMI with another AWS account
Sharing an AMI does not affect ownership of the AMI
If you copy an AMI that has been shared with your account, you are the owner of the target AMI in your account
To copy an AMI that was shared from another account the source owner must grant you read permissions for the storage that backs the AMI (EBS snapshot or S3 bucket for instance store backed)
Limits:
Can't copy encrypted shared AMI. If the underlying snapshot and encryption key were shared you can copy while re-encrypting it with own key. You own the copied snapshot and register it as new AMI.
Can't copy a shared AMI with an associated billingProduct code, including Windows and Marketplace AMIs. To copy launch an EC2 instance using the shared AMI then create an AMI from the instance.
Reside in S3 (cost based on storage used)
Use custom AMI for faster deploy in ASG
EC2 Instance Launch Types
On Demand Instances
For: Short-term uninterruptible workloads when you cannot predict application behaviour
Pay per use, billing per second after first minute
Highest cost, no upfront payment or commitment
Reserved Instances
For: Steady state usage (think database)
Up to 75% discount vs OD
Pay upfront for use, long term commitment, 1 or 3 years
Reserve specific instance type
Convertible Reserved Instance
Can change EC2 instance type
Up to 54% discount
Scheduled Reserved Instance
Launch within the time window you reserve (at regular interval)
Spot Instances
For: Batch jobs, Big Data analysis, failure resilient workloads
Discount up to 90% vs OD
Active as long as under bid price
Price varies on supply and demand
Reclaimed with 2 min warning when spot price goes above bid
Dedicated Instances
Hardware dedicated to you
May share hardware with other instances in same account that are not Dedicated Instances
No control over instance placement

Instance Types
R: RAM - ex: in-memory cache
C: CPU - ex: compute/database
M: Balanced (Medium)- ex: general/web app
I: I/O (instance storage) - ex: databases
G: GPU - ex: video rendering or machine learning
Burstable (T2/T3)
Ok CPU, can burst to good CPU
Burst uses burst credits
If all credits used, CPU becomes bad
When not bursting accumulates burst credits
Can pay for unlimited burstable mode
Placement Groups
Cluster - Low latency, single AZ
Same rack, same AZ, 10 Gbps network, same failure zone
Spread - Spreads across underlying hardware, and across AZs (max 7 instances per group, per AZ)(critical applications, maximum HA)
Partition - Spreads across many partitions (which rely on different racks) within an AZ. Scales to 100s of instances per group (ex: Hadoop, Cassandra, Kafka)
Partition is a set of racks, can create up to 7 partitions in PG
Each partition has many instances, partition is same failure zone
Partition failure will not affect other partitions
EC2 instances can get access to partition metadata
EC2 Instance Metadata
Allows an instance to learn about itself without using an IAM role for that purpose
URL is http://169.254.169.254/latest/meta-data
Can retrieve IAM Role name from metadata, but not the IAM Policy
When querying curl http://169.254.169.254/latest/meta-data/iam/security-credentials/myfirstrole
Get AccessKeyId, Secret, and Token, which is what the EC2 instance gets via the IAM Role to access whatever
Short lived
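A minimal sketch of querying the metadata service from Python (IMDSv1 style, no session token; the role name is whatever is attached to the instance):

import urllib.request

BASE = "http://169.254.169.254/latest/meta-data/"

# name of the IAM role attached to the instance
role = urllib.request.urlopen(BASE + "iam/security-credentials/").read().decode()

# temporary credentials: AccessKeyId, SecretAccessKey, Token, Expiration
creds = urllib.request.urlopen(BASE + "iam/security-credentials/" + role).read().decode()
print(role, creds)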
Storage
S3
Bucket names must be globally unique
Shows "Global" in the console region menu (but S3 is a regional service)
Minimum of 3 and maximum of 63 characters - no uppercase or underscores
Must start with a lowercase letter or number and can’t be formatted as an IP address (1.1.1.1)
Default of 100 buckets per account; can be raised up to a hard limit of 1,000 buckets via support request
Unlimited objects in buckets
Unlimited total capacity for a bucket
An object’s key is its name (FULL PATH including slashes and filename, but not bucket name)
An object’s value is its data (content)
An object's size is from 0 bytes to 5TB (more than 5GB must use multi-part upload)
To upload a file larger than 160 GB, use the AWS CLI, AWS SDK, or Amazon S3 REST API.
Metadata (list of key/value pairs, system or user metadata)
Tags (Unicode key/value pairs, max 10), useful for security / lifecycle
Version ID (if versioning is enabled)
Versioning
Bucket level setting
If you overwrite a key/file you increment its version
Best practice to version your buckets
Protect against unintended deletes
Easy roll back to previous version
Any file that is not versioned prior to enabling versioning will have a version NULL
Deleting a file only adds a delete marker
S3 Websites
URL can be
<bucket-name>.s3-website-<region>.amazonaws.com
<bucket-name>.s3-website.<region>.amazonaws.com
S3 CORS
If a website requests data from an S3 bucket in a different origin, CORS must be enabled on that bucket
Cross Origin Resource Sharing lets you limit which websites can request files from your S3 bucket (helps limit costs)
Allowed origins are returned to the browser in the Access-Control-Allow-Origin response header
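A minimal boto3 sketch of setting a CORS configuration (bucket and origin are hypothetical):

import boto3

boto3.client("s3").put_bucket_cors(
    Bucket="my-example-bucket",
    CORSConfiguration={"CORSRules": [{
        "AllowedOrigins": ["https://www.example.com"],  # only this site may fetch objects
        "AllowedMethods": ["GET"],
        "AllowedHeaders": ["*"],
        "MaxAgeSeconds": 3000,
    }]},
)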
S3 Consistency Model
Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD or GET request to the key name (to find if the object exists) before creating the object, Amazon S3 provides eventual consistency for read-after-write.
S3 Security
User based
IAM Policies - which API calls should be allowed for a specific user from IAM
Resource Based
Bucket Policies - bucket wide rules from the S3 console - allows cross account
Object ACLs - finer grain, not super popular
Bucket ACLs - less common
S3 Bucket Policies
Grant public access to the bucket
Force objects to be encrypted at upload
Grant access to another account (Cross account)
JSON based (4 components)
Resources: buckets and objects
Actions: Set of APIs to Allow or Deny
Effect: Allow or Deny
Principal: The account or user to apply the policy to
Networking: Supports VPC endpoints (for instances in VPC with no internet)
Logging and Auditing: S3 access logs can be stored in another bucket, API calls can be logged in CloudTrail
User Security: MFA can be required in versioned buckets to delete objects, Signed URLs = valid for a limited time (ex: premium video service for time)
S3 Encryption for Objects
Can also set default encryption for bucket
SSE-S3
Keys handled and managed by AWS S3
Object is encrypted server side, sent via HTTP/S
AES-256
Must set header: "x-amz-server-side-encryption":"AES256"
S3 Managed Data Key + Object > Encrypted
SSE-KMS
Keys handled and managed by KMS
Object is encrypted server side, sent via HTTP/S
KMS advantages: user control (rotation etc.) + audit trail
Must set header: "x-amz-server-side-encryption":"aws:kms"
KMS Customer Master Key (CMK) + Object > Encrypted
SSE-C
Server Side encryption using keys fully managed by customer outside AWS
S3 does not store the key
HTTPS must be used
Encryption key is provided (sent) in HTTP header, in every request
Client provided data key + Object > Encrypted, S3 throws away key
Client Side Encryption
Client library such as Amazon S3 Encryption Client
Clients must encrypt data themselves before sending to S3
Client must decrypt data themselves when retrieving from S3
Customer fully manages the keys and encryption cycle
Encryption in Transit
AWS S3 exposes both HTTP and HTTPS endpoints, HTTPS recommended
Default Encryption vs Bucket Policies
Old way was to use a bucket policy that refuses any HTTP command without the proper encryption headers
New way is to click "default encryption" option in S3
Bucket Policies are evaluated before default encryption
Either SSE-S3 (AES-256) or SSE-KMS
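A minimal boto3 sketch of both approaches (bucket name is hypothetical; a complete deny policy usually also catches a missing header with a Null condition):

import json
import boto3

s3 = boto3.client("s3")

# upload with the SSE header set (boto3 sends x-amz-server-side-encryption)
s3.put_object(Bucket="my-example-bucket", Key="report.csv",
              Body=b"data", ServerSideEncryption="AES256")

# "old way": bucket policy that denies uploads missing the SSE-S3 header
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyUnencryptedUploads",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::my-example-bucket/*",
        "Condition": {"StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}},
    }],
}
s3.put_bucket_policy(Bucket="my-example-bucket", Policy=json.dumps(policy))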
S3 MFA Delete
To use MFA-Delete must enable Versioning on the S3 bucket
You need MFA to
permanently delete an object version
suspend versioning on the bucket
You won't need it for
enabling versioning
listing deleted versions
Only bucket owner (root account) can enable/disable MFA-delete
Can only be enabled using the CLI
S3 Access Logs
Any request made to S3 from any account, authorized or denied, will be logged to another S3 bucket
Can analyze using data analysis tools (Hive, Athena, etc.)
Log format in docs
S3 Cross Region Replication
Must enable versioning (source and destination)
Must be in different regions (duh)
Can be different accounts
Copying is asynchronous
Must give proper IAM permissions to S3, needs Role
For:
Compliance, lower latency access, cross account replication
Can do based on whole bucket, prefix, tags
Can replicate encrypted if other account has access to KMS key
Can change storage class or ownership
S3 Pre-signed URLs
Can create a pre-signed URL via CLI or SDK
For downloads, use the CLI
For uploads, must use the SDK
Valid by default for 3600 seconds, change with --expires-in [TIME_BY_SECONDS]
Users who receive pre-signed URL inherit permissions of the generator for GET/PUT
aws configure set default.s3.signature_version s3v4
aws s3 presign s3://bucketname/file.jpg --expires-in 300 --region ca-central-1
Avoids direct access to the bucket from users
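The SDK equivalent, a minimal boto3 sketch (bucket and keys are hypothetical):

import boto3

s3 = boto3.client("s3")

# download link, valid 300 seconds (same as --expires-in on the CLI)
url = s3.generate_presigned_url("get_object",
                                Params={"Bucket": "my-example-bucket", "Key": "file.jpg"},
                                ExpiresIn=300)

# upload link: the recipient can PUT to this key without bucket access
upload_url = s3.generate_presigned_url("put_object",
                                       Params={"Bucket": "my-example-bucket", "Key": "upload.jpg"},
                                       ExpiresIn=300)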
S3 Storage Tiers
S3 Standard - General Purpose
99.999999999% Durability (10 mil objects 10k years, lose 1)
99.99% availability
Can sustain the concurrent loss of 2 AZs
S3 Reduced Redundancy Storage (RRS)
Deprecated
99.99% durability and availability
Can sustain loss of single AZ
Use for non-critical reproducible data
S3 Standard Infrequent Access (IA)
Suitable for data that is accessed less frequently but requires rapid retrieval when needed
Retrieval fee
99.999999999% Durability (10 mil objects 10k years, lose 1)
99.99% availability
Can sustain the concurrent loss of 2 AZs
For backups, DR, etc.
S3 One Zone Infrequent Access
Same as IA, but data is stored in a single AZ
Retrieval fee
99.999999999% Durability; data is lost when AZ is destroyed
99.95% availability
20% lower cost than IA
For secondary backup data, or recreatable
S3 Intelligent Tiering
Small monthly auto-tiering fee
Moves objects between Standard and IA tiers based on access patterns
99.999999999% Durability, 99.9% availability
Can sustain single AZ loss
S3 Glacier
Alternative to tape for long-term archival (10s of years)
99.999999999% Durability
Cost per storage per month ($0.004 / GB) + retrieval fee
Each item is called an "Archive", up to 40TB size
Archives are stored in "Vaults", similar to a bucket
Retrieval options:
Expedited (1-5 mins) - $0.03 / GB and $0.01 per request
Standard (3-5 hours) - $0.01 per GB and $0.05 per 1,000 requests
Bulk (5-12 hours) - $0.0025 per GB and $0.025 per 1000 requests

S3 Lifecycle Rules
Transition Actions: Defines when objects are transitioned to another storage class
Expiration Actions: Objects expire and are deleted
Can be used to delete incomplete multi-part uploads
Limit to prefix or tag
Can do current or previous versions
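A minimal boto3 sketch of a lifecycle rule (bucket, prefix, and day counts are hypothetical):

import boto3

boto3.client("s3").put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={"Rules": [{
        "ID": "archive-logs",
        "Filter": {"Prefix": "logs/"},             # limit to a prefix
        "Status": "Enabled",
        "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"},
                        {"Days": 90, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": 365},               # delete after a year
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
    }]},
)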
Snowball
Physically transport data in or out of AWS
TB or PB
Alternative to network fees
Secure, tamper resistant, 256-bit encryption with KMS
Tracking using SNS and text messages, E-Ink shipping label
For: large data migrations, DC decommission, disaster recovery
If it takes more than a week via network use Snowball instead
Has client for copying files
Snowball Edge
Adds computational capability
100TB capacity, either:
Storage Optimized - 24 vCPU
Compute Optimized - 52 vCPU & optional GPU
Supports a custom EC2 AMI so you can process while transferring
Supports custom Lambda functions
AWS Snowmobile
Transfer exabytes (1EB = 1000PB = 1000000TB)
Each has 100PB of capacity, can use multiple in parallel
Use if transferring more than 10PB
Storage Gateway
Expose S3 on-premises
File Gateway
S3 buckets via NFS and SMB (all S3 modes)
Bucket access using IAM roles for each File Gateway
Recently used data is cached
Can be mounted on many servers
Volume Gateway
Block storage using iSCSI backed by S3
^ Backed by EBS snapshots
Cached volumes: low latency access to most recent data
Stored volumes: entire dataset is on-premises, scheduled backups to S3
Tape Gateway
VTL Virtual Tape Library backed by S3 and Glacier
Back up data using existing tape based processes (and iSCSI interface)
Works with most backup software
EBS
EBS volumes are AZ locked
Can migrate via snapshot and recreate
EBS backups use IO and shouldn't run during peak times
Root EBS volumes get terminated with the instance by default (can disable)
If disk IO is high - increase the EBS volume size (for gp2)
Volumes are characterized by Size | Throughput | IOPS
GP2 (SSD): General purpose SSD (balance price/perf)
Boot volumes, virtual desktops, low-latency interactive apps, development and test
1GB-16TB
Small GP2 can burst IOPS to 3000 (anything under 3k can burst to 3k)
Max IOPS is 16000
3 IOPS per GB, meaning max IOPS is reached at 5,334 GB
IO1 (SSD): Highest-perf, low latency or high-throughput
Critical business apps that require sustained IOPS, or more than 16000
Mongo, Cassandra, MSSQL, MySQL, Oracle
4GB-16TB
IOPS is provisioned 100-64000 (64k for Nitro only) else 100-32000
Maximum ratio of provisioned IOPS to volume GB size = 50:1
ST1 (HDD): Low cost for frequently accessed, throughput-intensive workloads (big data)
Streaming workloads requiring consistent, fast throughput at low price
Big Data, DW, log processing, Kafka
Cannot be boot volume
500GB - 16TB
Max IOPS is 500
Max throughput of 500 MB/s, can burst
SC1 (HDD): Lowest cost for less frequently accessed workloads
Throughput oriented for large volumes of data infrequently accessed
Where lowest cost is important
Cannot be a boot volume
500GB - 16TB
Max IOPS is 250
Max throughput of 250 MB/s, can burst
Only GP2 and IO1 can be boot volumes
EC2 machine loses its root volume when terminated
Store non-ephemeral data on EBS volume, network drive (not physical) you can attach or detach while running. Persist data.
Locked to AZ
Can move via snapshot
Have a provisioned capacity (billed for all capacity)
Can dynamically increase capacity over time, start small
EBS Snapshots
Incremental - only changed blocks
EBS backups use IO, should not run them during peak times
Snapshots are stored in S3 (but you won't see them)
Don't have to detach volume but recommended
Max 100000 snapshots
Can copy across AZ or Region
Can make AMI from Snapshot
EBS volumes restored by snapshots need to be pre-warmed (using fio or dd to read entire volume)
Can be automated using Amazon Data Lifecycle Manager
EBS Migration
Volumes locked to AZ
To migrate: snapshot, then (optionally) copy the snapshot to a different region
Create a volume from the snapshot in the AZ of your choice
EBS Encryption
When you encrypt an EBS volume you get:
Data at rest is encrypted inside the volume
Data in flight between instance and the volume is encrypted
Snapshots are encrypted
As are volumes created from the snapshot
Encryption and decryption are transparent
Minimal impact on latency
EBS Encryption leverages keys from KMS (AES-256)
Copying an unencrypted snapshot allows encryption
Snapshots of encrypted volumes are encrypted
Encrypting an unencrypted EBS volume
Create an EBS snapshot of the volume
Encrypt the snapshot using copy
Create a new volume from the snapshot
Attach encrypted volume to original instance
EBS RAID
EBS is already redundant (replicated within an AZ)
But use RAID to increase IOPS past the per-volume max
Must do in OS not AWS
Or mirror EBS volumes
RAID 0 (Perf, get combined disk space, IO, throughput, not fault tolerant)
RAID 1 (mirror, send data to two volumes at the same time, 2x network traffic)
RAID 5, 6 (Not recommended for EBS)
EFS
Managed NFS
EFS works with EC2 instances multi-AZ
Highly available, scalable, expensive (3xGP2), pay per use
For: content management, web serving, data sharing, WordPress
NFS v4.1
Use security groups to control access
Compatible with Linux based AMI (not windows)
Performance mode: General purpose (default), Max IO (used when 1000's of EC2 are using the EFS)
Has bursting or provisioned modes for IO
"EFS file sync" to sync from on-prem fs to EFS
Backup EFS-to-EFS (incremental, can choose frequency)
Encryption at rest using KMS
EFS now has lifecycle mgmt. to tier to EFS IA
Instance store
Some instances do not come with root EBS
Ephemeral
Physically attached to your instance
Pros
Better I/O perf
Good for buffer / cache / scratch data / temporary content
Data survives reboot
On stop or termination instance store is lost
Can't resize the instance store
Backups must be operated by the user
Networking
Elastic IP: a public static IPv4 attachable to one instance at a time
Horizontal scalability = elasticity
Vertical scalability (RDS, Elasticache)
HA means running your application in 2 DC/AZ
Load Balancing
Health Checks
Done on port and route
Any LB has a static hostname, use it and not IP
LB can scale, not instant, contact AWS for a warm-up
4xx errors are client induced errors
5xx errors are application induced errors
LB 503 errors means at capacity or no registered target
If LB can't connect to app, check SG!
Seamlessly handle failures of downstream instances
Health checks (expects 200 OK; otherwise the target is unhealthy)
CLB + ALB support SSL Certificates and provide SSL termination for websites (NLB can terminate, Jan 2019)
Enforce stickiness
HA across AZs
Separate public traffic from private traffic
Exposes single point of access (DNS) to your app
Network Load Balancers expose a public static IP, whereas an Application or Classic Load Balancer exposes a static DNS (URL)
ELB - Managed load balancer
Classic LB (v1, 2009)
Deprecated
Application Load Balancer (v2, 2016)
Layer 7 (HTTP/S, WebSockets)
LB to multiple applications on same machine
LB to target group based on route in URL
LB to target group based on hostname in URL
LB to target group based on client IP
Supports dynamic host port mapping with ECS (redirect to same machine)
Before would have had to have one CLB per app
Stickiness at target group level (same instance)
Cookie generated by ALB
App server does not see IP of client directly, inserted in X-Forwarded-For
Also port via X-Forwarded-Port, and proto via X-Forwarded-Proto
The ALB terminates the client connection to do this
Great fit for ECS/Containers
Network Load Balancer (v2, 2017)
TCP (Layer 4)
High perf, millions of requests per sec
Supports a static / Elastic IP per AZ; a public-facing NLB must use Elastic IPs (helps clients whitelist); a private-facing NLB gets a random private IP from the free ones at the time
Has cross zone balancing
Has SSL termination (Jan 2019)
Less latency ~100ms (vs 400ms for ALB)
Only for extreme perf, not default
NLB see client IP
Can have internal or external ELB
LB Stickiness, enabled in Target Groups
Stickiness works for CLB and ALB
Works with cookies, has an expiration date
Make sure user doesn't lose session data
Can bring imbalance over backend instances
Exam can ask if one instance is 80% and one 20% why that would be
Stickiness duration can be 1 sec to 7 days
LB SSL Certificates
LB uses an X.509 certificate (SSL/TLS server cert) loaded on the LB
Can manage certificates using ACM (AWS Certificate Manager)
Can create or upload your own certs alternatively
HTTPS listener
Must specify default certificate
Can add an optional list of certs to support multiple domains
SNI (Server Name Indication) is a feature allowing you to expose multiple SSL certs if the client supports it.
Auto-Scaling Groups (ASG)
A launch configuration
AMI + Instance Type
EC2 User Data
EBS Volumes
Security Groups
SSH Key Pair
Min/Max/Initial Capacity size
Network + Subnet information
Load Balancer Information
Scaling Policies (triggers)
Possible to scale in/out based on CloudWatch alarm
Alarm monitors a metric
Metrics are computed for the overall ASG instances
ex: Target average CPU
ex: Average network in or out
Can scale on custom metric (ex: connected users)
Send custom metric from app on EC2 to CloudWatch (PutMetricData API), as sketched below
Create alarm to react based on low / high values
Use the alarm as scaling policy for ASG
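A minimal boto3 sketch of publishing a custom metric (namespace and metric name are hypothetical):

import boto3

boto3.client("cloudwatch").put_metric_data(
    Namespace="MyApp",
    MetricData=[{"MetricName": "ConnectedUsers",
                 "Value": 123,
                 "Unit": "Count"}],
)
# a CloudWatch alarm on MyApp/ConnectedUsers can then drive the ASG scaling policy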
IAM roles attached to an ASG will get assigned to EC2 instances
ASG are free, pay only for instances
ASG can terminate instances marked unhealthy by a LB and replace them
Available Metrics:
ASGAverageCPUUtilization: Average CPU utilization of the Auto Scaling group.
ASGAverageNetworkIn: Average number of bytes received on all network interfaces by the Auto Scaling group.
ASGAverageNetworkOut: Average number of bytes sent out on all network interfaces by the Auto Scaling group.
ALBRequestCountPerTarget: Number of requests completed per target in an Application Load Balancer target group.
Default Termination Policy for ASG: it balances across AZs first, then terminates the instance with the oldest launch configuration.
Scaling Cooldown, makes sure doesn't get out of control, no other scaling takes effect until cooldown is over. Can override default cooldown.
Can have default cooldown, but also policy specific to simple scaling policy. Good for scale-in that terminates instances, doesn't take much time.
Reduce costs by lowering cooldown from ex: 300 to 180.
If your app is scaling multiple times per hour, modify ASG cool-down timer and the CloudWatch Alarm Period that triggers the scale-in
Security Groups
Inbound traffic is blocked by default, outbound is authorised
Can be attached to multiple instances, and instances can have multiple security groups
Locked to a region/VPC combination
Best practice use one just for SSH
If your application times out, it's likely the SG
Can reference security group for access
Databases
RDS
Postgres
Oracle
MySQL
MariaDB
MS SQL
Aurora (proprietary)
DB Identifier (name) must be unique across region
Your responsibility
Check IP / Port / SG inbound rules
In-database user creation and permissions
Creating database with or without public access
Ensure parameter groups or DB is configured to only allow SSL
AWS Responsibility
No SSH access
No manual DB patching
No Manual OS patching
No way to audit underlying instance
For SAs
Read replicas can only do SELECT
RDS supports Transparent Data Encryption for Oracle or SQL Server
Is on top of KMS, may affect performance
IAM Authentication vs un/pw for MySQL and PostgreSQL
Lifespan of an IAM authentication token is 15 mins (short-lived), better security
Tokens are generated by IAM credentials
SSL must be used (or connection refused)
Easy to use EC2 Instance Roles to connect to the RDS DB (no need to store DB credentials on the instance)
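A minimal boto3 sketch of generating an IAM auth token (endpoint and user are hypothetical):

import boto3

rds = boto3.client("rds", region_name="us-east-1")

# 15-minute token used as the DB password over SSL
token = rds.generate_db_auth_token(
    DBHostname="mydb.xxxxxxxx.us-east-1.rds.amazonaws.com",
    Port=3306,
    DBUsername="iam_db_user",
)
# pass `token` as the password to your MySQL/PostgreSQL client with SSL enabled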
Managed Service =
OS patching
Point in Time Restore backups
Monitoring dashboards
Read replicas for read perf
Multi AZ set for DR
Maintenance windows for upgrades
Scaling (vert and horiz)
BUT no SSH
RDS Read Replicas for scalability
Up to 5 Read Replicas
Within AZ, Cross AZ, or Cross Region
Replication is ASYNC (eventually consistent)
Replicas can be promoted to their own DB
Applications must update their connection string to leverage read replicas
One string for master, 1 for each replica
Can combo Read Replicas and DR Multi AZ
RDS Multi AZ (Disaster Recovery)
SYNC replication
One DNS name for auto failover to standby
Increases availability (duh)
For AZ loss
No manual intervention
Not for scaling
RDS Backups
Automatically enabled
Automated Backups
Daily full snapshot of DB
Captures transaction logs in real time
Ability to restore to any point in time
7 days retention (can increase to 35) (can lower as well)
DB Snapshots (can be manually triggered)
Retention for as long as you want (keep specific state, or long term)
RDS Encryption
Encryption at rest with KMS (AES-256 encryption)
Only at creation
or: snapshot, copy as encrypted, create DB from snapshot (same as EBS)
SSL certificates to encrypt data in flight
To enforce SSL:
PostgreSQL: rds.force_ssl=1 in the AWS RDS console (parameter groups)
MySQL: Within the DB: GRANT USAGE ON *.* TO 'mysqluser'@'%' REQUIRE SSL;
To connect using SSL:
Provide SSL Trust certificate (can be downloaded from AWS)
Provide SSL options when connecting to DB
RDS Security
RDS DB are usually deployed in private subnet
Security works by leveraging security groups for who can communicate with it
IAM policies help control who can manage RDS
Traditional username and password to log into DB itself
IAM authentication now works with Aurora/MySQL
RDS vs. Aurora
Proprietary
Postgres and MySQL drivers supported
Cloud optimized - 5x perf for MySQL, 3x perf for Postgres
Automatically grows in increments of 10GB up to 64TB
Aurora can have 15 replicas, MySQL only 5, and replication is faster (sub 10ms lag)
Failover in Aurora is instantaneous, HA native.
Aurora costs 20% more than RDS, but is more efficient.
Aurora
Automatic failover
Backup and recovery
Isolation and security
Industry compliance
Push-button scaling
Automated patching with zero downtime
Advanced monitoring
Routine maintenance
Backtrack: restore data at any point in time without backups
HA and Read Scaling
6 Copies of data across 3 AZ
4 copies out of 6 needed for writes
3 copies out of 6 needed for reads
Self healing with peer-to-peer replication (for corrupted data)
Storage is striped across 100s of volumes
One Aurora instance takes writes, Master
Automated failover for master in less than 30 secs
Master + up to 15 Read Replicas serve reads (any replica can become master)
Support for Cross Region Replication
Shared logical storage volume across AZs for Replication + Self-Healing + Auto Expanding
Master is only writer
Writer Endpoint (DNS name) always points to current master, for failover
Read Replicas can do auto-scaling
Reader Endpoint Connection load balancing for reads, across all scaled instances. Happens at connection level not statement level.

Aurora Security
Encryption at rest using KMS
Automated backups, snapshots and replicas are also encrypted
Encryption in flight using SSL (same process as MySQL or Postgres)
Authentication using IAM
You are responsible for protecting via SG
No SSH
Aurora Serverless
No need to choose an instance size
Only supports MySQL 5.6 & Postgres in beta
Helpful when you can't predict workload
DB cluster starts, shuts down, and scales automatically based on CPU / connections
Can migrate from Aurora Cluster to Serverless and vice versa
Serverless usage is measured in ACU (Aurora Capacity Units)
Billed in 5 minute increments of ACU
Some features aren't supported in serverless, so check docs
Aurora for SAs
Can use IAM for Aurora
Aurora Global Databases span multiple regions and enable DR
One primary region
One DR Region
The DR region can be used for lower latency reads
< 1 sec replication lag on average
If not using Global Databases you can create cross region Read Replicas
FAQ recommends Global Databases instead
Elasticache
Managed in-memory DB, high perf, low latency.
Redis or Memcached
Reduce load on DB
Make app stateless (keep state in cache)
Write scaling using Sharding
Read scaling using Read Replicas
Multi AZ with Failover
AWS takes care of all normal stuff
App queries ElastiCache, either gets cache hit or cache miss, in case of miss it gets cached for hit next time
Cache must come with invalidation strategy for only most current data (app based)
User session store (keep it stateless)
Application writes session data into ElastiCache
User hits a different application instance
Instance retrieves the data from cache to keep session going
Redis
In-memory key-value store
Super low latency (sub ms)
Cache survives reboot by default (persistence)
Multi AZ with automatic failover for DR (if you want to keep cache data)
Support for Read Replicas and Cluster
Good for: User sessions, Leaderboard (has a sort), Distributed states, Relieve pressure on DB, Pub / Sub capability for messaging
Memcached
In-memory object store
Cache does not survive reboots
Good for: Quick object retrieval, cache often accessed objects
ElastiCache for SAs
Security
Redis supports RedisAUTH (un/pw)
SSL in-flight must be enabled and used
Memcached supports SASL
None support IAM
IAM policies are used only for AWS API level security
Patterns for ElastiCache
Lazy Loading: all read data is cached, can become stale
Write Through: Adds or updates data in the cache when written to DB (no stale data)
Session Store: stores temp session data (using TTL features maybe)
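A minimal sketch of Lazy Loading and Write Through with the redis-py client (endpoint and DB helpers are hypothetical):

import json
import redis  # assumes the redis-py client

r = redis.Redis(host="my-cluster.xxxxxx.cache.amazonaws.com", port=6379)

def db_query(user_id):
    return {"id": user_id, "name": "example"}     # stand-in for the real DB query

def get_user(user_id):                            # Lazy Loading (cache-aside)
    cached = r.get(f"user:{user_id}")
    if cached is not None:                        # cache hit
        return json.loads(cached)
    user = db_query(user_id)                      # cache miss: read from the DB
    r.setex(f"user:{user_id}", 3600, json.dumps(user))  # cache with a 1h TTL
    return user

def update_user(user):                            # Write Through: update DB then cache
    # db_update(user) would go here
    r.setex(f"user:{user['id']}", 3600, json.dumps(user))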
DynamoDB
Fully managed, Highly Available with replication across 3 AZs
Scales to massive workloads, distributed database
Millions of request per second, trillions of rows, 100s of TB of storage
Fast and consistent in performance (low retrieval latency)
Integrated with IAM for security, authorization, and administration
Enables event driven programming with DynamoDB Streams
Low cost and auto scaling
Basics
DynamoDB is made of tables
Each table has a primary key (must be decided at creation)
Each table can have an infinite number of items (=rows)
Each item has attributes (can be added over time, can be null, = columns)
Maximum item size = 400KB
Data types supported are:
Scalar types: String, Number, Binary, Boolean, Null
Document types: List, Map
Set Types: String Set, Number Set, Binary Set
Tables must have provisioned read and write capacity units
Read Capacity Units (RCU): throughput for reads ($0.00013 per RCU)
1 RCU = 1 strongly consistent read of 4KB per second
1 RCU = 2 eventually consistent reads of 4KB per second
Write Capacity Units (WCU): throughput for writes ($0.00065 per WCU)
1 WCU = 1 write of 1KB per second
Option to set up auto-scaling of throughput to meet demand
Throughput can be exceeded temporarily using "burst credits"
If burst credits are empty you'll get a "ProvisionedThroughputException"
Then do exponential back-off retry
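A minimal boto3 sketch of the back-off pattern (table and item are hypothetical; note the SDK already retries internally):

import time
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("users")

for attempt in range(5):
    try:
        table.put_item(Item={"user_id": "42", "name": "Alice"})
        break
    except ClientError as e:
        if e.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
            raise
        time.sleep(0.05 * (2 ** attempt))   # 50ms, 100ms, 200ms, ...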
DynamoDB - DAX
DynamoDB Accelerator
Seamless cache for DDB, no app re-write
Writes go through DAX to DynamoDB
Microsecond latency for cached reads and queries
Solves the Hot Key Problem (too many reads)
5 minute default TTL for cache
Up to 10 nodes in the cluster
Multi AZ (3 nodes minimum for production recommended)
Secure (Encryption at rest with KMS, VPC integration, IAM, CloudTrail, etc)
DynamoDB Streams
Changes in DynamoDB (Create, Update, Delete) can end up in a DynamoDB Stream
This stream can then be read by Lambda, then we can:
React to changes in real time (welcome email to new users)
Analytics
Insert into ElasticSearch
etc
Could implement cross region replication using Streams
Stream has 24 hours of data retention

New Features
Transactions
All or nothing type operations
Coordinated Insert, Update, Delete across multiple tables (all work or nothing)
Include up to 10 unique items, or up to 4MB data
On Demand
No capacity planning needed (WCU/RCU) - scales automatically
2.5x more expensive than provisioned
Helpful when spikes are un-predictable or the app is very low throughput
Security and Other
Security
VPC Endpoints, access without internet
Fully controlled by IAM
Encryption at rest with KMS, in transit with SSL/TLS
Backup and Restore available
Point in time like RDS
No performance impact
Global Tables (require Streams enabled)
Multi region, fully replicated, high performance
DMS can be used to migrate to DDB from Mongo, Oracle, S3, etc
Can launch local version of DDB for dev purposes
Athena
Serverless service to perform analytics directly against S3 files
Uses SQL to query
Has a JDBC / ODBC driver
Charged per query and amount of data scanned
Supports CSV, JSON, ORC, Avro, and Parquet
For: BI, analytics, reporting, analyze VPC Flow Logs, ELB logs, CloudTrail trails, etc.
Route 53
Most common records
A: URL to IPv4
AAAA: URL to IPv6
CNAME: URL to URL (non root domain)
Alias: URL to AWS resource (root and non-root), free of charge, supports native health checks
Can use
Public domain names
Private domain names that can only be resolved by your VPC instances
$0.50 per hosted zone
Has
Load Balancing (through DNS, client LB)
Health checks (limited)
Routing policy: simple, failover, geolocation, latency, weighted, multi value
Simple Routing Policy
Maps a domain to one URL
Use when directing to a single resource
Cannot attach health checks
If multiple values are returned, a random one is chosen by client
Weighted Routing Policy
Control % of requests that go to specific endpoint (ex: 70, 20, 10. Sum does not have to be 100)
Create multiple record sets with weighted option
Helpful to test 1% of traffic on new app
Split traffic between regions
Can be associated with health checks
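A minimal boto3 sketch of upserting a weighted record (zone ID, name, and IP are hypothetical):

import boto3

boto3.client("route53").change_resource_record_sets(
    HostedZoneId="Z1EXAMPLE",
    ChangeBatch={"Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "app.example.com",
            "Type": "A",
            "SetIdentifier": "new-version",      # distinguishes the weighted records
            "Weight": 1,                         # e.g. 1 vs 99 to test 1% of traffic
            "TTL": 60,
            "ResourceRecords": [{"Value": "203.0.113.10"}],
        },
    }]},
)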
Latency Routing Policy
Redirect to the server that has the least latency, closest to the request
Evaluated in terms of user to designated AWS region
Must specify region in latency record
Germany could be directed to US if lower latency
Route 53 Geolocation Policy
Different from latency based
Based on user location
Traffic from England should go to X
Must have a default policy if no other match exists
Multi Value Routing Policy
Use when routing traffic to multiple instances
Use when you want to associate Route 53 health checks with records; unhealthy records are removed from the returned values
Up to 8 healthy records are returned for each MultiValue query (even if you have 50)
MultiValue is not a substitute for using ELB
Route 53 Health Checks
Route 53 will not send traffic to an endpoint whose health check fails
Deemed unhealthy if checks fail 3 times
Deemed healthy if checks pass 3 times
Default interval 30 secs (can set fast health check at 10s, higher cost)
About 15 health checkers will launch to check endpoint health
one request every 2 secs on average
Can have HTTP, TCP, and HTTPS check (no SSL certificate verification)
Possibility of integrating health checks with CloudWatch
Health checks can be linked to Route 53 DNS record set
Route 53 as a Registrar
Offer both Registrar and DNS service
Developing on AWS
CLI
Never put personal credentials on an EC2 machine; if it's compromised, the whole account is compromised
Use Roles
Roles
Attached to EC2 instance
Come with policy authoring what instance is authorized for
Best practice
Instance can only have one role at a time
Policies
Permits and denies specific API calls (e.g. GetObject, or wildcards like Get*)
Inline Policies are added on top of managed policies and apply only to that role
AWS SDK
AWS CLI is a wrapper around Python SDK (boto3)
If you don't specify a region defaults to us-east-1
Recommended to use default credential provider chain
Works with:
AWS credentials in .aws (local or on-prem)
Instance Profile Credentials using IAM Roles for EC2 machines etc.
Environment variables (AWS_ACCESS_KEY_ID, etc.), not often used
NEVER STORE CREDENTIALS IN YOUR CODE, abstract
Always use IAM Roles when working within AWS Services
Exponential Backoff
Any API call that fails because of too many calls needs to be retried with Exponential Backoff
These apply to rate limited APIs
The retry mechanism is included in SDK API calls
1 ms, 2 ms, 4ms, 8ms
CloudFormation
We didn't specify a name in the JSON file for this bucket, so AWS names it with the [STACKNAME]-[LOGICAL_RESOURCE_NAME]-[RANDOM_STRING] format.
What is the logical resource name? Based on the resource's logical ID in the CFN template
Stacks have logical resources in them that create physical resources
CloudFront
Cached at edge locations
Popular with S3 but works with EC2 and LB as well
Helps with network attacks
Provides SSL (HTTPS) via ACM
Can use SSL (HTTPS) to talk internally to applications
Supports RTMP
Origin Access Identity
Limit S3 to be only accessed via this identity
CloudFront Signed URL / Signed Cookies
To distribute paid shared content which lives in S3
If S3 can only be accessed via CloudFront we can't use S3 pre-signed URLs
Can attach a policy with:
URL expiration
IP ranges for access
Trusted signers (which AWS Account can create signed URLs)
CloudFront signed URLs can only be created using the AWS SDK
Validity length?
Share content, movies etc, short = few minutes
Private content (to user) longer = years

CloudFront vs S3 Cross Region Replication
CloudFront
Global Edge network
Files are cached for a TTL (maybe a day)
Great for static content that must be available everywhere
S3 Cross Region Replication
Must be set up for each region
Files are updated near real-time
Read only
Great for dynamic content that needs low-latency in a few regions
CloudFront Geo Restriction
Restrict who can access your distribution
Whitelist by country
Blacklist by country
Country is determined by using a 3rd party Geo-IP database
Copyright law etc.
Messaging
General
Two patterns of application communication
Synchronous (app to app)
Problematic if there are sudden spikes of traffic
Asynchronous / Event Based (Queue)
Better to decouple (SQS: Queue, SNS: Pub/Sub, Kinesis: real-time streaming)
SQS (Super important)
SQS Standard Queue
Publisher -> Queue -> Consumer
Fully managed
Scales from 1 message per second to 10000s per second
Default retention: 4 days, maximum 14 days
No limit to how many messages in queue
Low latency (<10ms on publish and receive)
Horizontal scaling in terms of number of consumers
Can have duplicate messages (at-least-once delivery; duplicates happen occasionally)
Can have out of order messages (best effort ordering)
Limitation of 256KB per message
SQS Delay Queue
Delay a message up to 15 minutes (consumers don't see it immediately)
Default is 0 seconds (available right away)
Can set a default at queue level
Can override the default using the DelaySeconds parameter, queue holds it
Producing Messages
Define Body (String up to 256KB)
Metadata, message attributes (optional) of Key Value pair, with Type
Provide Delay Delivery
Get Back
Message identifier
MD5 hash of the body
Consuming Messages
Poll SQS for messages (receive up to 10 at a time)
Process the message within the Visibility Timeout
Delete the message from the queue using the message ID and receipt handle
Visibility Timeout
When a consumer polls a message from a queue the message is then "invisible" to other consumers for the defined Visibility Timeout period
Set between 0 seconds and 12 hours (default 30 secs)
If too high (15 mins) and consumer fails to process, you have to wait a long time before retry
If too low (30 secs) and consumer needs more time to process another consumer will receive the message and it will be processed more than once
ChangeMessageVisibility API to change the visibility while processing a message, consumer can alert SQS it needs more time
DeleteMessage API to tell SQS the message was successfully processed
Dead Letter Queue
If a consumer fails to process a message within the Visibility Timeout it goes back to the queue
We can set a threshold of how many times a message can go back, it's called a redrive policy
After that threshold is exceeded the message goes into the Dead Letter Queue (DLQ)
We have to create a DLQ first, then designate it as a DLQ
We must make sure to process messages in the DLQ before they expire
Long Polling (Receive Message Wait Time)
When a consumer requests messages from the queue it can optionally "wait" for messages to arrive if there are none
Long Polling decreases the number of API calls made to SQS while increasing the efficiency and reducing the latency of the app.
The wait time can be between 1 - 20 seconds, 20 preferable
Long Polling is preferred to Short Polling
Long Polling can be enabled at the queue level, or at the API level when making the poll via WaitTimeSeconds
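A minimal boto3 consumer sketch combining long polling, visibility timeout, and deletion (queue name is hypothetical):

import boto3

sqs = boto3.client("sqs")
queue_url = sqs.get_queue_url(QueueName="orders-queue")["QueueUrl"]

while True:
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,    # up to 10 messages per poll
        WaitTimeSeconds=20,        # long polling
        VisibilityTimeout=60,      # hidden from other consumers while we process
    )
    for msg in resp.get("Messages", []):
        print(msg["Body"])         # stand-in for real processing
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])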
FIFO Queue
Name of the queue must end in .fifo
Lower throughput (up to 3000 per sec with batching, 300/s without)
Messages are processed in order by the consumer
Messages are sent exactly once
No per message delay (only per queue delay)
Ability to do content-based de-duplication
5-minute de-duplication interval using a "Deduplication ID"
Message Groups:
Possibility to group messages for FIFO ordering using "Message GroupID"
Only one worker can be assigned per message group, so messages are processed in order
Message group is just an extra tag on the message
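A minimal boto3 sketch of sending to a FIFO queue (queue URL and IDs are hypothetical):

import boto3

boto3.client("sqs").send_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo",
    MessageBody="order 1001 created",
    MessageGroupId="customer-42",           # messages in a group are delivered in order
    MessageDeduplicationId="order-1001",    # optional if content-based dedup is enabled
)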
SNS
Event producer only sends one message to the SNS topic
As many event receivers (subscriptions) as you want can listen to the SNS topic notifications
Each subscriber will get all the messages (new feature to filter messages)
Up to 10,000,000 subscriptions per topic
100,000 topic limit
Subscribers can be:
SQS
HTTP/S (with delivery retries)
Lambda
Emails
SMS messages
Mobile notifications
SNS Integrations
Some services can send data directly to SNS for notifications
CloudWatch for alarms
Auto Scaling Groups notifications
Amazon S3 on bucket events
CloudFormation upon state changes
etc
How to publish
Messages must be processed right away; they are not persisted in the SNS topic
Topic Publish (Within your AWS server using the SDK or CLI)
Create a topic
Create a subscription (or many)
Publish to the topic
Direct Publish (for mobile apps SDK) (Not on exam)
Create a platform application
Create a platform endpoint
Publish to the platform endpoint
Works with Google GCM, Apple APNS, Amazon ADM
SNS + SQS - Fan Out
Push once in SNS, receive in many SQS
Fully decoupled
No data loss
Ability to add receivers of data later, flexible
SQS allows for delayed processing and retries of work (implying SNS does not)
Can have many workers on one queue and one worker on the other, or whatever
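A minimal boto3 sketch of the fan-out setup (topic name and queue ARN are hypothetical; the SQS access policy allowing SNS to send to the queue is omitted):

import boto3

sns = boto3.client("sns")
topic_arn = sns.create_topic(Name="orders")["TopicArn"]

# each subscribed queue gets a copy of every published message
sns.subscribe(TopicArn=topic_arn, Protocol="sqs",
              Endpoint="arn:aws:sqs:us-east-1:123456789012:orders-analytics")

sns.publish(TopicArn=topic_arn, Message="order 1001 created")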

SNS Protocols
HTTP/S
Email
Email-JSON
Amazon SQS
AWS Lambda
Kinesis
Managed alternative to Kafka
Data is automatically replicated to 3 AZ
Great for application logs, metrics, IoT, clickstreams
Great for "real-time" big data
Great for real-time streaming processing frameworks (Spark, NiFi, etc)
Kinesis Streams (just plain Kinesis): low latency streaming ingest at scale
Kinesis Analytics: perform real-time analytics (filters, computations, alerting, etc) on streams using SQL
Kinesis Firehose: load streams into S3, Redshift, ElasticSearch, etc

Kinesis Streams (important)
Streams are divided in ordered Shards / Partitions
Data retention is 1 day by default, up to 7 days (24-168 hours)
Ability to reprocess / replay data (unlike SQS)
Multiple applications can consume the same stream (like SNS)
Real-time processing with scalable throughput (add more shards)
Once data is inserted into Kinesis it can't be deleted (immutability)
Think of a shard as a little queue
Kinesis is a highway, want to get the data to destination ASAP

Shards
One stream is made up of many different shards
Write: 1MB/s or 1,000 messages/s on the write side PER SHARD
Read: 2MB/s on the read side PER SHARD
Billing is per shard provisioned, can have as many as you want
Batching available for message push or for message calls
The number of shards can evolve over time (reshard / merge, essentially autoscaling)
Records are ordered per shard (standard SQS is unordered, FIFO is one ordered queue, Kinesis is in-between)
Kinesis API - Put records
On producer side
PutRecord API + partition key (any string) that gets hashed to determine shard id
The key is a way to route data to a specific shard
The same key goes to the same partition (data only goes to one shard at a time)
Messages sent get a sequence number
Choose a partition key that is highly distributed (helps prevent a "hot partition", an overused shard)
Good: user_id if there are many users
Bad: country_id if most users are from the same country
Use batching and PutRecords to reduce costs and increase throughput
ProvisionedThroughputExceeded if we go over the limits, then use Retries or ExponentialBackoff
Can use CLI, SDK, or producer libraries from various frameworks
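A minimal boto3 sketch of PutRecord (stream name and payload are hypothetical):

import json
import boto3

boto3.client("kinesis").put_record(
    StreamName="clickstream",
    Data=json.dumps({"user_id": "42", "action": "click"}).encode(),
    PartitionKey="42",    # highly distributed key (e.g. user_id) avoids hot shards
)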
Kinesis API - Exceptions
ProvisionedThroughputExceeded exceptions
Happens when sending too much data
Make sure you don't have a hot shard
Solution
Retries with backoff
Increase shards (scaling)
Ensure your partition key is a good one
Kinesis API - Consumers
Can use a normal consumer (CLI, SDK, etc)
Can use Kinesis Client Library (in java, Node, Python, Ruby, .Net)
KCL uses DynamoDB to checkpoint offsets
KCL uses DynamoDB to track other workers and share the work amongst shards (to improve efficiency)
Kinesis Security
Control access and authorization via IAM policies
In-Flight using HTTPS endpoints
At rest with KMS
Can encrypt/decrypt client side (difficult)
VPC endpoints available for Kinesis to access within VPC (no internet access)
Kinesis Data Analytics
Perform real-time analytics on Streams using SQL
Autoscaling
Managed
Continuous (real-time, no delay)
Pay for actual consumption rate
Can create new streams out of the real-time queries
Kinesis Firehose
Fully managed, no administration
Near real-time (perhaps 60 secs)
Load data into Redshift, S3, ElasticSearch, Splunk (ETL)
Autoscaling
Support for many data formats (but pay for conversion)
Pay for data going through, consumption model
SQS vs SNS vs Kinesis
Only one consumer per shard for Kinesis

Amazon MQ
SQS and SNS are cloud-native, using proprietary protocols from AWS
Traditional on-premises apps may use open protocols like: MQTT, AMQP, STOMP, Openwire, WSS
When migrating to cloud instead of re-engineering we can use Amazon MQ
Amazon MQ = managed Apache ActiveMQ
Amazon MQ doesn't scale as much
Runs on a dedicated machine, can run in HA multi-AZ
Has both a Queue feature (SQS) and topic feature (SNS)
Serverless
Just deploy functions (FaaS)
Lambda & Step Functions
DynamoDB
Cognito
API Gateway
S3
SNS & SQS
Kinesis
Aurora Serverless
Lambda
Virtual functions
Limited by time - short executions, when done, done
Run on-demand (run in ms)
Scaling is automated
Easy pricing
Pay per request and compute time
Free tier has 1,000,000 requests and 400,000 GB-seconds of compute time
Integrated with whole AWS Stack
Integrated with many programming languages
Easy monitoring through AWS CloudWatch
Easy to get more resources for your functions (up to 3GB of ram)
Increasing RAM also improves CPU and network
Node.js (JavaScript), Python, Java (Java 8 compatible), C# (.NET Core), Golang, C# / PowerShell
Main integrations
API GW
Kinesis
DynamoDB
S3
IoT
CloudWatch Events and Logs
SNS
Cognito
SQS
Pricing
Pay per call
First 1,000,000 are free
$0.20 per 1 million thereafter
Pay per duration (100ms increments)
400,000 GB-seconds of compute time free per month
== 400,000 seconds if function is 1GB RAM
== 3,200,000 seconds if function is 128MB RAM
After that $1.00 for 600,000 GB-s
Lambda Configuration
Timeout: default of 3 secs, max of 900s (15min)
Environment variables
Allocated memory (128M to 3G)
Ability to deploy within a VPC and assign security groups
IAM execution role must be attached to the Lambda function
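A minimal Python handler sketch (TABLE_NAME is a hypothetical environment variable set in the function config):

import json
import os

def lambda_handler(event, context):
    table = os.environ.get("TABLE_NAME", "not-set")
    return {
        "statusCode": 200,
        "body": json.dumps({"table": table, "keys": list(event.keys())}),
    }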
Limits (exam)
Execution
Memory allocation: 128MB - 3008 MB (in 64MB increments)
Maximum execution time: 300s (5 minutes); now 15 minutes, but the exam assumes 5
Disk capacity in the "function container" (in /tmp): 512MB
Concurrency limit: 1,000 (can be raised via a service ticket)
Deployment:
Function deployment size (compressed .zip): 50MB
Uncompressed deployment (code+dependencies): 250MB
Can use /tmp dir to load other files at startup (for more than 250MB)
Size of environment variables: 4KB (therefore can't pass file)
Lambda @ Edge
Have a CloudFront CDN
@Edge allows you to run Lambda globally alongside it
Or do request filtering before reaching application
Global as opposed to a region
More responsive apps
Customize CDN content
Pay per use
Use Lambda to change CloudFront requests and responses
After CloudFront receives a request from a viewer (viewer request)
Before CloudFront forwards the request to the origin (origin request)
After CloudFront receives the response from the origin (origin response)
Before CloudFront forwards the response to the viewer (viewer response)

You can also generate responses to viewers without ever sending the request to the origin

Use Cases
Website Security and Privacy
Dynamic Web Application at the Edge
SEO
Intelligently route across Origins and Data Centers
Bot mitigation at Edge
Real-time image transformation
A/B Testing
User authentication and authorization
User Prioritization
User Tracking and Analytics
API GW
AWS Lambda + API Gateway: No infra to manage
Handle API versioning (v1, v2, etc)
Handle different environments (dev, test, prod)
Handle security (Authentication and Authorization)
Create API keys, handle request throttling
Swagger / Open API import to quickly define APIs
Transform and validate requests and responses
Generate SDK and API specifications
Cache API responses
Stage variables allow you to modularize your stages, different for dev or prod for example
Integrations
Outside of VPC
Endpoints on EC2
Load Balancers
Any AWS service
External and publicly accessible HTTP endpoints
Inside of VPC
AWS Lambda in your VPC
EC2 endpoints in your VPC
Security (exam)
IAM Permissions
Create an IAM policy authorizing the API and attach it to the application User/Role
API GW verifies IAM permissions passed by calling the application
Good to provide access within your own infra, but not for outside
Leverages Sig v4 capability where IAM credentials are in headers
Lambda/Custom Authorizer
Uses Lambda to validate the token passed in the header
Option to cache the results of authentication
Helps to use OAuth / SAML / 3rd party type of authentication
Lambda must return an IAM policy for the user

Cognito User Pools
Cognito fully manages user lifecycle
API GW verifies identity automatically from AWS Cognito
No custom implementation required
Cognito only helps with authentication, not authorization

Summary
IAM
Great for users / roles already within your AWS account
Handle authentication + authorization
Leverages Sig v4
Custom Authorizer (Lambda)
Great for 3rd party tokens
Very flexible in terms of what IAM policy is returned
Handle authentication + authorization
Pay per Lambda invocation (but can cache to save calls)
Cognito User Pool
You manage your own user pool (non-IAM) (can be backed by Facebook, Google login, etc)
No need to write custom code
Must implement authorization on the backend
Cognito
Gives users an identity so that they can interact with our application
Cognito User Pools
Sign in functionality for app users
Integrate with API GW
Cognito Identity Pools (Federated Identity)
Provide AWS credentials to users so they can access AWS resources directly
Integrate with Cognito User Pools as an identity provider
Cognito Sync (being replaced by AppSync)
Synchronize data from device to Cognito
Cognito User Pools (CUP) (app authentication)
Create a serverless database of users for your mobile apps
Simple login: Username (or email) / password combination
Possibility to verify emails / phone number and add MFA
Can enable Federated Identities (Facebook, Google, SAML, etc)
Sends back a JSON Web Token (JWT)
Can be integrated with API GW for authentication

Cognito Federated Identity Pools (AWS IAM access)
Goal:
Provide direct access to AWS resources from the client side
How:
Log in to federated identity provider - or remain anonymous
Get temporary AWS credentials back from the Federated Identity Pool
These credentials come with a pre-defined IAM policy stating their permissions
Example:
Provide temporary access to write to an S3 bucket using Facebook Login

Cognito Sync (deprecated, now AppSync)
Store preferences, configuration, state of app
Cross device (any platform - iOS, Android, etc)
Offline capability (synchronization when back online)
Requires Federated Identity Pool in Cognito (not User Pool)
Store data in datasets (up to 1MB)
Up to 20 datasets to synchronize
Serverless Solution Architecture
Rewatch Section
S3 Transfer Acceleration: uploads hit a nearby CloudFront edge location, which forwards the data to S3


Microservices
You are free to design each micro-service the way you want
Synchronous patterns: API GW, LB
Asynchronous patterns: SQS, Kinesis, SNS, Lambda triggers (S3)
Challenges with microservices
Repeated overhead for creating each new microservice
Issues with optimizing server density/utilization
Complexity of running multiple versions of multiple microservices simultaneously
Proliferation of client-side code requirements to integrate with many separate services
Some of the challenges are solved by Serverless patterns
API GW and Lambda scale automatically and you pay per usage
You can easily clone APIs to reproduce environments
Generated client SDK through Swagger integration for the API gateway


Database Comparison
Questions to choose the right database based on your architecture
Read heavy, write heavy, balanced workload? Throughput needs? Will it change, does it need to scale or fluctuate during the day?
How much data to store and for how long? Will it grow? Average object size?
Data durability (week, years)? Source of truth for the data?
Latency requirements? Concurrent users?
Data model? How will you query the data? Joins? Structured? Semi-structured?
Strong schema? More flexibility? Reporting? Search? RDBMS / NoSQL?
License costs? Switch to Cloud Native DB such as Aurora?
Database Types
RDBMS (= SQL/OLTP): RDS, Aurora - great for joins
NoSQL: DynamoDB (~JSON), ElastiCache (key/value pairs), Neptune (graphs) - no joins, no SQL
Object Store: S3 (for big objects), Glacier (backups /archives)
Data Warehouse (=SQL Analytics / BI): Redshift (OLAP), Athena
Search: ElasticSearch (JSON) - free text, unstructured searches
Graphs: Neptune - displays relationship between data
RDS Overview
Managed PostgreSQL / MySQL / Oracle / SQL server
Must provision an EC2 instance and EBS volume type and size
Support for Read Replicas and Multi AZ
Security through IAM, Security Groups, KMS, SSL in transit
Backup / Snapshot / Point in time restore
Managed and Scheduled maintenance
Monitoring through CloudWatch
Use Case: Store relational datasets (RDBMS / OLTP), perform SQL queries, transactional inserts / update / delete available
RDS for Solutions Architect (WAF)
Operations: small downtime for failover and maintenance, scaling with read replicas and EC2 type, restore EBS implies manual intervention, application changes must be done for changes
Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in DB, using SSL
Reliability Multi AZ feature, failover in case of failures
Performance: depends on EC2 instance type, EBS volume type, ability to add Read Replicas. Doesn't auto-scale
Cost: Pay per hour based on provisioned EC2 and EBS
Aurora Overview
Compatible API for PostgreSQL and MySQL
Data is held in 6 replicas, across 3 AZ
Auto-healing capability
Multi-AZ, Auto-Scaling Read Replicas
Read Replicas can be Global
Aurora database can be Global for DR or latency purposes
Auto-scaling of storage from 10GB to 64TB
Define EC2 instance type for Aurora, but changeable
Same security / monitoring / maintenance features as RDS
"Aurora Serverless" option
Use case: Same as RDS but with less maintenance / more flexibility / more performance
Operations: less operations, auto-scaling storage
Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in DB, using SSL
Reliability: Multi AZ, HA, possibly more than RDS (6 data copies), Aurora Serverless option
Performance: 5x performance due to architectural optimizations, up to 15 read replicas (5 for RDS)
Cost: Pay per hour based on EC2 and storage usage. Possibly lower costs compared to things like Oracle
ElastiCache Overview
Managed Redis / Memcached (same offering as RDS, but for caches)
In-memory data store, sub-millisecond latency
Must provision an EC2 instance type
Support for Clustering (Redis) and Multi AZ, Read Replicas (sharding)
Security through IAM, Security Groups, KMS, Redis Auth
Backup, Snapshot, Point in time restore
Managed and scheduled maintenance
Monitoring through CloudWatch
Use case: Key/Value store, frequent reads, fewer writes, cache results for DB queries, store session data for websites, cannot use SQL (retrieve by key, not query)
Operations: Same as RDS
Security: AWS responsible for OS security, we for KMS, security groups, users (Redis Auth), using SSL
Reliability: Clustering, Multi AZ
Performance: Sub-millisecond performance, in memory, read replicas for sharding
Cost: Pay per hour based on EC2 and storage usage
DynamoDB Overview
AWS proprietary technology, managed NoSQL
Serverless, provisioned capacity, auto-scaling, on demand capacity (Nov 2018)
Can replace ElastiCache as a key/value store (storing session data for ex)
HA, Multi AZ by default, Read and Writes are decoupled, DAX for read cache
Reads can be eventually consistent or strongly consistent
Security, Authentication, and Authorization is done through IAM
DynamoDB Streams to integrate with Lambda (on any DB change)
Backup / Restore feature, Point in Time (35 days), GlobalTable feature (requires DDB Streams enabled)
Monitoring through CloudWatch
**Can only query on primary key, sort key, or indexes** (query sketch after this block)
Use case: Serverless application development (small docs 100s KB), distributed serverless cache, doesn't have SQL query language available, has transactions capability from Nov 2018
Operations: No operations needed, auto-scaling capability, serverless
Security: Full security through IAM policies, KMS encryption, SSL in flight
Reliability: Multi AZ, Backups, Point in Time
Performance: Single-digit millisecond performance, DAX as a read cache, performance doesn't degrade as the app scales
Cost: Pay per provisioned capacity and storage usage, no need to guess (can use auto-scaling)
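A minimal boto3 sketch of the "query only on keys" rule above; the UserSessions table and the user_id / created_at key names are hypothetical:
```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("UserSessions")  # hypothetical table

# Query only accepts conditions on the partition key (plus optional sort key
# or a secondary index); any other attribute needs a Scan or a new index.
resp = table.query(
    KeyConditionExpression=Key("user_id").eq("user-123")
    & Key("created_at").begins_with("2024-")
)
for item in resp["Items"]:
    print(item)
```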
S3 Overview
S3 is a key / value store for objects
Great for big objects, not so great for small objects
Serverless, scales infinitely, max object size is 5TB
Eventually consistent for overwrites and deletes
Tiers: S3 Standard, S3 IA, S3 One Zone IA, Glacier for backups
Features: Versioning, Encryption, Cross Region Replication, etc...
Security: IAM, Bucket Policies, ACL
Encryption: SSE-S3, SSE-KMS, SSE-C, client side encryption, SSL in transit
Use case: Static files, key value store for big files, website hosting
Operations: No operations
Security: IAM, Bucket Policies, ACL, Encryption, SSL
Reliability: 99.999999999% durability, 99.99% availability, Multi AZ, CRR
Performance: Scales to thousands of read / writes per second, transfer acceleration (CloudFront) / multi-part upload for big files
Cost: Pay per storage used, network cost, requests number
Athena
Fully serverless database with SQL capabilities
Used to query data in S3 (see the sketch after this block)
Pay per query
Output results back to S3
Secured through IAM
Operations: No operations, serverless
Security: IAM + S3 security
Reliability: Managed service, uses Presto engine, HA
Performance: Queries scale based on data size
Cost: Pay per query / per TB of data scanned, serverless
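A rough sketch of the "pay per query, results back to S3" flow with boto3; the weblogs database, access_logs table, and results bucket are made-up placeholders:
```python
import time
import boto3

athena = boto3.client("athena")

# Kick off a query against data already sitting in S3 (schema defined in Glue / Athena).
qid = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    QueryExecutionContext={"Database": "weblogs"},  # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # results land in S3
)["QueryExecutionId"]

# Poll until the query finishes, then read the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]:
        print(row)
```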
Redshift
Redshift is based on PostgreSQL, but it's not used for OLTP
It's OLAP - online analytical processing (analytics and data warehousing)
10x better perf than other data warehouses, scale to PBs
Columnar storage of data (instead of row based)
Massively Parallel Query Execution (MPP), HA
Pay as you go based on the instances provisioned
Has a SQL interface for performing the queries
BI tools such as Quicksight or Tableau integrate with it
Data is loaded from S3, DynamoDB, DMS, other DBs...
From 1 to 128 nodes, up to 160GB of space per node
Leader node: for query planning, results aggregation
Compute node: for performing the queries, send the results to leader
Redshift Spectrum: perform queries directly against S3, no need to load
Backup & restore, Security VPC / IAM / KMS / Monitoring
Redshift Enhanced VPC Routing: COPY & UNLOAD goes through VPC, not the internet
Operations: Similar to RDS
Security: IAM, VPC, KMS, SSL (similar to RDS)
Reliability: HA (cluster), auto-healing features
Performance: 10x perf, compression
Cost: Pay per node provisioned, 1/10th cost of others
Neptune
Fully managed graph database
For:
High relationship data
Social networking
Knowledge graphs (Wikipedia)
Highly available across 3 AZ, with up to 15 read replicas
Point in time recovery, continuous backup to Amazon S3
Support for KMS and HTTPS
Operations: Similar to RDS (must provision instance)
Security: IAM, VPC, KMS, SSL, IAM Authentication
Reliability: Multi AZ, clustering
Performance: Best suited for graphs, clustering to improve perf
Cost: Pay per node provisioned
ElasticSearch
Example: In DynamoDB you can only find by primary key or indexes created on top
With ElasticSearch you can search any field, even partial matches
It's common to use ElasticSearch as a complement to another DB (for website search as example)
ElasticSearch also has Big Data application usage
You can provision a cluster of instances
Built-in integrations for ingestion: Kinesis Firehose, IoT, CloudWatch Logs
Security through Cognito & IAM, KMS, SSL, VPC
Comes with Kibana (visualization) & Logstash (log ingestion) = ELK Stack
Operations: Similar to RDS
Security: Cognito, IAM, VPC, KMS, SSL
Reliability: Multi AZ, clustering
Performance: Petabyte scale
Cost: Pay per node provisioned
= Search / indexing
AWS Monitoring
CloudWatch
CloudWatch provides metrics for every service in AWS
Metric is a variable to monitor (CPUUtilization, NetworkIn, etc)
Metrics belong to namespaces
Dimension is an attribute of a metric (instance id, environment, etc)
Up to 10 dimensions per metric
Metrics have timestamps
Can create a CloudWatch dashboard of metrics
Detailed Monitoring
EC2 instances report metrics every 5 minutes by default
With detailed monitoring (for a cost) you get data every 1 minute
Use detailed monitoring for more effective ASG scaling
Free Tier allows up to 10 detailed monitoring metrics
EC2 memory usage is not pushed by default, must be pushed from inside the instance
CloudWatch Custom Metrics
Possibility to define and send your own custom metrics to CloudWatch
Ability to use dimensions (attributes) to segment metrics
Instance.id
Environment.name
Metric resolution:
Standard: 1 minute
High resolution: Down to 1 second (StorageResolution API parameter) - Higher Cost
Use the PutMetricData API call (sketch after this block)
Use exponential backoff in case of throttling errors
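A minimal sketch of PutMetricData with dimensions and 1-second resolution; the MyApp namespace, ActiveSessions metric, and instance id are hypothetical:
```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Publish one custom data point; StorageResolution=1 requests high-resolution
# (1-second) storage, which costs more than the standard 1-minute resolution.
cloudwatch.put_metric_data(
    Namespace="MyApp",
    MetricData=[
        {
            "MetricName": "ActiveSessions",
            "Dimensions": [
                {"Name": "InstanceId", "Value": "i-0123456789abcdef0"},
                {"Name": "Environment", "Value": "prod"},
            ],
            "Value": 42.0,
            "Unit": "Count",
            "StorageResolution": 1,
        }
    ],
)
# On throttling errors, retry with exponential backoff (boto3's built-in retries
# help, but bulk publishers should back off explicitly).
```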
Available metrics
ASGAverageCPUUtilization—Average CPU utilization of the Auto Scaling group.
ASGAverageNetworkIn—Average number of bytes received on all network interfaces by the Auto Scaling group.
ASGAverageNetworkOut—Average number of bytes sent out on all network interfaces by the Auto Scaling group.
ALBRequestCountPerTarget—Number of requests completed per target in an Application Load Balancer target group.
CloudWatch Dashboards
Great way to set up dashboards for quick access to key metrics
Dashboards are global; graphs are added per region, but dashboards can be viewed from anywhere
Dashboards can include graphs from different regions
You can change the time zone & time range of the dashboards
You can set up automatic refresh (10s, 1m, 2m, 5m, 15m)
Pricing:
3 Dashboards (up to 50 metrics) for free
$3/dashboard/month afterwards
CloudWatch Logs
Applications can send logs to CloudWatch via the SDK
CloudWatch can collect logs from:
Elastic Beanstalk: Collects from application
ECS: Collects from containers
Lambda: Collects from functions
VPC Flow Logs
API Gateway
CloudTrail based on filter
CloudWatch Logs Agents: For example on EC2 machines
Route53: Logs DNS queries
CloudWatch logs can go to:
Batch exporter to S3 for archival
Stream to ElasticSearch cluster for further analytics
Log storage architecture:
Log Groups: Arbitrary name, usually representing an application
Log Stream: instances within application / log files / containers (A log stream is a sequence of log events that share the same source)
Can define log expiration policies (never expire, 30 days, etc)
Using the CLI we can tail CloudWatch logs
To send logs to CloudWatch, make sure IAM permissions are correct!
Security: Encryption of logs using KMS at the Group level
CloudWatch Logs Metric Filter & Insights
CloudWatch Logs can use filter expressions
For example, find a specific IP inside a log
Metric filters can be used to trigger alarms (e.g. found a specific IP, then alarm); see the sketch after this block
CloudWatch Logs Insights can be used to query logs and add queries to CloudWatch Dashboards (comes with some default queries)
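A sketch of the "find a specific IP, then alarm" idea: create a metric filter on a hypothetical log group so the match count becomes a metric an alarm can watch; the names and IP are placeholders:
```python
import boto3

logs = boto3.client("logs")

# Count log events containing a specific IP; the log group, filter name,
# namespace, and IP are all placeholders.
logs.put_metric_filter(
    logGroupName="/my-app/production",
    filterName="suspicious-ip",
    filterPattern='"203.0.113.10"',
    metricTransformations=[
        {
            "metricName": "SuspiciousIpHits",
            "metricNamespace": "MyApp/Security",
            "metricValue": "1",  # each matching event adds 1 to the metric
        }
    ],
)
```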
CloudWatch Alarms
Alarms are used to trigger notifications for any metric (sketch after this block)
Alarms can go to Auto Scaling, EC2 Actions, SNS Notifications
Various options (sampling, %, max, min, etc)
Alarm States:
OK
INSUFFICIENT_DATA
ALARM
Period:
Length of time in seconds to evaluate the metric
High resolution custom metrics: can only choose 10 sec or 30 sec
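A sketch of an alarm on a standard metric (all values illustrative): average CPU of one instance above 80% for two 5-minute periods notifies a hypothetical SNS topic:
```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="high-cpu-i-0123456789abcdef0",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,               # evaluate the metric over 5-minute windows
    EvaluationPeriods=2,      # ...and require 2 consecutive breaching windows
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # placeholder topic
)
```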
CloudWatch Events
Schedule: Like a cron job (same format)
Event Pattern: Event rules to react to a service doing something (Ex: CodePipeline state changes)
Triggers to Lambda functions, SQS/SNS/Kinesis Messages
CloudWatch Event creates a small JSON document to give info on the change
CloudTrail
Provides governance, compliance, and audit for your account
Enabled by default
Get a history of events / API calls made within your account by:
Console
SDK
CLI
AWS Services
Can put logs from CloudTrail into CloudWatch logs
If a resource is deleted, look into CloudTrail first
Security
Encryption in Flight
Ensures no MITM
Encryption at Rest
Data is encrypted after being received by server
Data is decrypted before being sent
The encryption / decryption keys (data key) must be managed somewhere and the server must have access to it
Client Side encryption
Data is encrypted by client, never decrypted by server
Data will be decrypted by a receiving client
The server should not be able to decrypt the data
Could leverage Envelope Encryption
KMS (Key Management Service)
Fully integrated with IAM for authorization
Seamlessly integrated into most AWS services (EBS, S3, Redshift, SSM, etc)
But you can also use the CLI / SDK
Any time you need to share sensitive information, use KMS
DB PW
Credentials to external service
Private Key of SSL certs
The Customer Master Key (CMK) used to encrypt data can never be retrieved from KMS by the user, and it can be rotated for extra security
Never store secrets in plaintext, especially in code
Encrypted secret can be stored in code / environment variables
KMS can only help in encrypting up to 4KB of data per call: PW, SSL cert, credentials, etc
If data > 4KB use envelope encryption (sketch after this block)
To grant KMS access to someone:
Make sure the Key Policy allows the user
Make sure the IAM Policy allows the API calls
KMS makes you able to fully manage the keys & policies: (although we cannot ever see the keys ourselves)
Create
Rotation policies
Disable
Enable
Able to audit key usage (using CloudTrail)
Three types of CMK
AWS Managed Service Default CMK: free
User Keys created in KMS: $1 / month
User Keys imported (must be 256-bit symmetric key): $1 / month
Pay for API calls to KMS: $0.03 / 10,000 calls
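A minimal KMS sketch (the key alias and plaintext are placeholders): Encrypt/Decrypt works for small secrets (<= 4KB); larger data would use GenerateDataKey for envelope encryption:
```python
import boto3

kms = boto3.client("kms")

# Encrypt accepts at most 4 KB of plaintext; fine for passwords, credentials,
# or small certificates.
ciphertext = kms.encrypt(
    KeyId="alias/my-app-secrets",          # hypothetical CMK alias
    Plaintext=b"super-secret-db-password",
)["CiphertextBlob"]

# The ciphertext blob can live in code / env vars; only principals allowed by
# both the key policy and their IAM policy can decrypt it.
plaintext = kms.decrypt(CiphertextBlob=ciphertext)["Plaintext"]

# For data > 4 KB, generate a data key and encrypt locally (envelope encryption).
data_key = kms.generate_data_key(KeyId="alias/my-app-secrets", KeySpec="AES_256")
```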

Encryption in AWS Services
Requires migration (through Snapshot / Backup)
EBS Volumes
RDS databases
ElastiCache
EFS network file system
In-place encryption
S3
AWS Parameter Store
Secure storage for configuration and secrets
Optional seamless encryption using KMS
Serverless, scalable, durable, easy SDK, free
Version tracking of configurations / secrets
Configuration management using path and IAM
Notifications with CloudWatch Events
Integration with CloudFormation
Simplifies workflow vs KMS

Parameter Store Hierarchy
/my-department/
my-app/
dev/
db-url
db-password
prod/
db-url
db-password
other-app/
/other-dept/
Can have encrypted or plaintext parameters
In Systems Manager - Application Management, or via the CLI
GetParameters or GetParametersByPath API via SDK / Lambda (sketch after this block)
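A sketch of reading the hierarchy above with GetParametersByPath; the path mirrors the example tree and WithDecryption handles SecureString values via KMS:
```python
import boto3

ssm = boto3.client("ssm")

# Fetch everything under the dev branch of the hierarchy shown above.
resp = ssm.get_parameters_by_path(
    Path="/my-department/my-app/dev/",
    Recursive=True,
    WithDecryption=True,   # transparently decrypts SecureString parameters (KMS)
)
for param in resp["Parameters"]:
    print(param["Name"], "=", param["Value"])
```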
STS - Security Token Service
Allows granting limited and temporary access to AWS resources
Token is valid for up to 1 hour (must be refreshed)
Cross Account Access
Allows users from one AWS account access to resources in another
Federation (Active Directory)
Provides a non-AWS user with temporary AWS access by linking user's AD credentials
Uses SAML
Allows Single Sign On (SSO) which enables users to log in to AWS console without assigning IAM credentials
Federation with third party providers / Cognito
Used mainly in web and mobile apps
Makes use of FB/G/Amazon etc to federate them
Cross Account Access
Define an IAM Role for another account to access
Define which accounts can access this IAM Role
Use AWS STS to retrieve credentials and impersonate the IAM Role you have access to (AssumeRole API; sketch after this block)
Temporary credentials can be valid between 15 minutes and 1 hour
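A sketch of cross-account access with STS AssumeRole; the role ARN and session name are placeholders for a role the other account has shared with you:
```python
import boto3

sts = boto3.client("sts")

# Assume a role defined in the target account (placeholder ARN).
creds = sts.assume_role(
    RoleArn="arn:aws:iam::222222222222:role/CrossAccountReadOnly",
    RoleSessionName="audit-session",
    DurationSeconds=3600,   # temporary credentials, 15 minutes to 1 hour here
)["Credentials"]

# Use the temporary credentials to act as that role in the other account.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
```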

Identity Federation with AD and Cognito
Federation lets users outside of AWS assume a temporary role for accessing AWS resources
These users assume an identity provided access role
Federation assumes a form of 3rd party authentication
LDAP
MS AD (~=SAML)
Single Sign On
OpenID
Cognito
Using federation you don't need to create IAM users (user mgmt is outside AWS)

SAML Federation (for Enterprise)
To integrate AD / ADFS with AWS (or any SAML 2.0)
Provides access to AWS Console or CLI (through temporary credentials)
No need to create an IAM user for each employee

Custom Identity Broker App (for Enterprise) (no SAML 2.0)
Use only if the identity provider is not compatible with SAML 2.0
You must code your own identity broker which must determine the appropriate IAM policy

Cognito - Federated Identity Pools (For Public Applications)
Goal:
Provide direct access to AWS Resources from the client side
How:
Log in to federated identity provider (or remain anonymous) (CUP, FB, G, OpenID, SAML, etc)
Get temporary AWS credentials back from the Federated Identity Pool (Cognito)
They come with a pre-defined IAM policy stating permissions
Example:
Provide (temporary) access to write to an S3 bucket using FB login (sketch after this block)
Note: Web Identity Federation is an alternative to using Cognito, but AWS recommends against it
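A sketch of the identity-pool flow above; the pool id, token, and bucket are placeholders (the Facebook token would come from the FB SDK on the client, and omitting Logins gives guest access if the pool allows it):
```python
import boto3

cognito = boto3.client("cognito-identity", region_name="us-east-1")

POOL_ID = "us-east-1:11111111-2222-3333-4444-555555555555"   # placeholder pool
LOGINS = {"graph.facebook.com": "<facebook-access-token>"}    # placeholder token

# 1) Exchange the provider login for a Cognito identity id.
identity_id = cognito.get_id(IdentityPoolId=POOL_ID, Logins=LOGINS)["IdentityId"]

# 2) Exchange the identity id for temporary AWS credentials scoped by the
#    pool's authenticated-role IAM policy (e.g. PutObject on one bucket).
creds = cognito.get_credentials_for_identity(
    IdentityId=identity_id, Logins=LOGINS
)["Credentials"]

s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretKey"],
    aws_session_token=creds["SessionToken"],
)
s3.put_object(Bucket="my-user-uploads", Key="photo.jpg", Body=b"...")  # placeholder bucket
```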

Shared Responsibility Model

VPC
CIDR
Two components
Base IP (xx.xx.xx.xx)
Subnet mask (/32) (defines how many bits can change in an IP)
Can take two forms
/24
255.255.255.0 (less common)
/32 = 1 IP = 2^0
/31 = 2 IP = 2^1
/30 = 4 IP = 2^2
/29 = 8 IP = 2^3
/24 = 256 IP = 2^8
etc
/16 = 65536 = 2^16
/0 = all = 2^32
/32 - no octet can change
/24 - the last octet can change
/16 - the last two octets can change
/8 - the last three octets can change
/0 - all four octets can change (quick check in the sketch after this list)
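A quick check of the 2^(32 - prefix) arithmetic above using Python's ipaddress module:
```python
import ipaddress

# Each /prefix allows 2 ** (32 - prefix) addresses.
for prefix in (32, 31, 30, 29, 24, 16):
    net = ipaddress.ip_network(f"10.0.0.0/{prefix}")
    print(f"/{prefix}: {net.num_addresses} addresses")
```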
Public vs Private
IANA via RFC 1918
Private IP can have the following values
10.0.0.0 - 10.255.255.255 (10.0.0.0/8)
172.16.0.0 - 172.31.255.255 (172.16.0.0/12) AWS default
192.168.0.0 - 192.168.255.255 (192.168.0.0/16)
VPC in AWS - IPv4
Can have multiple VPCs per region (5 soft limit)
Max 5 CIDRs per VPC, each with:
Min size /28 = 16 IPs
Max size /16 = 65,536 IPs
Because VPC is private, only RFC1918 addresses
VPC CIDR should not overlap with your other networks
Subnets
AWS reserves 5 IPs (first 4 and last 1 of range) in each Subnet
They are not available for use
For CIDR 10.0.0.0/24:
10.0.0.0: Network address
10.0.0.1: Reserved by AWS for the VPC router
10.0.0.2: Reserved by AWS for mapping to Amazon provided DNS
10.0.0.3: Reserved for future use
10.0.0.255: Network broadcast (assume not available for exam)
Exam Tip: If you need 29 IP addresses for EC2 you can't choose a /27 (32 IPs - 5 reserved = 27 usable); you need a /26 (64 IPs)
Internet Gateway
Helps VPC internet connection
Scales horizontally, HA, and redundant
Must be created separately from VPC
One VPC per IGW, one IGW per VPC
IGW is also a NAT for the instances that have a public IPv4
Instances will not have internet access unless the route tables are also edited
NAT Instances (outdated)
Allow instances in the private subnet to connect to the internet
Must be launched in a public subnet
Must disable EC2: Source / Destination Check
Must have an Elastic IP (because route tables require fixed)
Route table must be configured to route traffic from private subnets to the NAT instance
Pre-configured Amazon Linux AMI are available
Not highly available or resilient setup by default
Would need to create an ASG in Multi AZ + resilient user-data script
Internet traffic bandwidth depends on EC2 instance performance
Must manage security groups & rules
Inbound
Allow HTTP/S from private subnets
Allow SSH from home network (through IGW)
Outbound
Allow HTTP/S traffic to internet
Allow ICMP traffic to internet
NAT Gateway (new)
Only IPv4
AWS managed NAT, higher bandwidth, better availability, no admin
Pay by the hour for usage and bandwidth
NAT Gateway is created in a specific AZ and uses an Elastic IP (lives in a public subnet)
Cannot be used by an instance in that subnet (only from other subnets)
Requires an IGW (Private subnet -> NAT -> IGW)
5 Gbps of bandwidth with auto-scaling up to 45 Gbps
No security groups required
DNS Resolution in VPC
enableDnsSupport: (=Edit DNS Resolution Setting)
Default True
Decides if DNS resolution is supported for the VPC
If True, queries the AWS DNS server at 169.254.169.253
enableDnsHostname: (=Edit DNS Hostname setting)
False by default for newly created VPC, True by default for Default VPC
Won't do anything unless enableDnsSupport=True
If True, assigns a public hostname to an EC2 instance if it has a public IP
If you must use custom DNS domain names in a private zone in Route 53, you must have both as TRUE
NACLs are like a firewall controlling traffic to and from a subnet
Default NACL allows everything inbound and outbound
One NACL per Subnet, new Subnets are assigned the Default NACL
Define NACL rules:
Rules have a number (1 - 32766) and LOWER numbers have precedence (once a number is matched it wins and later rules are ignored)
Last rule is an asterisk (*), and denies all in case of no match
AWS recommends adding rules by increment of 100
Newly created NACL will deny everything
NACLs are a great way of blocking a specific IP at the subnet level
Can be associated to multiple subnets
Remember ephemeral ports
Inbound
SG is stateful on outbound: the response to an allowed inbound request is let out even if outbound rules say not to (SG evaluates all rules before deciding)
NACL is stateless on outbound: all outbound rules are evaluated for the return traffic
Outbound
SG is stateful on inbound: the response to an allowed outbound request is let back in even if inbound rules say not to
NACL is stateless on inbound: all inbound rules are evaluated for the return traffic

VPC Endpoints
Endpoints allow you to connect to AWS services using a private network instead of the public internet
They scale horizontally and are redundant
They remove the need for an IGW, NAT, etc. to access AWS services
Interface: provisions an ENI (private IP) as an entry point (select subnets, must attach a security group) - for most AWS services
Gateway: provisions a target that must be used in a route table associated with your subnets - for S3 and DynamoDB only
Specify the region on the CLI, otherwise it defaults to us-east-1
In case of issues:
Check DNS setting resolution in your VPC
Check Route Tables
VPC Peering
Connect two VPC privately using AWS' network
Make them behave as if they were in the same network
Must not have overlapping CIDR
VPC Peering connection is not transitive (must be established for each VPC that needs to communicate with another)
Can do between accounts and regions
You must update route tables in each VPC's subnets to ensure instances can communicate
Flow Logs
Capture information about IP traffic going to your interfaces:
VPC Flow Logs
Subnet Flow Logs
Elastic Network Interface (ENI) Flow Logs
For ACCEPT and REJECT traffic
Helps to monitor & troubleshoot connectivity issues
Flow logs data can go into S3 (Athena) / CloudWatch Logs (Insights)
Captures network information from AWS managed interfaces too: ELB, RDS, ElastiCache, Redshift, WorkSpaces
Flow Log Syntax
[version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, log-status] (parsed in the sketch after this block)
2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK
Query VPC flow logs using Athena on S3 or CloudWatch Logs Insights
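A small sketch that splits the default-format record above into named fields (same order as the syntax line):
```python
# Field names in the order of the default VPC Flow Log format.
FIELDS = [
    "version", "account-id", "interface-id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log-status",
]

record = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")

parsed = dict(zip(FIELDS, record.split()))
print(parsed["srcaddr"], "->", parsed["dstaddr"], parsed["action"])  # 172.31.16.139 -> 172.31.16.21 ACCEPT
```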
Bastion Hosts
Used to SSH into private instances
In the public subnet which is then connected to all private subnets
Bastion Host security must be tight
Exam tip: Make sure the bastion host only allows port 22 from your IP, not even from the SG of your other instances
Site to Site VPN, Virtual Private Gateway, Customer Gateway
Virtual Private Gateway
VPN concentrator on the AWS side of the VPN connection
VGW is created and attached to the VPC from which you want to create the site-to-site VPN
Possibility to customize the ASN
Customer Gateway
Software application or physical device on customer side of the VPN connection
IP Address
Use the static, internet routeable, IP address of your customer gateway device
If the CGW is behind a NAT (with NAT-T), use the public address of the NAT
Direct Connect
Provides a dedicated private connection from a remote network to your VPC
Dedicated connection must be setup between your DC and AWS Direct Connect locations
You need to set up a Virtual Private Gateway on your VPC
Access public resources (S3) and private (EC2) on the same connection
Use cases:
Increase bandwidth throughput - working with large data sets - lower cost
More consistent network experience - application using real-time data feeds
Hybrid Environments
Supports both IPv4 and IPv6

Direct Connect Gateway
If you want to set up a Direct Connect to one or more VPC in many different regions (no overlapping IPs)

Egress only IGW
Egress only IGW is for IPv6 only
Similar function as a NAT (GW), but a NAT is for IPv4
All IPv6 are public addresses
Therefore all instances are publicly accessible
Egress Only Internet Gateway gives our IPv6 instances access to the internet, but they are not reachable publicly
After creating an Egress Only IGW edit the Route Tables
VPC Summary

Other Services
CI/CD
Code - CodeCommit, Build - CodeBuild, Test - CodeBuild, Deploy - Elastic Beanstalk or CodeDeploy -> EC2 Fleet, Provision
CodePipeline orchestrates it all
When deploying code directly onto EC2 instances or On Premise servers, CodeDeploy is the service to use. You can define the strategy (how fast the rollout of the new code should be)
Infrastructure as Code
CloudFormation - Declarative way of outlining Infrastructure (does ordering and orchestration for you)
Manual way: Edit templates in designer, use console to input parameters
Automated way: Edit YAML file, use CLI / SDK to deploy (recommended; sketch after the component list)
Template Components
Resources: Resources declared in template (mandatory)
Parameters: The dynamic inputs for your template
Mappings: Static variables for template
Outputs: References to what has been created
Conditionals: List of conditions to perform resource creation
Metadata
Template Helpers
References
Functions
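A sketch of deploying a template via the SDK instead of the console; app-stack.yaml, the stack name, and the EnvName parameter are hypothetical:
```python
import boto3

cfn = boto3.client("cloudformation")

# Read a local template; CloudFormation handles ordering and orchestration
# of the declared resources itself.
with open("app-stack.yaml") as f:
    template_body = f.read()

cfn.create_stack(
    StackName="my-app-dev",
    TemplateBody=template_body,
    Parameters=[{"ParameterKey": "EnvName", "ParameterValue": "dev"}],
)

# Block until creation finishes (or fails) before reading stack outputs.
cfn.get_waiter("stack_create_complete").wait(StackName="my-app-dev")
```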
ECS
Container orchestration service
Made of:
Core, running ECS on user-provisioned EC2 instances
Fargate: serverless
EKS: K8s on managed EC2
ECR: Registry
ECS
ECS Cluster: set of EC2 instances
ECS Service: Application definitions running on Cluster
ECS Tasks + definition: The containers running to create the application
ECS IAM roles: Roles assigned to tasks to interact with AWS
ALB has a direct integration with ECS called (dynamic) port mapping
Run multiple instances of the same application on the same machine
Increased resiliency even if running on one EC2 instance
Maximize CPU/Core utilization
Ability to perform rolling upgrades without impacting the application
ECS Setup and config file
Run an EC2 instance, install the ECS agent with ECS config file
Or use an ECS-ready Linux AMI (and modify the config file)
Config file is at: /etc/ecs/ecs.config

ECR Registry
Store, manage, deploy your containers
Fully integrated with IAM & ECS
Sent over HTTPS, and encrypted at rest
Step Functions
Build Serverless visual workflow to orchestrate your Lambda functions
Represent flow as a JSON state machine, outputs a visual workflow graph, can see steps succeed / in progress / fail etc
Features: sequence, parallel, conditions, timeouts, error handling...
Maximum execution time of 1 year
Can implement human approval feature
Use cases: Order fulfillment, data processing, etc
SWF - Simple Workflow Service (older)
Coordinate work amongst applications (not serverless)
**Step Functions is recommended for all new apps, except:
If you need external signals to intervene in the process
If you need child processes that return values to parent process.**
AWS Glue
Fully managed ETL service
Move from data sources, transform, clean, change format and put somewhere
Automate time consuming steps of data preparation for analytics
Provisions Apache Spark
Crawls data sources and identifies data formats (schema inference)
Automated Code Generation to customize Spark code
Sources: Aurora, RDS, Redshift, & S3 (crawls tables etc and discovers all)
Sinks: S3, Redshift, etc
Glue Data Catalog: Metadata (definition & schema) of the Source Tables (to later use in your EMR)
Opsworks
Opsworks = Managed Chef & Puppet
Alternative to AWS SSM
Configuration as code
Elastic Transcoder
Convert media files (video & music) stored in S3 to various formats
Features: bit rate optimization, thumbnails, watermarks, captions, DRM, progressive download, encryption
Components:
Jobs: what does the actual work
Pipeline: Queue that manages the transcoding job
Presets: Template for converting media from one format to another
Notifications: SNS for example
Pay for what you use, fully managed
AWS Organizations
Global service
One master account - can't change it
Other accounts are member accounts, which can only be part of one org
Consolidated billing across all accounts
Pricing benefits from aggregated usage
API is available to automate account creation
Organize accounts in Organizational Units (OU)
Can be anything: dev, test, prod, or HR, finance, IT
Can nest OU within OU
Apply Service Control Policies (SCPs) to OU
Permit / Deny access to AWS services
SCP has a similar syntax to IAM
It's a filter to IAM
Helpful for sandbox account creation
Helpful to separate dev and prod resources
Helpful to only allow approved services

AWS WorkSpaces
On demand Managed, Secure Cloud Desktop
Eliminate on-prem VDI
Secure, encrypted, network isolation
Integrates with AD
Windows and Linux
AppSync
Store and sync data across mobile and web-apps in real-time
Makes use of GraphQL (from Facebook)
Integrates with DynamoDB / Lambda
Offline data synchronization (alternative to Cognito Sync; exam)
AWS Single Sign On
Centrally managed SSO across multiple AWS accounts and business applications (O365, Salesforce, Box, etc)
One login gets you access to everything securely
Integrated with MS AD
Reduces process of setting up SSO in a company
Only helpful for Web Browser, SAML 2.0 enabled applications
Here's a quick cheat-sheet to remember all these services:
CodeCommit: service where you can store your code. Similar service is GitHub
CodeBuild: build and testing service in your CICD pipelines
CodeDeploy: deploy the packaged code onto EC2 and AWS Lambda
CodePipeline: orchestrate the actions of your CICD pipelines (build stages, manual approvals, many deploys, etc)
CloudFormation: Infrastructure as Code for AWS. Declarative way to manage, create and update resources.
ECS (Elastic Container Service): Docker container management system on AWS. Helps with creating micro-services.
ECR (Elastic Container Registry): Docker images repository on AWS. Docker Images can be pushed and pulled from there
Step Functions: Orchestrate / Coordinate Lambda functions and ECS containers into a workflow
SWF (Simple Workflow Service): Old way of orchestrating a big workflow.
EMR (Elastic Map Reduce): Big Data / Hadoop / Spark clusters on AWS, deployed on EC2 for you
Glue: ETL (Extract Transform Load) service on AWS
OpsWorks: managed Chef & Puppet on AWS
ElasticTranscoder: managed media (video, music) converter service into various optimized formats
Organizations: hierarchy and centralized management of multiple AWS accounts
Workspaces: Virtual Desktop on Demand in the Cloud. Replaces traditional on-premise VDI infrastructure
AppSync: GraphQL as a service on AWS
SSO (Single Sign On): One login managed by AWS to log in to various business SAML 2.0-compatible applications (office 365 etc)
Whitepapers
Well Architected Framework + Tool
General Guiding Principles
Stop guessing capacity needs
Test systems at production scale
Automate to make architectural experimentation easier
Allow for evolutionary architectures
Design based on changing requirements
Drive architecture changes using data
Improve through game days
Simulate applications for flash sale days
5 Pillars
Operational Excellence
The ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures
Design Principles
Perform operations as code - Infrastructure as Code
Annotate documentation - Automate the creation of annotated documentation after every build
Make frequent, small, reversible changes
Refine operations procedures frequently - And ensure that team members are familiar with it
Anticipate failure
Learn from all operation failures
Prepare
CloudFormation, AWS Config
Operate
CloudFormation, AWS Config, CloudTrail, CloudWatch, X-Ray
Evolve
CloudFormation, CodeBuild, CodeCommit, CodeDeploy, CodePipeline
Security
Includes the ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies
Design Principles
Implement a strong identity foundation - Centralize privilege management and reduce (or even eliminate) reliance on long term credentials - Principle of Least Privilege - IAM
Enable traceability - Integrate logs and metrics with systems to automatically respond and take action
Apply security at all layers - Edge Network, VPC, Subnet, Load balancer, each instance, OS, and application
Automate Security best practices
Protect data in transit and at rest - Encryption, tokenization, and access control
Keep people away from data - No direct or manual access
Prepare for security events - Run incident response simulations and use tools with automation to increase your speed for detection, investigation, and recovery
IAM
IAM, AWS-STS, MFA token, Organizations
Detective Controls
Config, CloudTrail, CloudWatch
Infrastructure Protection
CloudFront, VPC, Shield, WAF, Inspector
Data Protection
KMS, S3, ELB, EBS, RDS
Incident Response
IAM, CloudFormation, CloudWatch Events
Reliability
Ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues
Design Principles
Test recovery scenarios - Use automation to simulate different failures or to recreate scenarios that led to failures before
Automatically recover from failure - Anticipate and remediate failures before they occur
Scale horizontally to increase aggregate system availability - Distribute requests across multiple, smaller resources to ensure that they don't share a common point of failure
Stop guessing capacity - Maintain the optimal level to satisfy demand without over or under provisioning
Manage change via automation
Foundations
IAM, VPC, Service Limits, Trusted Advisor
Change management
Autoscaling, CloudWatch, CloudTrail, Config
Failure Management
Backups, CloudFormation, S3, S3 Glacier, Route 53
Performance Efficiency
Includes the ability to use computing resources efficiently to meet system requirements and to maintain that efficiency as demand changes and technologies evolve
Design Principles
Democratize advanced technologies - Advanced technologies become services and hence you can focus more on product development
Go global in minutes - Easy deployment in multiple regions
Use serverless architectures - Avoid burden of managing servers
Experiment more often - Easy to carry out comparative testing
Mechanical sympathy - Be aware of all AWS services
Selection
Auto-Scaling, Lambda, EBS, S3, RDS
Review
CloudFormation
Monitoring
CloudWatch, Lambda
Tradeoffs
RDS, Elasticache, Snowball, Cloudfront (all have tradeoffs vs other solutions)
Cost Optimization
Includes the ability to run systems to deliver business value at the lowest price point
Design Principles
Adopt a consumption model - Pay only for what you use
Measure overall efficiency - Use CloudWatch
Stop spending money on data center operations - AWS does the infrastructure part and enables the customer to focus on organization projects
Analyze and attribute expenditure - Accurate identification of system usage and costs, helps measure return on investment. USE TAGS
Use managed and application level services to reduce cost of ownership - As managed services operate at cloud scale, they can offer a lower cost per transaction or service
Expenditure Awareness
Budgets, Cost and Usage reports, Cost Explorer, Reserved Instance Reporting
Cost-effective resources
Spot instance, Reserved instances, Glacier
Matching supply and demand
Auto-Scaling, Lambda
Optimizing Over Time
Trusted Advisor, Cost and usage reports
Not tradeoffs, they're a synergy
Well Architected Tool
Define workload, track over time
Milestones, improvement plans, Risks
Trusted Advisor
Cost optimization, Performance, Security, Fault Tolerance, Service Limits
Get upgraded recommendations, more than for governance
Some paid
Can get weekly emails to different contact groups
Disaster Recovery
Any event that has a negative impact on a company's business continuity or finances is a disaster
DR is about preparing for and recovering from a disaster
What kind of DR?
On-premises -> On-premises (traditional, $$$$)
On-Premises -> AWS Cloud (hybrid recovery)
AWS Cloud Region A -> AWS Cloud Region B
Strategies
Backup and restore (Longest RTO, high RPO, not too expensive)
Pilot Light (2nd longest RTO, a small version of the app is always running in the cloud; similar to backup & restore but the critical core is already up)
Warm Standby (3rd longest RTO, full system up and running but at minimum size, scale to production load)
Multi-Site (Shortest RTO, full prod at second site)
But all get increasingly more expensive