Solutions Architect Associate - Study Notes

Exam Tips

  • Associate everything learned to a WAF pillar

  • If a solution seems feasible but highly complicated, it's probably wrong

  • Don't overthink it

  • 50% off the next exam if you pass

General Architecture

Regions and AZs

  • Region us-east-1

  • AZ us-east-1a-f

  • Consoles are region scoped (aside from IAM, S3, and Route 53)

  • See the Global Infrastructure page for AZ counts and definitions

Service Models

  • IaaS, PaaS, SaaS, FaaS (Function as a Service)

The service stack responsibility differs depending on the service model (Data centre, Network and Storage, Host/Servers, Virtualization, OS, Runtime, Application, Data)

High Availability (HA) vs. Fault Tolerance

  • HA - Hardware, Software, and configuration allowing a system to recover quickly in the event of a failure (minimize downtime, not to prevent the failure to begin with)

  • Fault Tolerance - System designed to operate through a failure with no user impact.

RPO vs. RTO

  • RPO - How much data a business can tolerate losing, expressed as the time between the last backup and the failure.

  • RTO - Maximum time a system can be down, time to recover.

Scaling

  • Vertical Scaling - Increase the size of the server; constrained by maximum machine size (technically or cost-wise)

  • Horizontal Scaling - Add machines to a pool of resources; requires application support.

Tiered Application Design

  • Presentation - interacts with customer

  • Logic - delivers application functionality

  • Data - data storage and retrieval

  • Monolithic applications require vertical scaling

Misc.

  • Cost efficient or cost effective - Implementing for as little initial and ongoing cost

  • Application Session State - represents what a customer is doing, has chosen, or has configured.

  • Undifferentiated Heavy Lifting - A part of an application, system, or platform that is not specific to your business.

Accounts

Budgets and Cost


Solution Architecture

Instantiating instances quickly

  • Golden AMI: Apps, dependencies, etc. done beforehand

  • User Data: For dynamic configuration (retrieving un/pw or something)

  • Hybrid: mix Golden and User Data (Elastic Beanstalk)

  • RDS: Restore from snapshot, DB will have schemas and data ready

  • EBS Volumes: restore from snapshot, will already be formatted and have data

Elastic Beanstalk

  • Single Instance deployment: Good for dev

  • LB + ASG: good for prod, pre-prod

  • ASG only: Good for non-web apps in production (workers etc.)

  • Three components

    • Application

    • Application version

    • Environment name

  • Can promote versions to next env

  • Rollback feature to previous version

  • Full control over lifecycle of envs

  • Support for most platforms (can write own custom platform too)


Well-Architected Framework (WAF)

  • Read WAF whitepaper

  • Re-read WAF notes from internal training

  • When going through course align everything with a WAF pillar

  • Pillars, Design Principles, Questions


Security

IAM

  • Global across all Regions

  • Account Aliases must be globally unique

Authentication and Authorization

  • Principal - Person or application that can make an authenticated or anonymous request to perform an action on a system

  • Authentication - Process of authenticating a principal against an identity

  • Identity - Objects that require authentication and are authorized to access resources

  • Authorization - Process of checking and allowing or denying access to a resource for an identity

Users

  • One user per physical person

  • chmod 0400 on .pem key file

    • (Windows 10 SSH) Properties - > Security - > (make self owner) - > remove Inheritance - > remove all other users - > ensure Full Control

Groups

Roles

  • Internal use, machine use only?

  • One role per application, no sharing

Policies

  • Written in JSON


Compute

EC2

  • Exam Tips

    • Billed by the second

    • Windows 10 can use SSH

    • SG can have IPs as rules, but also reference other SG for rules

  • Instance

    • Has a public IP by default, which will likely change on stop/start

  • User Data

    • Commands automatically run with sudo

    • Runs as root

    • Runs only on first boot

    • Gets base64 encoded and passed

  • AMI

    • Region specific (but can copy)

    • Cross account AMI copy

      • You can share an AMI with another AWS account

      • Sharing an AMI does not affect ownership of the AMI

      • If you copy an AMI that has been shared with your account, you are the owner of the target AMI in your account

      • To copy an AMI that was shared from another account the source owner must grant you read permissions for the storage that backs the AMI (EBS snapshot or S3 bucket for instance store backed)

      • Limits:

        • Can't copy encrypted shared AMI. If the underlying snapshot and encryption key were shared you can copy while re-encrypting it with own key. You own the copied snapshot and register it as new AMI.

        • Can't copy a shared AMI with an associated billingProduct code, including Windows and Marketplace AMIs. To copy launch an EC2 instance using the shared AMI then create an AMI from the instance.

    • Reside in S3 (cost based on storage used)

    • Use custom AMI for faster deploy in ASG

  • EC2 Instance Launch Types

    • On Demand Instances

      • For: Short-term uninterruptible workloads when you cannot predict application behaviour

      • Pay per use, billing per second after first minute

      • Highest cost, no upfront payment or commitment

    • Reserved Instances

      • For: Steady state usage (think database)

      • Up to 75% discount vs OD

      • Pay upfront for use, long term commitment, 1 or 3 years

      • Reserve specific instance type

      • Convertible Reserved Instance

        • Can change EC2 instance type

        • Up to 54% discount

      • Scheduled Reserved Instance

        • Launch within the time window you reserve (at regular interval)

    • Spot Instances

      • For: Batch jobs, Big Data analysis, failure resilient workloads

      • Discount up to 90% vs OD

      • Active as long as under bid price

      • Price varies on supply and demand

      • Reclaimed with 2 min warning when spot price goes above bid

    • Dedicated Instances

      • Hardware dedicated to you

      • May share hardware with other instances in same account that are not Dedicated Instances

      • No control over instance placement


  • Instance Types

    • R: RAM - ex: in-memory cache

    • C: CPU - ex: compute/database

    • M: Balanced (Medium)- ex: general/web app

    • I: I/O (instance storage) - ex: databases

    • G: GPU - ex: video rendering or machine learning

    • Burstable (T2/T3)

  • Placement Groups

    • Cluster - Low latency, single AZ

      • Same rack, same AZ, 10 Gbps network, same failure zone

    • Spread - Spreads across underlying hardware, and across AZs (max 7 instances per group, per AZ)(critical applications, maximum HA)

    • Partition - Spreads across many partitions (which rely on different racks) within an AZ. Scales to 100s of instances per group (ex: Hadoop, Cassandra, Kafka)

      • Partition is a set of racks, can create up to 7 partitions in PG

      • Each partition has many instances, partition is same failure zone

      • A partition failure will not affect the others

      • EC2 instances can get access to partition metadata

EC2 Instance Metadata

  • Lets an instance learn about itself without using an IAM role

  • URL is http://169.254.169.254/latest/meta-data

  • Can retrieve IAM Role name from metadata, but not the IAM Policy

  • When querying: curl http://169.254.169.254/latest/meta-data/iam/security-credentials/myfirstrole (see the sketch below)

    • Get AccessKeyId, SecretAccessKey, and Token, which is what the EC2 instance gets via the IAM Role to access whatever

    • Short lived
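
A minimal sketch of querying the metadata service from the instance itself, assuming IMDSv1 and the Python standard library (the role name returned depends on the role attached to the instance):

```python
# Query the EC2 instance metadata service (IMDSv1 style; IMDSv2 additionally
# requires a session token). Only works from inside an EC2 instance.
import json
import urllib.request

BASE = "http://169.254.169.254/latest/meta-data"

def get(path):
    with urllib.request.urlopen(f"{BASE}/{path}", timeout=2) as resp:
        return resp.read().decode()

print(get("instance-id"))
role_name = get("iam/security-credentials/")                  # name of the attached IAM role
creds = json.loads(get(f"iam/security-credentials/{role_name}"))
print(creds["AccessKeyId"], creds["Expiration"])              # short-lived credentials
```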


Storage

S3

  • Bucket names must be globally unique

    • Global at top menu, (but regional service)

  • Minimum of 3 and maximum of 63 characters - no uppercase or underscores

  • Must start with a lowercase letter or number and can’t be formatted as an IP address (1.1.1.1)

  • Soft limit of 100 buckets per account; can be raised to a hard limit of 1,000 via support request

  • Unlimited objects in buckets

  • Unlimited total capacity for a bucket

  • An object’s key is its name (FULL PATH including slashes and filename, but not bucket name)

  • An object’s value is its data (content)

  • An object’s size ranges from 0 bytes to 5TB (uploads larger than 5GB must use multi-part upload)

    • To upload a file larger than 160 GB, use the AWS CLI, AWS SDK, or Amazon S3 REST API.

  • Metadata (list of key/value pairs, system or user metadata)

  • Tags (Unicode key/value pair -max 10-), useful for security / lifecycle

  • Version ID (if versioning is enabled)

Versioning

  • Bucket level setting

  • If you overwrite a key/file you increment its version

  • Best practice to version your buckets

    • Protect against unintended deletes

    • Easy roll back to previous version

  • Any file that is not versioned prior to enabling versioning will have a version NULL

  • Deleting a file only adds a delete marker

S3 Websites

  • URL can be

    • <bucket-name>.s3-website-<region>.amazonaws.com

    • <bucket-name>.s3-website.<region>.amazonaws.com

S3 CORS

  • If you request data from another S3 bucket you need to enable CORS

  • Cross Origin Resource Sharing lets you limit which origins (websites) can request files from your S3 bucket (helps limit costs)

  • Access-Control-Allow-Origin:
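
A hedged sketch of enabling CORS on a bucket with boto3 (bucket name and origin are placeholders), matching the Access-Control-Allow-Origin idea above:

```python
# Allow cross-origin GETs only from one website.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_cors(
    Bucket="my-assets-bucket",
    CORSConfiguration={
        "CORSRules": [{
            "AllowedOrigins": ["https://www.example.com"],  # becomes Access-Control-Allow-Origin
            "AllowedMethods": ["GET"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,
        }]
    },
)
```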

S3 Consistency Model

  • Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD or GET request to the key name (to find if the object exists) before creating the object, Amazon S3 provides eventual consistency for read-after-write.

S3 Security

  • User based

    • IAM Policies - which API calls should be allowed for a specific user from IAM

  • Resource Based

    • Bucket Policies - bucket wide rules from the S3 console - allows cross account

    • Object ACLs - finer grain, not super popular

    • Bucket ACLs - less common

S3 Bucket Policies

  • Grant public access to the bucket

  • Force objects to be encrypted at upload

  • Grant access to another account (Cross account)

  • JSON based (4 components)

    • Resources: buckets and objects

    • Actions: Set of APIs to Allow or Deny

    • Effect: Allow or Deny

    • Principal: The account or user to apply the policy to

  • Networking: Supports VPC endpoints (for instances in VPC with no internet)

  • Logging and Auditing: S3 access logs can be stored in another bucket, API calls can be logged in CloudTrail

  • User Security: MFA can be required in versioned buckets to delete objects, Signed URLs = valid for a limited time (ex: premium video service for time)
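
A sketch of the four policy components in practice, as a bucket policy that forces encryption at upload (bucket name is a placeholder; applied here with boto3):

```python
# Deny any PutObject that does not carry the server-side-encryption header.
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyUnencryptedUploads",
        "Effect": "Deny",                                    # Effect
        "Principal": "*",                                    # Principal
        "Action": "s3:PutObject",                            # Action
        "Resource": "arn:aws:s3:::my-bucket/*",              # Resource
        "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
    }],
}
boto3.client("s3").put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))
```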

S3 Encryption for Objects

Can also set default encryption for bucket

SSE-S3

  • Keys handled and managed by AWS S3

  • Object is encrypted server side, sent via HTTP/S

  • AES-256

  • Must set header: "x-amz-server-side-encryption":"AES256"

  • S3 Managed Data Key + Object > Encrypted
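
A minimal SSE-S3 upload sketch with boto3 (bucket and key are placeholders); the SDK sets the x-amz-server-side-encryption header for you:

```python
import boto3

s3 = boto3.client("s3")
s3.put_object(
    Bucket="my-bucket",
    Key="reports/2019/q4.csv",
    Body=b"col1,col2\n1,2\n",
    ServerSideEncryption="AES256",   # SSE-S3; use "aws:kms" (+ SSEKMSKeyId) for SSE-KMS
)
```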

SSE-KMS

  • Keys handled and managed by KMS

  • Object is encrypted server side, sent via HTTP/S

  • KMS advantages: user control (rotation etc.) + audit trail

  • Must set header: "x-amz-server-side-encryption":"aws:kms"

  • KMS Customer Master Key (CMK) + Object > Encrypted

SSE-C

  • Server Side encryption using keys fully managed by customer outside AWS

  • S3 does not store the key

  • HTTPS must be used

  • Encryption key is provided (sent) in HTTP header, in every request

  • Client provided data key + Object > Encrypted, S3 throws away key

Client Side Encryption

  • Client library such as Amazon S3 Encryption Client

  • Clients must encrypt data themselves before sending to S3

  • Client must decrypt data themselves when retrieving from S3

  • Customer fully manages the keys and encryption cycle

Encryption in Transit

  • AWS S3 exposes both HTTP and HTTPS endpoints, HTTPS recommended

Default Encryption vs Bucket Policies

  • Old way was to use bucket policies to enable and to refuse any HTTP command without proper headers

  • New way is to click "default encryption" option in S3

  • Bucket Policies are evaluated before default encryption

  • Either SSE-S3 (AES-256) or SSE-KMS

S3 MFA Delete

  • To use MFA-Delete must enable Versioning on the S3 bucket

  • You need MFA to

    • permanently delete an object version

    • suspend versioning on the bucket

  • You won't need it for

    • enabling versioning

    • listing deleted versions

  • Only bucket owner (root account) can enable/disable MFA-delete

  • Can only be enabled using the CLI

S3 Access Logs

  • Any request made to S3 from any account, authorized or denied, will be logged to another S3 bucket

  • Can analyze using data analysis tools (Hive, Athena, etc.)

  • Log format in docs

S3 Cross Region Replication

  • Must enable versioning (source and destination)

  • Must be in different regions (duh)

  • Can be different accounts

  • Copying is asynchronous

  • Must give proper IAM permissions to S3, needs Role

  • For:

    • Compliance, lower latency access, cross account replication

  • Can do based on whole bucket, prefix, tags

  • Can replicate encrypted if other account has access to KMS key

  • Can change storage class or ownership

S3 Pre-signed URLs

  • Can create a pre-signed URL via CLI or SDK

    • For downloads CLI

    • For uploads SDK

  • Valid by default for 3600 seconds, change with --expires-in [TIME_BY_SECONDS]

  • Users who receive pre-signed URL inherit permissions of the generator for GET/PUT

  • aws configure set default.s3.signature_version s3v4

  • aws s3 presign s3://bucketname/file.jpg --expires-in 300 --region ca-central-1

  • Avoids direct access to the bucket from users
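
The same presign call via the SDK, as a hedged sketch with boto3 (bucket and key mirror the CLI example above):

```python
import boto3

s3 = boto3.client("s3")
# Download (GET) URL valid for 300 seconds; use generate_presigned_post for uploads.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "bucketname", "Key": "file.jpg"},
    ExpiresIn=300,
)
print(url)
```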

S3 Storage Tiers

  • S3 Standard - General Purpose

  • 99.999999999% Durability (10 mil objects 10k years, lose 1)

  • 99.99% availability

  • Can sustain 2 concurrent AZ loss

  • S3 Reduced Redundancy Storage (RRS)

    • Deprecated

    • 99.99% durability and availability

    • Can sustain loss of single AZ

    • Use for non-critical reproducible data

  • S3 Standard Infrequent Access (IA)

    • Suitable for data accessed less frequently but requiring rapid retrieval

    • Retrieval fee

    • 99.999999999% Durability (10 mil objects 10k years, lose 1)

    • 99.99% availability

    • Can sustain 2 concurrent AZ loss

    • For backups, DR, etc.

  • S3 One Zone Infrequent Access

    • Same as IA, but data is stored in a single AZ

    • Retrieval fee

    • 99.999999999% Durability; data is lost when AZ is destroyed

    • 99.95% availability

    • Lower cost by 20% than IA

    • For secondary backup data, or recreatable

  • S3 Intelligent Tiering

    • Small monthly auto-tiering fee

    • Moves objects between Standard and IA tiers based on access patterns

    • 99.999999999% Durability, 99.9% availability

    • Can sustain single AZ loss

  • S3 Glacier

    • Alternative to Tape (10's of years)

    • 99.999999999% Durability

    • Cost per GB stored per month ($0.004 / GB) + retrieval fee

    • Each item is called an "Archive", up to 40TB size

    • Archives are stored in "Vaults", similar to a bucket

    • Retrieval options:

      • Expedited (1-5 mins) - $0.03 / GB and $0.01 per request

      • Standard (3-5 hours) - $0.01 per GB and 0.05 per 1000 requests

      • Bulk (5-12 hours) - $0.0025 per GB and $0.025 per 1000 requests


S3 Lifecycle Rules

  • Transition Actions: Defines when objects are transitioned to another storage class

  • Expiration Actions: Objects expire and are deleted

  • Can be used to delete incomplete multi-part uploads

  • Limit to prefix or tag

  • Can do current or previous versions
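
A sketch of a lifecycle configuration combining a transition action and an expiration action, scoped to a prefix (bucket name, prefix, and day counts are placeholders):

```python
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-logs",
            "Filter": {"Prefix": "logs/"},              # limit the rule to a prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 730},                # expiration action: delete after 2 years
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }]
    },
)
```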

Snowball

  • Physically transport data in or out of AWS

  • TB or PB

  • Alternative to network fees

  • Secure, tamper resistant, uses KMS 256

  • Tracking using SNS and text messages, E-Ink shipping label

  • For: large data migrations, DC decommission, disaster recovery

  • If it takes more than a week via network use Snowball instead

  • Has client for copying files

Snowball Edge

  • Adds computational capability

  • 100TB capacity, either:

    • Storage Optimized - 24 vCPU

    • Compute Optimized - 52 vCPU & optional GPU

    • Supports a custom EC2 AMI so you can process while transferring

    • Supports custom Lambda functions

AWS Snowmobile

  • Transfer exabytes (1EB = 1000PB = 1000000TB)

  • Each has 100PB of capacity, can use multiple in parallel

  • Use if transferring more than 10PB

Storage Gateway

  • Expose S3 on-premises

  • File Gateway

    • S3 buckets via NFS and SMB (all S3 modes)

    • Bucket access using IAM roles for each File Gateway

    • Recently used data is cached

    • Can be mounted on many servers

  • Volume Gateway

    • Block storage using iSCSI backed by S3

    • ^ Backed by EBS snapshots

    • Cached volumes: low latency access to most recent data

    • Stored volumes: entire dataset is on-premises, scheduled backups to S3

  • Tape Gateway

    • VTL Virtual Tape Library backed by S3 and Glacier

    • Back up data using existing tape based processes (and iSCSI interface)

    • Works with most backup software

EBS

  • EBS volumes are AZ locked

  • Can migrate via snapshot and recreate

  • EBS backups use IO and shouldn't run during peak times

  • Root EBS volumes are deleted when the instance terminates, by default (can disable)

  • If disk I/O is high, increase the EBS volume size (for gp2, IOPS scale with size)

  • Size | Throughput | IOPS

  • GP2 (SSD): General purpose SSD (balance price/perf)

    • Boot volumes, virtual desktops, low-latency interactive apps, development and test

    • 1GB-16TB

    • Small GP2 can burst IOPS to 3000 (anything under 3k can burst to 3k)

    • Max IOPS is 16000

    • 3 IOPS per GB, which means max IOPS is reached at 5,334 GB

  • IO1 (SSD): Highest-perf, low latency or high-throughput

    • Critical business apps that require sustained IOPS, or more than 16000

    • Mongo, Cassandra, MSSQL, MySQL, Oracle

    • 4GB-16TB

    • IOPS is provisioned 100-64000 (64k for Nitro only) else 100-32000

    • Maximum ratio of provisioned IOPS to volume GB size = 50:1

  • ST1 (HDD): Low cost for frequently accessed, throughput-intensive workloads (big data)

    • Streaming workloads requiring consistent, fast throughput at low price

    • Big Data, DW, log processing, Kafka

    • Cannot be boot volume

    • 500GB - 16TB

    • Max IOPS is 500

    • Max throughput of 500 MB/s, can burst

  • SC1 (HDD): Lowest cost for less frequently accessed workloads

    • Throughput oriented for large volumes of data infrequently accessed

    • Where lowest cost is important

    • Cannot be a boot volume

    • 500GB - 16TB

    • Max IOPS is 250

    • Max throughput of 250 MB/s, can burst

  • Only GP2 and IO1 can be boot volumes

  • EC2 machine loses its root volume when terminated

  • Store non-ephemeral data on EBS volume, network drive (not physical) you can attach or detach while running. Persist data.

  • Locked to AZ

    • Can move via snapshot

  • Have a provisioned capacity (billed for all capacity)

  • Can dynamically increase capacity over time, start small

EBS Snapshots

  • Incremental - only changed blocks

  • EBS backups use IO, should not run them during peak times

  • Snapshots are stored in S3 (but you won't see them)

  • Don't have to detach volume but recommended

  • Max 100000 snapshots

  • Can copy across AZ or Region

  • Can make AMI from Snapshot

  • EBS volumes restored by snapshots need to be pre-warmed (using fio or dd to read entire volume)

  • Can be automated using Amazon Data Lifecycle Manager

EBS Migration

  • Volumes locked to AZ

  • To migrate, snapshot, (optional) copy volume to different region

  • Create a volume from the snapshot in the AZ of your choice

EBS Encryption

  • When you encrypt an EBS volume you get:

    • Data at rest is encrypted inside the volume

    • Data in flight between instance and the volume is encrypted

    • Snapshots are encrypted

    • As are volumes created from the snapshot

  • Encryption and decryption are transparent

  • Minimal impact on latency

  • EBS Encryption leverages keys from KMS (AES-256)

  • Copying an unencrypted snapshot allows encryption

  • Snapshots of encrypted volumes are encrypted

  • Encrypting an unencrypted EBS volume

    • Create an EBS snapshot of the volume

    • Encrypt the snapshot using copy

    • Create a new volume from the snapshot

    • Attach encrypted volume to original instance
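
The four steps above as a hedged boto3 sketch (region, volume, instance, and device names are placeholders):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# 1. Snapshot the unencrypted volume
snap = ec2.create_snapshot(VolumeId="vol-0123456789abcdef0")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2. Copy the snapshot with encryption enabled (default EBS KMS key unless KmsKeyId is given)
enc = ec2.copy_snapshot(
    SourceSnapshotId=snap["SnapshotId"],
    SourceRegion="us-east-1",
    Encrypted=True,
)
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[enc["SnapshotId"]])

# 3. Create a new (encrypted) volume from the encrypted snapshot, in the instance's AZ
vol = ec2.create_volume(SnapshotId=enc["SnapshotId"], AvailabilityZone="us-east-1a")
ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])

# 4. Attach the encrypted volume to the original instance
ec2.attach_volume(VolumeId=vol["VolumeId"], InstanceId="i-0123456789abcdef0", Device="/dev/sdf")
```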

EBS RAID

  • EBS is already redundant (replicated within an AZ)

  • But for increase of IOPS past max

  • Must do in OS not AWS

  • Or mirror EBS volumes

    • RAID 0 (Perf, get combined disk space, IO, throughput, not fault tolerant)

    • RAID 1 (mirror, sends data to two volumes at the same time, 2x network traffic)

    • RAID 5, 6 (Not recommended for EBS)

EFS

  • Managed NFS

  • EFS works with EC2 instances multi-AZ

  • Highly available, scalable, expensive (3xGP2), pay per use

  • For: content management, web serving, data sharing, WordPress

  • NFS v4.1

  • Use security groups to control access

  • Compatible with Linux based AMI (not windows)

  • Performance mode: General purpose (default), Max IO (used when 1000's of EC2 are using the EFS)

  • Has bursting or provisioned modes for IO

  • "EFS file sync" to sync from on-prem fs to EFS

  • Backup EFS-to-EFS (incremental, can choose frequency)

  • Encryption at rest using KMS

  • EFS now has lifecycle mgmt. to tier to EFS IA

Instance store

  • Some instance do not come with root EBS

  • Ephemeral

  • Physically attached to your instance

  • Pros

    • Better I/O perf

    • Good for buffer / cache / scratch data / temporary content

    • Data survives reboot

  • On stop or termination instance store is lost

  • Can't resize the instance store

  • Backups must be operated by the user


Networking

  • Elastic IP: a public static IPv4 address, attachable to one instance at a time

  • Horizontal scalability = elasticity

  • Vertical scalability (RDS, Elasticache)

  • HA means running your application in 2 DC/AZ

Load Balancing

  • Health Checks

    • Done on port and route

  • Any LB has a static hostname, use it and not IP

  • LB can scale, not instant, contact AWS for a warm-up

  • 4xx errors are client induced errors

  • 5xx errors are application induced errors

  • LB 503 errors mean the LB is at capacity or has no registered target

  • If LB can't connect to app, check SG!

  • Seamlessly handle failures of downstream instances

  • Health checks (200 OK = healthy, otherwise not - CLB?)

  • CLB + ALB support SSL Certificates and provide SSL termination for websites (NLB can terminate, Jan 2019)

  • Enforce stickiness

  • HA across AZs

  • Separate public traffic from private traffic

  • Exposes single point of access (DNS) to your app

  • Network Load Balancers expose a public static IP, whereas an Application or Classic Load Balancer exposes a static DNS (URL)

  • ELB - Managed load balancer

    • Classic LB (v1, 2009)

      • Deprecated

    • Application Load Balancer (v2, 2016)

      • Layer 7 (HTTP/S, WebSockets)

      • LB to multiple applications on same machine

      • LB to target group based on route in URL

      • LB to target group based on hostname in URL

      • LB to target group based on client IP

      • Supports dynamic host port mapping with ECS (redirect to same machine)

      • Before would have had to have one CLB per app

      • Stickiness at target group level (same instance)

        • Cookie generated by ALB

      • App server does not see IP of client directly, inserted in X-Forwarded-For

        • Also port via X-Forwarded-Port, and proto via X-Forwarded-Proto

      • The ALB terminates the connection to do this

      • Great fit for ECS/Containers

    • Network Load Balancer (v2, 2017)

      • TCP (Layer 4)

      • High perf, millions of requests per sec

      • Support static / elastic IP (per AZ), public must be elastic (can help whitelist by clients), private facing will get random private IP based on free ones at the time

      • Has cross zone balancing

      • Has SSL termination (Jan 2019)

      • Less latency ~100ms (vs 400ms for ALB)

      • Only for extreme perf, not default

      • NLB see client IP

    • Can have internal or external ELB

LB Stickiness, enabled in Target Groups

  • Stickiness works for CLB and ALB

  • Works with cookies, has an expiration date

  • Make sure user doesn't lose session data

  • Can bring imbalance over backend instances

    • Exam can ask if one instance is 80% and one 20% why that would be

  • Stickiness duration can be 1 sec to 7 days

LB SSL Certificates

  • LB uses x.509 certificate (SSL/TLS server cert) loaded on LB

  • Can manage certificates using ACM (AWS Certificate Manager)

  • Can create or upload your own certs alternatively

  • HTTPS listener

    • Must specify default certificate

    • Can add an optional list of certs to support multiple domains

    • SNI (Server Name Indication) is a feature allowing you to expose multiple SSL certs if the client supports it.

Auto-Scaling Groups (ASG)

  • A launch configuration

    • AMI + Instance Type

    • EC2 User Data

    • EBS Volumes

    • Security Groups

    • SSH Key Pair

  • Min/Max/Initial Capacity size

  • Network + Subnet information

  • Load Balancer Information

  • Scaling Policies (triggers)

  • Possible to scale in/out based on CloudWatch alarm

    • Alarm monitors a metric

    • Metrics are computed for the overall ASG instances

      • ex: Target average CPU

      • ex: Average network in or out

  • Can scale on custom metric (ex: connected users)

    • Send custom metric from app on EC2 to CloudWatch (PutMetricData API; see the sketch after this list)

    • Create alarm to react based on low / high values

    • Use the alarm as scaling policy for ASG

  • IAM roles attached to an ASG will get assigned to EC2 instances

  • ASG are free, pay only for instances

  • ASG can terminate instances marked unhealthy by a LB and replace them

  • Available Metrics:

    • ASGAverageCPUUtilization—Average CPU utilization of the Auto Scaling group.

    • ASGAverageNetworkIn—Average number of bytes received on all network interfaces by the Auto Scaling group.

    • ASGAverageNetworkOut—Average number of bytes sent out on all network interfaces by the Auto Scaling group.

    • ALBRequestCountPerTarget—Number of requests completed per target in an Application Load Balancer target group.

  • Default Termination Policy for ASG. It tries to balance across AZ first, and then delete based on the age of the launch configuration.

  • Scaling Cooldown, makes sure doesn't get out of control, no other scaling takes effect until cooldown is over. Can override default cooldown.

  • Can have default cooldown, but also policy specific to simple scaling policy. Good for scale-in that terminates instances, doesn't take much time.

  • Reduce costs by lowering cooldown from ex: 300 to 180.

  • If your app is scaling multiple times per hour, modify ASG cool-down timer and the CloudWatch Alarm Period that triggers the scale-in
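
A hedged sketch of the custom-metric path mentioned above: the app publishes its own metric with PutMetricData, then a CloudWatch alarm on that metric drives the ASG scaling policy (namespace, metric, and ASG names are made up):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_data(
    Namespace="MyApp",
    MetricData=[{
        "MetricName": "ConnectedUsers",                   # custom, app-level metric
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": "my-asg"}],
        "Value": 542,
        "Unit": "Count",
    }],
)
# A CloudWatch alarm on MyApp/ConnectedUsers is then used as the ASG scaling policy trigger.
```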

Security Groups

  • Inbound traffic is blocked by default, outbound is authorised

  • Can be attached to multiple instances, and instances can have multiple security groups

  • Locked to a region/VPC combination

  • Best practice use one just for SSH

  • If your application times out, it's likely the SG

  • Can reference security group for access


Databases

RDS

  • Postgres

  • Oracle

  • MySQL

  • MariaDB

  • MS SQL

  • Aurora (proprietary)

  • DB Identifier (name) must be unique across region

  • Your responsibility

    • Check IP / Port / SG inbound rules

    • In-database user creation and permissions

    • Creating database with or without public access

    • Ensure parameter groups or DB is configured to only allow SSL

  • AWS Responsibility

    • No SSH access

    • No manual DB patching

    • No Manual OS patching

    • No way to audit underlying instance

For SAs

  • Read replicas can only do SELECT

  • RDS supports Transparent Data Encryption for Oracle or SQL Server

    • Is on top of KMS, may affect performance

  • IAM Authentication vs un/pw for MySQL and PostgreSQL

    • Lifespan of an IAM authentication token is 15 mins (short-lived), better security

    • Tokens are generated by IAM credentials

    • SSL must be used (or connection refused)

    • Easy to use EC2 Instance Roles to connect to the RDS DB (no need to store DB credentials on the instance)

  • Managed Service =

    • OS patching

    • Point in Time Restore backups

    • Monitoring dashboards

    • Read replicas for read perf

    • Multi AZ set for DR

    • Maintenance windows for upgrades

    • Scaling (vert and horiz)

    • BUT no SSH
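
A sketch of RDS IAM authentication with boto3 (hostname, port, and user are placeholders); the token replaces the password and expires after about 15 minutes:

```python
import boto3

rds = boto3.client("rds", region_name="eu-west-1")
token = rds.generate_db_auth_token(
    DBHostname="mydb.abc123.eu-west-1.rds.amazonaws.com",
    Port=3306,
    DBUsername="app_user",
)
# Pass `token` as the password when opening the connection; SSL must be enabled on the client.
```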

RDS Read Replicas for scalability

  • Up to 5 Read Replicas

  • Within AZ, Cross AZ, or Cross Region

  • Replication is ASYNC (eventually consistent)

  • Replicas can be promoted to their own DB

  • Applications must update their connection string to leverage read replicas

    • One string for master, 1 for each replica

Can combo Read Replicas and DR Multi AZ

RDS Multi AZ (Disaster Recovery)

  • SYNC replication

  • One DNS name for auto failover to standby

  • Increases availability (duh)

  • For AZ loss

  • No manual intervention

  • Not for scaling

RDS Backups

  • Automatically enabled

  • Automated Backups

    • Daily full snapshot of DB

    • Captures transaction logs in real time

      • Ability to restore to any point in time

    • 7 days retention (can increase to 35) (can lower as well)

  • DB Snapshots (can be manually triggered)

    • Retention for as long as you want (keep specific state, or long term)

RDS Encryption

  • Encryption at rest with KMS (AES-256)

    • Only at creation

    • or: snapshot, copy as encrypted, create DB from snapshot (same as EBS)

  • SSL certificates to encrypt data in flight

  • To enforce SSL:

    • PostgreSQL: rds.force_ssl=1 in the AWS RDS console (parameter groups)

    • MySQL: Within the DB: GRANT USAGE ON *.* TO 'mysqluser'@'%' REQUIRE SSL;

  • To connect using SSL:

    • Provide SSL Trust certificate (can be downloaded from AWS)

    • Provide SSL options when connecting to DB

RDS Security

  • RDS DB are usually deployed in private subnet

  • Security works by leveraging security groups for who can communicate with it

  • IAM policies help control who can manage RDS

  • Traditional username and password to log into DB itself

  • IAM authentication now works with Aurora/MySQL

RDS vs. Aurora

  • Proprietary

  • Postgres and MySQL drivers supported

  • Cloud optimized - 5x perf for MySQL, 3x perf for Postgres

  • Automatically grows in increments of 10GB up to 64TB

  • Aurora can have 15 replicas, MySQL only 5, and replication is faster (sub 10ms lag)

  • Failover in Aurora is instantaneous, HA native.

  • Aurora costs 20% more than RDS, but is more efficient.

Aurora

  • Automatic failover

  • Backup and recovery

  • Isolation and security

  • Industry compliance

  • Push-button scaling

  • Automated patching with zero downtime

  • Advanced monitoring

  • Routine maintenance

  • Backtrack: restore data at any point in time without backups

  • HA and Read Scaling

    • 6 Copies of data across 3 AZ

      • 4 copies out of 6 needed for writes

      • 3 copies out of 6 needed for reads

      • Self healing with peer-to-peer replication (for corrupted data)

      • Storage is striped across 100's of volumes

    • One Aurora instance takes writes, Master

    • Automated failover for master in less than 30 secs

    • Master + up to 15 Read Replicas serve reads (any replica can become master)

    • Support for Cross Region Replication

  • Shared logical storage volume across AZs for Replication + Self-Healing + Auto Expanding

  • Master is only writer

    • Writer Endpoint (DNS name) always points to current master, for failover

    • Read Replicas can do auto-scaling

      • Reader Endpoint: connection load balancing for reads across all scaled instances. Happens at the connection level, not the statement level.


Aurora Security

  • Encryption at rest using KMS

  • Automated backups, snapshots and replicas are also encrypted

  • Encryption in flight using SSL (same process as MySQL or Postgres)

  • Authentication using IAM

  • You are responsible for protecting via SG

  • No SSH

Aurora Serverless

  • No need to choose an instance size

  • Only supports MySQL 5.6 & Postgres in beta

  • Helpful when you can't predict workload

  • DB cluster starts, shuts down, and scales automatically based on CPU / connections

  • Can migrate from Aurora Cluster to Serverless and vice versa

  • Serverless usage is measured in ACU (Aurora Capacity Units)

  • Billed in 5 minute increments of ACU

  • Some features aren't supported in serverless, so check docs

Aurora for SAs

  • Can use IAM for Aurora

  • Aurora Global Databases span multiple regions and enable DR

    • One primary region

    • One DR Region

    • The DR region can be used for lower latency reads

    • < 1 sec replication lag on average

  • If not using Global Databases you can create cross region Read Replicas

    • FAQ recommends Global Databases instead

Elasticache

  • Managed in-memory DB, high perf, low latency.

  • Redis or Memcached

  • Reduce load on DB

  • Make app stateless (keep state in cache)

  • Write scaling using Sharding

  • Read scaling using Read Replicas

  • Multi AZ with Failover

  • AWS takes care of all normal stuff

  • App queries ElastiCache, either gets cache hit or cache miss, in case of miss it gets cached for hit next time

  • Cache must come with invalidation strategy for only most current data (app based)

  • User session store (keep it stateless)

    • Application writes session data into ElastiCache

    • User hits a different application instance

    • Instance retrieves the data from cache to keep session going

  • Redis

    • In-memory key-value store

    • Super low latency (sub ms)

    • Cache survives reboot by default (persistence)

    • Multi AZ with automatic failover for DR (if you want to keep cache data)

    • Support for Read Replicas and Cluster

    • Good for: User sessions, Leaderboard (has a sort), Distributed states, Relieve pressure on DB, Pub / Sub capability for messaging

  • Memcached

    • In-memory object store

    • Cache does not survive reboots

    • Good for: Quick object retrieval, cache often accessed objects

ElastiCache for SAs

  • Security

    • Redis supports RedisAUTH (un/pw)

    • SSL in-flight must be enabled and used

    • Memcached supports SASL

    • None support IAM

    • IAM policies are used only for AWS API level security

  • Patterns for ElastiCache

    • Lazy Loading: all read data is cached, can become stale

    • Write Through: Adds or updates data in the cache when written to DB (no stale data)

    • Session Store: stores temp session data (using TTL features maybe)
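
A lazy-loading sketch against a Redis-mode ElastiCache endpoint (assumes the redis-py client; the endpoint and the db.query_user helper are hypothetical):

```python
import json
import redis

cache = redis.Redis(host="my-cluster.xxxxxx.0001.euw1.cache.amazonaws.com", port=6379)

def get_user(user_id, db):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:                       # cache hit
        return json.loads(cached)
    user = db.query_user(user_id)                # cache miss: read from the database
    cache.setex(key, 3600, json.dumps(user))     # populate the cache with a TTL to limit staleness
    return user
```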

DynamoDB

  • Fully managed, Highly Available with replication across 3 AZs

  • Scales to massive workloads, distributed database

  • Millions of request per second, trillions of rows, 100s of TB of storage

  • Fast and consistent in performance (low retrieval latency)

  • Integrated with IAM for security, authorization, and administration

  • Enables event driven programming with DynamoDB Streams

  • Low cost and auto scaling

Basics

  • DynamoDB is made of tables

  • Each table has a primary key (must be decided at creation)

  • Each table can have an infinite number of items (=rows)

  • Each item has attributes (can be added over time, can be null, =columns)

  • Maximum item size = 400KB

  • Data types supported are:

    • Scalar types: String, Number, Binary, Boolean, Null

    • Document types: List, Map

    • Set Types: String Set, Number Set, Binary Set

  • Table must be provisioned read and write capacity units

  • Read Capacity Units (RCU): throughput for reads ($0.00013 per RCU)

    • 1 RCU = 1 strongly consistent read of 4KB per second

    • 1 RCU = 2 eventually consistent read of 4KB per second

  • Write Capacity Units (WCU): throughput for writes ($0.00065 per WCU)

    • 1 WCU = 1 write of 1KB per second

  • Option to set up auto-scaling of throughput to meet demand

  • Throughput can be exceeded temporarily using "burst credit"

  • If burst credits are empty you'll get a "ProvisionedThroughputExceeded" exception

  • Then do exponential back-off retry
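
A worked capacity calculation plus a provisioned table, as a sketch assuming boto3 (table and attribute names are placeholders); the arithmetic just applies the RCU/WCU definitions above:

```python
import math
import boto3

# 10 strongly consistent reads/s of 6 KB items, and 5 writes/s of 2.5 KB items:
rcu = 10 * math.ceil(6 / 4)    # 4 KB units per strongly consistent read -> 20 RCU
wcu = 5 * math.ceil(2.5 / 1)   # 1 KB units per write                    -> 15 WCU

boto3.client("dynamodb").create_table(
    TableName="Users",
    KeySchema=[{"AttributeName": "user_id", "KeyType": "HASH"}],   # primary key fixed at creation
    AttributeDefinitions=[{"AttributeName": "user_id", "AttributeType": "S"}],
    ProvisionedThroughput={"ReadCapacityUnits": rcu, "WriteCapacityUnits": wcu},
)
```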

DynamoDB - DAX

  • DynamoDB Accelerator

  • Seamless cache for DDB, no app re-write

  • Writes go through DAX to DynamoDB

  • Microsecond latency for cached reads and queries

  • Solves the Hot Key Problem (too many reads)

  • 5 minute default TTL for cache

  • Up to 10 nodes in the cluster

  • Multi AZ (3 nodes minimum for production recommended)

  • Secure (Encryption at rest with KMS, VPC integration, IAM, CloudTrail, etc)

DynamoDB Streams

  • Changes in DynamoDB (Create, Update, Delete) can end up in a DynamoDB Stream

  • This stream can then be read by Lambda, then we can:

    • React to changes in real time (welcome email to new users)

    • Analytics

    • Insert into ElasticSearch

    • etc

  • Could implement cross region replication using Streams

  • Stream has 24 hours of data retention


New Features

  • Transactions

    • All or nothing type operations

    • Coordinated Insert, Update, Delete across multiple tables (all work or nothing)

    • Include up to 10 unique items, or up to 4MB data

  • On Demand

    • No capacity planning needed (WCU/RCU) - scales automatically

    • 2.5x more expensive than provisioned

    • Helpful when spikes are un-predictable or the app is very low throughput

Security and Other

  • Security

    • VPC Endpoints, access without internet

    • Fully controlled by IAM

    • Encryption at rest with KMS, in transit with SSL/TLS

  • Backup and Restore available

    • Point in time like RDS

    • No performance impact

  • Global Tables (require Streams enabled)

    • Multi region, fully replicated, high performance

  • DMS can be used to migrate to DDB from Mongo, Oracle, S3, etc

  • Can launch local version of DDB for dev purposes


Athena

  • Serverless service to perform analytics directly against S3 files

  • Uses SQL to query

  • Has a JDBC / ODBC driver

  • Charged per query and amount of data scanned

  • Supports CSV, JSON, ORC, Avro, and Parquet

  • For: BI, analytics, reporting, analyzing VPC Flow Logs, ELB logs, CloudTrail trails, etc.


Route 53

  • Most common records

    • A: URL to IPv4

    • AAAA: URL to IPv6

    • CNAME: URL to URL (non root domain)

    • Alias: URL to AWS resource (root and non-root), free of charge, supports native health checks

  • Can use

    • Public domain names

    • Private domain names that can only be resolved by your VPC instances

  • $0.50 per hosted zone

  • Has

    • Load Balancing (through DNS, client LB)

    • Health checks (limited)

    • Routing policy: simple, failover, geolocation, latency, weighted, multi value

  • Simple Routing Policy

    • Maps a domain to one URL

    • Use when directing to a single resource

    • Cannot attach health checks

    • If multiple values are returned, a random one is chosen by client

  • Weighted Routing Policy

    • Control % of requests that go to specific endpoint (ex: 70, 20, 10. Sum does not have to be 100)

    • Create multiple record sets with weighted option

    • Helpful to test 1% of traffic on new app

    • Split traffic between regions

    • Can be associated with health checks

  • Latency Routing Policy

    • Redirect to server that has the least latency, close to request

    • Latency is evaluated from the user to the designated AWS region

    • Must specify region in latency record

    • Germany could be directed to US if lower latency
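
A sketch of weighted record sets with boto3 (hosted zone, record name, identifiers, and IPs are placeholders), illustrating a 70/30 split:

```python
import boto3

r53 = boto3.client("route53")
for identifier, weight, ip in [("app-v1", 70, "203.0.113.10"), ("app-v2", 30, "203.0.113.20")]:
    r53.change_resource_record_sets(
        HostedZoneId="Z1234567890ABC",
        ChangeBatch={"Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "app.example.com",
                "Type": "A",
                "SetIdentifier": identifier,   # required whenever a routing policy is used
                "Weight": weight,
                "TTL": 60,
                "ResourceRecords": [{"Value": ip}],
            },
        }]},
    )
```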

Route 53 Geolocation Policy

  • Different from latency based

  • Based on user location

  • Traffic from England should go to X

  • Must have a default policy if no other match exists

Multi Value Routing Policy

  • Use when routing traffic to multiple instances

  • When want to associate a Route 53 health check with records, removes unhealthy from returned values

  • Up to 8 healthy records are returned for each MultiValue query (even if you have 50)

  • MultiValue is not a substitute for using ELB

Route 53 Health Checks

  • Will not send traffic to if failed

  • Deemed unhealthy if checks fail 3 times

  • Deemed healthy if checks pass 3 times

  • Default interval 30 secs (can set fast health check at 10s, higher cost)

  • About 15 health checkers will launch to check endpoint health

    • one request every 2 secs on average

  • Can have HTTP, TCP, and HTTPS check (no SSL certificate verification)

  • Possibility of integrating health checks with CloudWatch

  • Health checks can be linked to Route 53 DNS record set

Route 53 as a Registrar

  • Offer both Registrar and DNS service


Developing on AWS

CLI

  • Never put personal credentials on EC2 machine, whole account compromised

  • Use Roles

Roles

  • Attached to EC2 instance

  • Come with a policy describing what the instance is authorized to do

  • Best practice

  • Instance can only have one role at a time

Policies

AWS SDK

  • AWS CLI is a wrapper around Python SDK (boto3)

  • If you don't specify a region defaults to us-east-1

  • Recommended to use default credential provider chain

    • Works with:

      • AWS credentials in .aws (local or on-prem)

      • Instance Profile Credentials using IAM Roles for EC2 machines etc.

      • Environment variables (AWS_ACCESS_KEY_ID, etc.), not often used

  • NEVER STORE CREDENTIALS IN YOUR CODE, abstract

  • Always use IAM Roles when working within AWS Services

  • Exponential Backoff

    • Any API call that fails because of too many calls needs to be retried with Exponential Backoff

    • These apply to rate limited APIs

    • The retry mechanism is included in SDK API calls

    • 1 ms, 2 ms, 4ms, 8ms
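
A minimal backoff sketch (plain Python, no SDK specifics; in practice the AWS SDKs already retry throttled calls for you):

```python
import random
import time

def call_with_backoff(fn, max_retries=5):
    """Retry a rate-limited call with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:                       # in practice, catch the SDK's throttling error
            delay = (2 ** attempt) * 0.001      # 1 ms, 2 ms, 4 ms, 8 ms, ...
            time.sleep(delay + random.uniform(0, delay))
    return fn()                                 # final attempt; let any error propagate
```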


CloudFormation

  • We didn’t specify a name in the json file for this bucket, so AWS names it with the [STACKNAME]-[LOGICAL_VOLUME_NAME]-[RANDOM_STRING] format.

  • What is logical volume name, based on resource in CFN?

  • Stacks have logical resources in them that create physical resources


CloudFront

  • Cached at edge locations

  • Popular with S3 but works with EC2 and LB as well

  • Helps with network attacks

  • Provides SSL (HTTPS) via ACM

  • Can use SSL (HTTPS) to talk internally to applications

  • Supports RTMP

  • Origin Access Identity

    • Limit S3 to be only accessed via this identity

CloudFront Signed URL / Signed Cookies

  • To distribute paid shared content which lives in S3

  • If S3 can only be accessed via CloudFront we can't use S3 pre-signed URLs

  • Can attach a policy with:

    • URL expiration

    • IP ranges for access

    • Trusted signers (which AWS Account can create signed URLs)

  • CloudFront signed URLs can only be created using the AWS SDK

  • Validity length?

    • Share content, movies etc, short = few minutes

    • Private content (to user) longer = years


CloudFront vs S3 Cross Region Replication

  • CloudFront

  • Global Edge network

  • Files are cached for a TTL (maybe a day)

  • Great for static content that must be available everywhere

  • S3 Cross Region Replication

    • Must be set up for each region

    • Files are updated near real-time

    • Read only

    • Great for dynamic content that needs low-latency in a few regions

CloudFront Geo Restriction

  • Restrict who can access your distribution

    • Whitelist by country

    • Blacklist by country

  • Country is determined by using a 3rd party Geo-IP database

  • Copyright law, etc.


Messaging

General

  • Two patterns of application communication

    • Synchronous (app to app)

      • Problematic if there are sudden spikes of traffic

    • Asynchronous / Event Based (Queue)

      • Better to decouple (SQS: Queue, SNS: Pub/Sub, Kinesis: real-time streaming)

SQS (Super important)

SQS Standard Queue

  • Publisher -> Queue -> Consumer

  • Fully managed

  • Scales from 1 message per second to 10000s per second

  • Default retention: 4 days, maximum 14 days

  • No limit to how many messages in queue

  • Low latency (<10ms on publish and receive)

  • Horizontal scaling in terms of number of consumers

  • Can have duplicate messages (at-least-once delivery, duplicates happen occasionally)

  • Can have out of order messages (best effort ordering)

  • Limitation of 256KB per message

SQS Delay Queue

  • Delay a message up to 15 minutes (consumers don't see it immediately)

  • Default is 0 seconds (available right away)

  • Can set a default at queue level

  • Can override the default using the DelaySeconds parameter, queue holds it

Producing Messages

  • Define Body (String up to 256KB)

  • Metadata, message attributes (optional) of Key Value pair, with Type

  • Provide Delay Delivery

  • Get Back

    • Message identifier

    • MD5 hash of the body

Consuming Messages

  • Poll SQS for messages (receive up to 10 at a time)

  • Process the message within the Visibility Timeout

  • Delete the message from the queue using the message ID and receipt handle

Visibility Timeout

  • When a consumer polls a message from a queue, the message is then "invisible" to other consumers for the defined Visibility Timeout period

    • Set between 0 seconds and 12 hours (default 30 secs)

    • If too high (15 mins) and consumer fails to process, you have to wait a long time before retry

    • If too low (30 secs) and consumer needs more time to process another consumer will receive the message and it will be processed more than once

  • ChangeMessageVisibility API to change the visibility while processing a message; the consumer can alert SQS that it needs more time

  • DeleteMessage API to tell SQS the message was successfully processed
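
A consumer-loop sketch with boto3 tying together long polling, the visibility timeout, and DeleteMessage (queue URL and the process() stub are placeholders):

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue"

def process(body):             # placeholder for real business logic
    print("processing", body)

while True:
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,    # receive up to 10 at a time
        WaitTimeSeconds=20,        # long polling
        VisibilityTimeout=60,      # hide messages from other consumers while we work
    )
    for msg in resp.get("Messages", []):
        process(msg["Body"])       # must finish within the visibility timeout (or extend it)
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```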

Dead Letter Queue

  • If a consumer fails to process a message within the Visibility Timeout it goes back to the queue

  • We can set a threshold of how many times a message can go back, it's called a redrive policy

  • After that threshold is exceeded the message goes into the Dead Letter Queue (DLQ)

  • We have to create a DLQ first, then designate it as a DLQ

  • We must make sure to process messages in the DLQ before they expire

Long Polling (Receive Message Wait Time)

  • When a consumer requests messages from the queue it can optionally "wait" for messages to arrive if there are none

  • LongPolling decreases the number of API calls made to SQS while increasing the efficiency and decreasing the latency of the app.

  • The wait time can be between 1 - 20 seconds, 20 preferable

  • Long Polling is preferred to Short Polling

  • Long Polling can be enabled at the queue level, or at the API level when making the poll via WaitTimeSeconds

FIFO Queue

  • Name of the queue must end in .fifo

  • Lower throughput (up to 3000 per sec with batching, 300/s without)

  • Messages are processed in order by the consumer

  • Messages are sent exactly once

  • No per message delay (only per queue delay)

  • Ability to do content-based de-duplication

  • 5 minute interval de-duplication using "Duplication ID"

  • Message Groups:

    • Possibility to group messages for FIFO ordering using "Message GroupID"

    • Only one worker can be assigned per message group, so messages are processed in order

    • Message group is just an extra tag on the message

SNS

  • Event producer only sends one message to the SNS topic

  • As many event receivers (subscriptions) as you want can listen to the SNS topic notifications

  • Each subscriber will get all the messages (new feature to filter messages)

  • Up to 10,000,000 subscriptions per topic

  • 100,000 topic limit

  • Subscribers can be:

    • SQS

    • HTTP/S (with delivery retries)

    • Lambda

    • Emails

    • SMS messages

    • Mobile notifications

SNS Integrations

  • Some services can send data directly to SNS for notifications

  • CloudWatch for alarms

  • Auto Scaling Groups notifications

  • Amazon S3 on bucket events

  • CloudFormation upon state changes

  • etc

How to publish

  • Messages must be processed right away; they are not stored in the SNS topic

  • Topic Publish (Within your AWS server using the SDK or CLI)

    • Create a topic

    • Create a subscription (or many)

    • Publish to the topic

  • Direct Publish (for mobile apps SDK) (Not on exam)

    • Create a platform application

    • Create a platform endpoint

    • Publish to the platform endpoint

    • Works with Google GCM, Apple APNS, Amazon ADM

SNS + SQS - Fan Out

  • Push once in SNS, receive in many SQS

  • Fully decoupled

  • No data loss

  • Ability to add receivers of data later, flexible

  • SQS allows for delayed processing and retries of work (implying SNS does not)

  • Can have many workers on one queue and one worker on the other, or whatever
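
A fan-out setup sketch with boto3 (topic and queue names are placeholders; the SQS access policy that allows SNS to SendMessage is omitted for brevity):

```python
import boto3

sns, sqs = boto3.client("sns"), boto3.client("sqs")

topic_arn = sns.create_topic(Name="orders")["TopicArn"]
queue_url = sqs.create_queue(QueueName="orders-analytics")["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Every queue subscribed this way gets its own copy of each message published to the topic.
sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)
sns.publish(TopicArn=topic_arn, Message='{"order_id": 42}')
```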


  • SNS Protocols

    • HTTP/S

    • Email

    • Email-JSON

    • Amazon SQS

    • AWS Lambda

Kinesis

  • Managed alternative to Kafka

  • Data is automatically replicated to 3 AZ

  • Great for application logs, metrics, IoT, clickstreams

  • Great for "real-time" big data

  • Great for real-time streaming processing frameworks (Spark, NiFi, etc)

  • Kinesis Streams (just plain Kinesis): low latency streaming ingest at scale

  • Kinesis Analytics: perform real-time analytics (filters, computations, alerting, etc) on streams using SQL

  • Kinesis Firehose: load streams into S3, Redshift, ElasticSearch, etc


Kinesis Streams (important)

  • Streams are divided in ordered Shards / Partitions

  • Data retention is 1 day by default, up to 7 days (24-168 hours)

  • Ability to reprocess / replay data (unlike SQS)

  • Multiple applications can consume the same stream (like SNS)

  • Real-time processing with scalable throughput (add more shards)

  • Once data is inserted into Kinesis it can't be deleted (immutability)

  • Think of a shard as a little queue

  • Kinesis is a highway, want to get the data to destination ASAP


Shards

  • One stream is made up of many different shards

  • Write: 1MB/s or 1,000 messages per second PER SHARD

  • Read: 2MB/s at read side PER SHARD

  • Billing is per shard provisioned, can have as many as you want

  • Batching available for message push or for message calls

  • The number of shards can evolve over time (reshard / merge, essentially autoscaling)

  • Records are ordered per shard (SQS standard is unordered, FIFO is one ordered queue, Kinesis is in-between)

Kinesis API - Put records

  • On producer side

  • PutRecord API + partition key (any string) that gets hashed to determine shard id

  • The key is a way to route data to a specific shard

  • The same key goes to the same partition (data only goes to one shard at a time)

  • Messages sent get a sequence number

  • Choose a partition key that is highly distributed (helps prevent a "hot partition", an overused shard)

    • Good user_id if many users

    • Bad country_id if most users are from same country

  • Use batching and PutRecords to reduce costs and increase throughput

  • ProvisionedThroughputExceeded if we go over the limits, then use Retries or ExponentialBackoff

  • Can use CLI, SDK, or producer libraries from various frameworks
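
A producer sketch with boto3 (stream name and payload are placeholders), showing the partition key choice discussed above:

```python
import json
import boto3

kinesis = boto3.client("kinesis")
event = {"user_id": "u-123", "action": "click"}

# The partition key is hashed to pick the shard; user_id spreads load well when users are many.
kinesis.put_record(
    StreamName="clickstream",
    Data=json.dumps(event).encode(),
    PartitionKey=event["user_id"],
)
```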

Kinesis API - Exceptions

  • ProvisionedThroughputExceeded exceptions

    • Happens when sending too much data

    • Make sure you don't have a hot shard

  • Solution

    • Retries with backoff

    • Increase shards (scaling)

    • Ensure your partition key is a good one

Kinesis API - Consumers

  • Can use a normal consumer (CLI, SDK, etc)

  • Can use Kinesis Client Library (in java, Node, Python, Ruby, .Net)

    • KCL uses DynamoDB to checkpoint offsets

    • KCL uses DynamoDB to track other workers and share the work amongst shards (to improve efficiency)

Kinesis Security

  • Control access and authorization via IAM policies

  • In-Flight using HTTPS endpoints

  • At rest with KMS

  • Can encrypt/decrypt client side (difficult)

  • VPC endpoints available for Kinesis to access within VPC (no internet access)

Kinesis Data Analytics

  • Perform real-time analytics on Streams using SQL

  • Kinesis Data Analytics

    • Autoscaling

    • Managed

    • Continuous (real-time, no delay)

  • Pay for actual consumption rate

  • Can create new streams out of the real-time queries

Kinesis Firehose

  • Fully managed, no administration

  • Near real-time (perhaps 60 secs)

  • Load data into Redshift, S3, ElasticSearch, Splunk (ETL)

  • Autoscaling

  • Support for many data formats (but pay for conversion)

  • Pay for data going through, consumption model

SQS vs SNS vs Kinesis

  • Only one consumer per shard for Kinesis


Amazon MQ

  • SQS and SNS are cloud-native, using proprietary protocols from AWS

  • Traditional on-premises apps may use open protocols like: MQTT, AMQP, STOMP, Openwire, WSS

  • When migrating to cloud instead of re-engineering we can use Amazon MQ

  • Amazon MQ = managed Apache ActiveMQ

  • Amazon MQ doesn't scale as much

  • Runs on a dedicated machine, can run in HA multi-AZ

  • Has both a Queue feature (SQS) and topic feature (SNS)


Serverless

  • Just deploy functions (FaaS)

  • Lambda & Step Functions

  • DynamoDB

  • Cognito

  • API Gateway

  • S3

  • SNS & SQS

  • Kinesis

  • Aurora Serverless

Lambda

  • Virtual functions

  • Limited by time - short executions, when done, done

  • Run on-demand (run in ms)

  • Scaling is automated

  • Easy pricing

    • Pay per request and compute time

    • Free tier has 1,000,000 requests and 400,000 GB-seconds of compute time

  • Integrated with whole AWS Stack

  • Integrated with many programming languages

  • Easy monitoring through AWS CloudWatch

  • Easy to get more resources for your functions (up to 3GB of ram)

  • Increasing RAM also improves CPU and network

  • Node.js (JavaScript), Python, Java (8 compatible), C# (.NET Core), Golang, PowerShell

  • Main integrations

    • API GW

    • Kinesis

    • DynamoDB

    • S3

    • IoT

    • CloudWatch Events and Logs

    • SNS

    • Cognito

    • SQS

Pricing

  • Pay per call

    • First 1,000,000 are free

    • $0.20 per 1 million thereafter

  • Pay per duration (100ms increments)

    • 400,000 GB-seconds of compute time free per month

    • == 400,000 seconds if function is 1GB RAM

    • == 3,200,000 seconds if function is 128MB RAM

    • After that, $1.00 per 600,000 GB-s

Lambda Configuration

  • Timeout: default of 3 secs, max of 900s (15min)

  • Environment variables

  • Allocated memory (128M to 3G)

  • Ability to deploy within a VPC and assign security groups

  • IAM execution role must be attached to the Lambda function

Limits (exam)

  • Execution

    • Memory allocation: 128MB - 3008 MB (in 64MB increments)

    • Maximum execution time: 300s (5 minutes), now 15 but exam assumes 5

    • Disk capacity in the "function container" (in /tmp): 512MB

    • Concurrency limit: 1000 (can be raised via service ticket)

  • Deployment:

    • Function deployment size (compressed .zip): 50MB

    • Uncompressed deployment (code+dependencies): 250MB

    • Can use /tmp dir to load other files at startup (for more than 250MB)

    • Size of environment variables: 4KB (therefore can't pass file)

Lambda @ Edge

  • Have a CloudFront CDN

  • @Edge allows you to run global Lambda functions alongside it

  • Or do request filtering before reaching application

  • Global as opposed to a region

  • More responsive apps

  • Customize CDN content

  • Pay per use

  • Use Lambda to change CloudFront requests and responses

    • After CloudFront receives a request from a viewer (viewer request)

    • Before CloudFront forwards the request to the origin (origin request)

    • After CloudFront receives the response from the origin (origin response)

    • Before CloudFront forwards the response to the viewer (viewer response)

![Screen Shot 2019-12-03 at 14.16.13.png](../../../../_resources/Screen Shot 2019-12-03 at 14.16.13.png)

  • You can also generate responses to viewers without ever sending the request to the origin

![Screen Shot 2019-12-03 at 14.19.09.png](../../../../_resources/Screen Shot 2019-12-03 at 14.19.09.png)

  • Use Cases

    • Website Security and Privacy

    • Dynamic Web Application at the Edge (see above pic)

    • SEO

    • Intelligently route across Origins and Data Centers

    • Bot mitigation at Edge

    • Real-time image transformation

    • A/B Testing

    • User authentication and authorization

    • User Prioritization

    • User Tracking and Analytics
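
A minimal viewer-request sketch following the event structure Lambda@Edge passes to Python handlers: it either forwards the request to CloudFront unchanged or generates a response at the edge without contacting the origin (the bot check is a made-up example):

```python
# Lambda@Edge viewer-request handler sketch: block a hypothetical bot
# user agent at the edge, otherwise pass the request through unchanged.
def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    headers = request["headers"]

    user_agent = headers.get("user-agent", [{"value": ""}])[0]["value"]
    if "BadBot" in user_agent:  # hypothetical bot signature
        # Returning a response here means the origin is never contacted
        return {
            "status": "403",
            "statusDescription": "Forbidden",
            "body": "Blocked at the edge",
        }
    # Returning the request forwards it to CloudFront / the origin
    return request
```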


API GW

  • AWS Lambda + API Gateway: No infra to manage

  • Handle API versioning (v1, v2, etc)

  • Handle different environments (dev, test, prod)

  • Handle security (Authentication and Authorization)

  • Create API keys, handle request throttling

  • Swagger / Open API import to quickly define APIs

  • Transform and validate requests and responses

  • Generate SDK and API specifications

  • Cache API responses

  • Stage variables allow you to modularize your stages, different for dev or prod for example

Integrations

  • Outside of VPC

    • Endpoints on EC2

    • Load Balancers

    • Any AWS service

    • External and publicly accessible HTTP endpoints

  • Inside of VPC

    • AWS Lambda in your VPC

    • EC2 endpoints in your VPC

Security (exam)

  • IAM Permissions

    • Create an IAM policy authorization and attach to application User/Role

    • API GW verifies IAM permissions passed by the calling application

    • Good to provide access within your own infra, but not for outside

    • Leverages Sig v4 capability where IAM credentials are in headers

  • Lambda/Custom Authorizer

    • Uses Lambda to validate the token passed in the header

    • Option to cache the results of authentication

    • Helps to use OAuth / SAML / 3rd party type of authentication

    • Lambda must return an IAM policy for the user

![Screen Shot 2019-12-04 at 12.32.53.png](../../../../_resources/Screen Shot 2019-12-04 at 12.32.53.png)
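
A minimal sketch of the custom (Lambda) authorizer described above: it checks the token from the request header and returns an IAM policy allowing or denying `execute-api:Invoke`; the token check is a placeholder for real OAuth / SAML / 3rd-party validation:

```python
# Lambda (custom) authorizer sketch for API Gateway (TOKEN type).
def handler(event, context):
    token = event.get("authorizationToken", "")
    effect = "Allow" if token == "valid-token" else "Deny"  # placeholder check
    return {
        "principalId": "example-user",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": event["methodArn"],
            }],
        },
    }
```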

  • Cognito User Pools

    • Cognito fully manages user lifecycle

    • API GW verifies identity automatically against AWS Cognito

    • No custom implementation required

    • Cognito only helps with authentication, not authorization

![Screen Shot 2019-12-04 at 12.37.55.png](../../../../_resources/Screen Shot 2019-12-04 at 12.37.55.png)

  • Summary

  • IAM

    • Great for users / roles already within your AWS account

    • Handle authentication + authorization

    • Leverages Sig v4

  • Custom Authorizer (Lambda)

    • Great for 3rd party tokens

    • Very flexible in terms of what IAM policy is returned

    • Handle authentication + authorization

    • Pay per Lambda invocation (but can cache to save calls)

  • Cognito User Pool

    • You manage your own user pool (non-IAM) (can be backed by Facebook, Google login, etc)

    • No need to write custom code

    • Must implement authorization on the backend

Cognito

  • Gives users an identity so that they can interact with our application

  • Cognito User Pools

    • Sign in functionality for app users

    • Integrate with API GW

  • Cognito Identity Pools (Federated Identity)

    • Provide AWS credentials to users so they can access AWS resources directly

    • Integrate with Cognito User Pools as an identity provider

  • Cognito Sync (being replaced by AppSync)

    • Synchronize data from device to Cognito

  • Cognito User Pools (CUP) (app authentication)

    • Create a serverless database of users for your mobile apps

    • Simple login: Username (or email) / password combination

    • Possibility to verify emails / phone number and add MFA

    • Can enable Federated Identities (Facebook, Google, SAML, etc)

    • Sends back a JSON Web Token (JWT)

    • Can be integrated with API GW for authentication

![Screen Shot 2019-12-04 at 12.49.24.png](../../../../_resources/Screen Shot 2019-12-04 at 12.49.24.png)

  • Cognito Federated Identity Pools (AWS IAM access)

    • Goal:

      • Provide direct access to AWS resources from the client side

    • How:

      • Log in to federated identity provider - or remain anonymous

      • Get temporary AWS credentials back from the Federated Identity Pool

      • These credentials come with a pre-defined IAM policy stating their permissions

    • Example:

      • Provide temporary access to write to an S3 bucket using Facebook Login

![Screen Shot 2019-12-04 at 12.53.33.png](../../../../_resources/Screen Shot 2019-12-04 at 12.53.33.png)
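
A sketch of the S3-via-Facebook example above using boto3; the identity pool ID, token, and bucket name are placeholders:

```python
import boto3

# Exchange a Facebook token for temporary AWS credentials via a Cognito
# Federated Identity Pool, then use those credentials to write to S3.
cognito = boto3.client("cognito-identity", region_name="eu-west-1")
fb_login = {"graph.facebook.com": "<facebook-access-token>"}  # placeholder

identity = cognito.get_id(
    IdentityPoolId="eu-west-1:00000000-0000-0000-0000-000000000000",
    Logins=fb_login,
)
creds = cognito.get_credentials_for_identity(
    IdentityId=identity["IdentityId"],
    Logins=fb_login,
)["Credentials"]

s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretKey"],
    aws_session_token=creds["SessionToken"],
)
s3.put_object(Bucket="my-app-uploads", Key="hello.txt", Body=b"hi")
```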

  • Cognito Sync (deprecated, now AppSync)

    • Store preferences, configuration, state of app

    • Cross device (any platform - iOS, Android, etc)

    • Offline capability (synchronization when back online)

    • Requires Federated Identity Pool in Cognito (not User Pool)

    • Store data in datasets (up to 1MB)

    • Up to 20 datasets to synchronize


Serverless Solution Architecture

Rewatch Section

  • S3 Transfer acceleration, upload hits CloudFront which puts to S3

![Screen Shot 2019-12-05 at 14.55.35.png](../../../../_resources/Screen Shot 2019-12-05 at 14.55.35.png)

![Screen Shot 2019-12-05 at 14.57.23.png](../../../../_resources/Screen Shot 2019-12-05 at 14.57.23.png)

  • Microservices

  • You are free to design each micro-service the way you want

  • Synchronous patterns: API GW, LB

  • Asynchronous patterns: SQS, Kinesis, SNS, Lambda triggers (S3)

  • Challenges with microservices

    • Repeated overhead for creating each new microservice

    • Issues with optimizing server density/utilization

    • Complexity of running multiple versions of multiple microservices simultaneously

    • Proliferation of client-side code requirements to integrate with many separate services

  • Some of the challenges are solved by Serverless patterns

    • API GW and Lambda scale automatically and you pay per usage

    • You can easily clone APIs to reproduce environments

    • Generated client SDK through Swagger integration for the API gateway

![Screen Shot 2019-12-05 at 15.20.53.png](../../../../_resources/Screen Shot 2019-12-05 at 15.20.53.png)

![Screen Shot 2019-12-05 at 15.27.44.png](../../../../_resources/Screen Shot 2019-12-05 at 15.27.44.png)


Database Comparison

  • Questions to choose the right database based on your architecture

    • Read heavy, write heavy, balanced workload? Throughput needs? Will it change, does it need to scale or fluctuate during the day?

    • How much data to store and for how long? Will it grow? Average object size?

    • Data durability (week, years)? Source of truth for the data?

    • Latency requirements? Concurrent users?

    • Data model? How will you query the data? Joins? Structured? Semi-structured?

    • Strong schema? More flexibility? Reporting? Search? RDBMS / NoSQL?

    • License costs? Switch to Cloud Native DB such as Aurora?

  • Database Types

    • RDBMS (= SQL/OLTP): RDS, Aurora - great for joins

    • NoSQL: DynamoDB (~JSON), ElastiCache (key/value pairs), Neptune (graphs) - no joins, no SQL

    • Object Store: S3 (for big objects), Glacier (backups /archives)

    • Data Warehouse (=SQL Analytics / BI): Redshift (OLAP), Athena

    • Search: ElasticSearch (JSON) - free text, unstructured searches

    • Graphs: Neptune - displays relationship between data

  • RDS Overview

    • Managed PostgreSQL / MySQL / Oracle / SQL server

    • Must provision an EC2 instance and EBS volume type and size

    • Support for Read Replicas and Multi AZ

    • Security through IAM, Security Groups, KMS, SSL in transit

    • Backup / Snapshot / Point in time restore

    • Managed and Scheduled maintenance

    • Monitoring through CloudWatch

    • Use Case: Store relational datasets (RDBMS / OLTP), perform SQL queries, transactional inserts / update / delete available

  • RDS for Solutions Architect (WAF)

    • Operations: small downtime on failover and maintenance; scaling read replicas / EC2 instance type / EBS volume requires manual intervention; application changes may be needed after a restore

    • Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in the DB, using SSL

    • Reliability: Multi AZ feature, failover in case of failures

    • Performance: depends on EC2 instance type, EBS volume type, ability to add Read Replicas. Doesn't auto-scale

    • Cost: Pay per hour based on provisioned EC2 and EBS

  • Aurora Overview

    • Compatible API for PostgreSQL and MySQL

    • Data is held in 6 replicas, across 3 AZ

    • Auto-healing capability

    • Multi-AZ, Auto-Scaling Read Replicas

    • Read Replicas can be Global

    • Aurora database can be Global for DR or latency purposes

    • Auto-scaling of storage from 10GB to 64TB

    • Define EC2 instance type for Aurora, but changeable

    • Same security / monitoring / maintenance features as RDS

    • "Aurora Serverless" option

    • Use case: Same as RDS but with less maintenance / more flexibility / more performance

  • Operations: less operations, auto-scaling storage

  • Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in DB, using SSL

  • Reliability: Multi AZ, HA, possibly more than RDS (6 data copies), Aurora Serverless option

  • Performance: 5x performance due to architectural optimizations, up to 15 read replicas (5 for RDS)

  • Cost: Pay per hour based on EC2 and storage usage. Possibly lower costs compared to things like Oracle

  • ElastiCache Overview

    • Managed Redis / Memcached (same offering as RDS, but for caches)

    • In-memory data store, sub-millisecond latency

    • Must provision an EC2 instance type

    • Support for Clustering (redis) and Multi AZ, Read Replicas (Sharding)

    • Security through IAM, Security Groups, KMS, Redis Auth

    • Backup, Snapshot, Point in time restore

    • Managed and scheduled maintenance

    • Monitoring through CloudWatch

    • Use case: Key/Value store, frequent reads, fewer writes, cache results of DB queries, store session data for websites; cannot use SQL (retrieve by key, not by query)

  • Operations: Same as RDS

  • Security: AWS responsible for OS security, we for KMS, security groups, users (Redis Auth), using SSL

  • Reliability: Clustering, Multi AZ

  • Performance: Sub-millisecond performance, in memory, read replicas for sharding

  • Cost: Pay per hour based on EC2 and storage usage

  • DynamoDB Overview

    • AWS proprietary technology, managed NoSQL

    • Serverless, provisioned capacity, auto-scaling, on demand capacity (Nov 2018)

    • Can replace ElastiCache as a key/value store (storing session data for ex)

    • HA, Multi AZ by default, Read and Writes are decoupled, DAX for read cache

    • Reads can be eventually consistent or strongly consistent

    • Security, Authentication, and Authorization is done through IAM

    • DynamoDB Streams to integrate with Lambda (on any DB change)

    • Backup / Restore feature, Point in Time (35 days), GlobalTable feature (requires DDB Streams enabled)

    • Monitoring through CloudWatch

    • **Can only query on primary key, sort key, or indexes** (see the query sketch after this list)

    • Use case: Serverless application development (small docs 100s KB), distributed serverless cache, doesn't have SQL query language available, has transactions capability from Nov 2018

  • Operations: No operations needed, auto-scaling capability, serverless

  • Security: Full security through IAM policies, KMS encryption, SSL in flight

  • Reliability: Multi AZ, Backups, Point in Time

  • Performance: Single-digit millisecond performance, DAX for read caching, performance doesn't degrade as the app scales

  • Cost: Pay per provisioned capacity and storage usage, no need to guess (can use auto-scaling)
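
The query sketch referenced above, using boto3; `UserSessions`, `user_id`, and `created_at` are hypothetical table and key names:

```python
import boto3
from boto3.dynamodb.conditions import Key

# Queries must target the partition key, optionally narrowed by the sort key
# (or an index). Scanning on arbitrary attributes is not a query.
table = boto3.resource("dynamodb").Table("UserSessions")

resp = table.query(
    KeyConditionExpression=Key("user_id").eq("user-123")
    & Key("created_at").begins_with("2019-12"),
    ConsistentRead=True,  # strongly consistent read (default is eventual)
)
for item in resp["Items"]:
    print(item)
```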

  • S3 Overview

    • S3 is a key / value store for objects

    • Great for big objects, not so great for small objects

    • Serverless, scales infinitely, max object size is 5TB

    • Eventually consistent for overwrites and deletes

    • Tiers: S3 Standard, S3 IA, S3 One Zone IA, Glacier for backups

    • Features: Versioning, Encryption, Cross Region Replication, etc...

    • Security: IAM, Bucket Policies, ACL

    • Encryption: SSE-S3, SSE-KMS, SSE-C, client-side encryption, SSL in transit

    • Use case: Static files, key value store for big files, website hosting

  • Operations: No operations

  • Security: IAM, Bucket Policies, ACL, Encryption, SSL

  • Reliability: 99.999999999% durability, 99.99% availability, Multi AZ, CRR

  • Performance: Scales to thousands of read / writes per second, transfer acceleration (CloudFront) / multi-part upload for big files

  • Cost: Pay per storage used, network cost, requests number

  • Athena

    • Fully serverless database with SQL capabilities

    • Used to query data in S3

    • Pay per query

    • Output results back to S3

    • Secured through IAM

  • Operations: No operations, serverless

  • Security: IAM + S3 security

  • Reliability: Managed service, uses Presto engine, HA

  • Performance: Queries scale based on data size

  • Cost: Pay per query / per TB of data scanned, serverless
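
A minimal sketch of running an Athena query against data in S3 with boto3; the database, table, and results bucket are placeholders:

```python
import boto3

# Run a serverless SQL query over data in S3; results are written back to S3
# and you pay per query / per data scanned.
athena = boto3.client("athena")

execution = athena.start_query_execution(
    QueryString="SELECT status, count(*) FROM access_logs GROUP BY status",
    QueryExecutionContext={"Database": "weblogs"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(execution["QueryExecutionId"])
```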

  • Redshift

    • Redshift is based on PostgreSQL, but it's not used for OLTP

    • It's OLAP - online analytical processing (analytics and data warehousing)

    • 10x better perf than other data warehouses, scale to PBs

    • Columnar storage of data (instead of row based)

    • Massively Parallel Query Execution (MPP), HA

    • Pay as you go based on the instances provisioned

    • Has a SQL interface for performing the queries

    • BI tools such as Quicksight or Tableau integrate with it

    • Data is loaded from S3, DynamoDB, DMS, other DBs...

    • From 1 to 128 nodes, up to 160GB of space per node

    • Leader node: for query planning, results aggregation

    • Compute node: for performing the queries, send the results to leader

    • Redshift Spectrum: perform queries directly against S3, no need to load

    • Backup & restore, Security VPC / IAM / KMS / Monitoring

    • Redshift Enhanced VPC Routing: COPY & UNLOAD go through the VPC, not the internet

  • Operations: Similar to RDS

  • Security: IAM, VPC, KMS, SSL (similar to RDS)

  • Reliability: HA (cluster), auto-healing features

  • Performance: 10x perf, compression

  • Cost: Pay per node provisioned, 1/10th cost of others

  • Neptune

    • Fully managed graph database

    • For:

      • High relationship data

      • Social networking

      • Knowledge graphs (Wikipedia)

    • Highly available across 3 AZ, with up to 15 read replicas

    • Point in time recovery, continuous backup to Amazon S3

    • Support for KMS and HTTPS

  • Operations: Similar to RDS (must provision instance)

  • Security: IAM, VPC, KMS, SSL, IAM Authentication

  • Reliability: Multi AZ, clustering

  • Performance: Best suited for graphs, clustering to improve perf

  • Cost: Pay per node provisioned

  • ElasticSearch

    • Example: in DynamoDB you can only find by primary key or an index created on top

    • With ElasticSearch you can search any field, even partial matches

    • It's common to use ElasticSearch as a complement to another DB (for website search as example)

    • ElasticSearch also has Big Data application usage

    • You can provision a cluster of instances

    • Built-in integrations for ingestion: Kinesis Firehose, IOT, Cloudwatch logs

    • Security through Cognito & IAM, KMS, SSL, VPC

    • Comes with Kibana (visualization) & Logstash (log ingestion) = ELK Stack

  • Operations: Similar to RDS

  • Security: Cognito, IAM, VPC, KMS, SSL

  • Reliability: Multi AZ, clustering

  • Performance: Petabyte scale

  • Cost: Pay per node provisioned

  • = Search / indexing


AWS Monitoring

CloudWatch

  • CloudWatch provides metrics for every service in AWS

  • Metric is a variable to monitor (CPUUtilization, NetworkIn, etc)

  • Metrics belong to namespaces

  • Dimension is an attribute of a metric (instance id, environment, etc)

  • Up to 10 dimensions per metric

  • Metrics have timestamps

  • Can create a CloudWatch dashboard of metrics

Detailed Monitoring

  • EC2 instances have metrics every 5 minutes by default

  • With detailed monitoring (for a cost) you get data every 1 minute

  • Use detailed monitoring for more effective ASG scaling

  • Free Tier allows up to 10 detailed monitoring metrics

  • EC2 memory usage is not pushed by default, must be pushed from inside the instance

CloudWatch Custom Metrics

  • Possibility to define and send your own custom metrics to CloudWatch

  • Ability to use dimensions (attributes) to segment metrics

    • Instance.id

    • Environment.name

  • Metric resolution:

    • Standard: 1 minute

    • High resolution: Down to 1 second (StorageResolution API parameter) - Higher Cost

    • Use API call PutMetricData

    • Use exponential back off in case of throttle errors

  • Available metrics

    • ASGAverageCPUUtilization—Average CPU utilization of the Auto Scaling group.

    • ASGAverageNetworkIn—Average number of bytes received on all network interfaces by the Auto Scaling group.

    • ASGAverageNetworkOut—Average number of bytes sent out on all network interfaces by the Auto Scaling group.

    • ALBRequestCountPerTarget—Number of requests completed per target in an Application Load Balancer target group.
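
A minimal sketch of pushing a custom metric with the PutMetricData API mentioned above, including a dimension and 1-second (high) resolution; the namespace and metric name are made up:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Publish one data point for a custom metric, segmented by a dimension.
cloudwatch.put_metric_data(
    Namespace="MyApp",
    MetricData=[{
        "MetricName": "ActiveSessions",
        "Dimensions": [{"Name": "Environment", "Value": "prod"}],
        "Value": 42,
        "Unit": "Count",
        "StorageResolution": 1,  # high resolution; standard is 60 seconds
    }],
)
```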

CloudWatch DashBoards

  • Great way to set up dashboards for quick access to key metrics

  • Dashboards are global: you may switch regions to add their metrics, but the dashboard is viewable from anywhere

  • Dashboards can include graphs from different regions

  • You can change the time zone & time range of the dashboards

  • You can set up automatic refresh (10s, 1m, 2m, 5m, 15m)

  • Pricing:

    • 3 Dashboards (up to 50 metrics) for free

    • $3/dashboard/month afterwards

CloudWatch Logs

  • Applications can send logs to CloudWatch via the SDK

  • CloudWatch can collect logs from:

    • Elastic Beanstalk: Collects from application

    • ECS: Collects from containers

    • Lambda: Collects from functions

    • VPC Flow Logs

    • API Gateway

    • CloudTrail based on filter

    • CloudWatch Logs Agents: For example on EC2 machines

    • Route53: Logs DNS queries

  • CloudWatch logs can go to:

    • Batch exporter to S3 for archival

    • Stream to ElasticSearch cluster for further analytics

Log storage architecture:

  • Log Groups: Arbitrary name, usually representing an application

  • Log Stream: instances within application / log files / containers (A log stream is a sequence of log events that share the same source)

  • Can define log expiration policies (never expire, 30 days, etc)

  • Using the CLI we can tail CloudWatch logs

  • To send logs to CloudWatch, make sure IAM permissions are correct!

  • Security: Encryption of logs using KMS at the Group level

CloudWatch Logs Metric Filter & Insights

  • CloudWatch Logs can use filter expressions

    • For example, find a specific IP inside a log

    • Metric filters can be used to trigger alarms (found specific IP, then alarm)

  • CloudWatch Logs Insights can be used to query logs and add queries to CloudWatch Dashboards (comes with some default queries)

CloudWatch Alarms

  • Alarms are used to trigger notifications for any metric

  • Alarms can go to Auto Scaling, EC2 Actions, SNS Notifications

  • Various options (sampling, %, max, min, etc)

  • Alarm States:

    • OK

    • INSUFFICIENT_DATA

    • ALARM

  • Period:

    • Length of time in seconds over which to evaluate the metric

    • High resolution custom metrics: can only choose 10 sec or 30 sec
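
A sketch of creating one such alarm with boto3: CPU above 80% for two 5-minute periods triggers an SNS action; the instance ID and topic ARN are placeholders:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm on EC2 CPUUtilization; when it fires, notify an SNS topic.
cloudwatch.put_metric_alarm(
    AlarmName="high-cpu-i-0123456789abcdef0",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,              # evaluate in 5-minute windows
    EvaluationPeriods=2,     # two consecutive breaching periods
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:eu-west-1:123456789012:ops-alerts"],
)
```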

CloudWatch Events

  • Schedule: Like a cron job (same format)

  • Event Pattern: Event rules to react to a service doing something (Ex: CodePipeline state changes)

  • Triggers to Lambda functions, SQS/SNS/Kinesis Messages

  • CloudWatch Event creates a small JSON document to give info on the change

CloudTrail

  • Provides governance, compliance, and audit for your account

  • Enabled by default

  • Get a history of events / API calls made within your account by:

    • Console

    • SDK

    • CLI

    • AWS Services

  • Can put logs from CloudTrail into CloudWatch logs

  • If a resource is deleted, look into CloudTrail first


Security

Encryption in Flight

  • Ensures no MITM (man-in-the-middle) attack

Encryption at Rest

  • Data is encrypted after being received by server

  • Data is decrypted before being sent

  • The encryption / decryption keys (data key) must be managed somewhere and the server must have access to it

Client Side encryption

  • Data is encrypted by client, never decrypted by server

  • Data will be decrypted by a receiving client

  • The server should not be able to decrypt the data

  • Could leverage Envelope Encryption

KMS (Key Management Service)

  • Fully integrated with IAM for authorization

  • Seamlessly integrated into most AWS services (EBS, S3, Redshift, SSM, etc)

  • But you can also use the CLI / SDK

  • Any time you need to share sensitive information, use KMS

    • DB PW

    • Credentials to an external service

    • Private Key of SSL certs

  • The Customer Master Key (CMK) used to encrypt data can never be retrieved from KMS by the user, and it can be rotated for extra security

  • Never store secrets in plaintext, especially in code

  • Encrypted secret can be stored in code / environment variables

  • KMS can only help in encrypting up to 4KB of data per call: PW, SSL cert, credentials, etc

  • If data > 4KB use envelope encryption

  • To grant KMS access to someone:

    • Make sure the Key Policy allows the user

    • Make sure the IAM Policy allows the API calls

  • KMS makes you able to fully manage the keys & policies: (although we cannot ever see the keys ourselves)

    • Create

    • Rotation policies

    • Disable

    • Enable

  • Able to audit key usage (using CloudTrail)

  • Three types of CMK

    • AWS Managed Service Default CMK: free

    • User Keys created in KMS: $1 / month

    • User Keys imported (must be 256-bit symmetric key): $1 / month

      • pay for API calls to KMS: $0.03 / 10000 calls

68ec868622f5589bbe9ba571f7d2eae7.png
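
A minimal sketch of encrypting a small secret (under the 4KB limit) with a CMK and decrypting it again via boto3; the key alias is a placeholder, and anything larger would need envelope encryption (GenerateDataKey):

```python
import boto3

kms = boto3.client("kms")

# Encrypt a small secret (< 4KB) with a CMK; the key itself never leaves KMS.
ciphertext = kms.encrypt(
    KeyId="alias/my-app-key",          # placeholder key alias
    Plaintext=b"db-password-123",
)["CiphertextBlob"]

# Decrypt later (KMS finds the key from metadata in the ciphertext blob).
plaintext = kms.decrypt(CiphertextBlob=ciphertext)["Plaintext"]
print(plaintext)
```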

Encryption in AWS Services

  • Requires migration (through Snapshot / Backup)

    • EBS Volumes

    • RDS databases

    • ElastiCache

    • EFS network file system

  • In-place encryption

    • S3

AWS Parameter Store

  • Secure storage for configuration and secrets

  • Optional seamless encryption using KMS

  • Serverless, scalable, durable, easy SDK, free

  • Version tracking of configurations / secrets

  • Configuration management using path and IAM

  • Notifications with CloudWatch Events

  • Integration with CloudFormation

  • Simplifies workflow vs KMS

4504084ac9c3e3234b963f97d8cb1c09.png

Parameter Store Hierarchy

  • /my-department/

    • my-app/

      • dev/

        • db-url

        • db-password

      • prod/

        • db-url

        • db-password

    • other-app/

  • /other-dept/

  • Can have encrypted or plaintext parameters

  • Found in Systems Manager - Application Management, or via the CLI

  • GetParameters API via Lambda/SDK function or

  • GetParametersByPath API
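
A sketch of reading the hierarchy above with GetParametersByPath via boto3, decrypting SecureString values with KMS:

```python
import boto3

ssm = boto3.client("ssm")

# Fetch every parameter under the dev path of the app, decrypting
# SecureString values via KMS.
resp = ssm.get_parameters_by_path(
    Path="/my-department/my-app/dev/",
    Recursive=True,
    WithDecryption=True,
)
for param in resp["Parameters"]:
    print(param["Name"], param["Value"])
```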

STS - Security Token Service

  • Allows granting limited and temporary access to AWS resources

  • Token is valid for up to 1 hour (must be refreshed)

  • Cross Account Access

    • Allows users from one AWS account access to resources in another

  • Federation (Active Directory)

    • Provides a non-AWS user with temporary AWS access by linking user's AD credentials

    • Uses SAML

    • Allows Single Sign On (SSO) which enables users to log in to AWS console without assigning IAM credentials

  • Federation with third party providers / Cognito

    • Used mainly in web and mobile apps

    • Makes use of FB/G/Amazon etc to federate them

Cross Account Access

  • Define an IAM Role for another account to access

  • Define which accounts can access this IAM Role

  • Use AWS STS to retrieve credentials and impersonate the IAM Role you have access to (AssumeRole API)

  • Temporary credentials can be valid between 15 minutes and 1 hour

4d8390f94e709be6c0ee5b94dc70bbe1.png
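
A minimal sketch of the AssumeRole flow with boto3; the role ARN and session name are placeholders, and the temporary credentials are then used for an S3 call:

```python
import boto3

sts = boto3.client("sts")

# Assume a role defined in another account and get temporary credentials.
assumed = sts.assume_role(
    RoleArn="arn:aws:iam::222222222222:role/ReadOnlyFromAccountA",
    RoleSessionName="cross-account-demo",
    DurationSeconds=900,  # 15 minutes, the minimum
)["Credentials"]

# Use the temporary credentials to act as the assumed role.
s3 = boto3.client(
    "s3",
    aws_access_key_id=assumed["AccessKeyId"],
    aws_secret_access_key=assumed["SecretAccessKey"],
    aws_session_token=assumed["SessionToken"],
)
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
```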

Identity Federation with AD and Cognito

  • Federation lets users outside of AWS assume a temporary role for accessing AWS resources

  • These users assume an identity provided access role

  • Federation assumes a form of 3rd party authentication

    • LDAP

    • MS AD (~=SAML)

    • Single Sign On

    • OpenID

    • Cognito

  • Using federation you don't need to create IAM users (user mgmt is outside AWS)

d56dceb5703eafcf69e8d16482acf1fe.png

SAML Federation (for Enterprise)

  • To integrate AD / ADFS with AWS (or any SAML 2.0)

  • Provides access to AWS Console or CLI (through temporary credentials)

  • No need to create an IAM user for each employee

4ab46ee392d4720a455849b221fbbab4.png

Custom Identity Broker App (for Enterprise) (no SAML 2.0)

  • Use only if the identity provider is not compatible with SAML 2.0

  • You must code your own identity broker which must determine the appropriate IAM policy

e70b7f8937aafeb558e4a34d66216743.png

Cognito - Federated Identity Pools (For Public Applications)

  • Goal:

    • Provide direct access to AWS Resources from the client side

  • How:

    • Log in to federated identity provider (or remain anonymous) (CUP, FB, G, OpenID, SAML, etc)

    • Get temporary AWS credentials back from the Federated Identity Pool (Cognito)

    • They come with a pre-defined IAM policy stating permissions

  • Example:

    • Provide (temporary) access to write to an S3 bucket using FB login

  • Note: Web Identity Federation is an alternative to using Cognito, but AWS recommends against it

a78475103b1e73f58790352aaf31897f.png

Shared Responsibility Model

a7e3ee17201a4d53a039540de21eec82.png

VPC

CIDR

  • Two components

    • Base IP (xx.xx.xx.xx)

    • Subnet mask (/32) (defines how many bits can change in an IP)

      • Can take two forms

        • /24

        • 255.255.255.0 (less common)

      • /32 = 1 IP = 2^0

      • /31 = 2 IP = 2^1

      • /30 = 4 IP = 2^2

      • /29 = 8 IP = 2^3

      • /24 = 256 IP = 2^8

      • etc

      • /16 = 65536 = 2^16

      • /0 = all = 2^32

      • /32 - no octet can change

      • /24 - the last octet (x) can change

      • /16 - the last two octets (x.x) can change

      • /8 - the last three octets (x.x.x) can change

      • /0 - all four octets (x.x.x.x) can change
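
A quick way to sanity-check these sizes is the Python standard library:

```python
import ipaddress

# Confirm how many addresses each prefix length covers.
for cidr in ["192.168.0.0/32", "192.168.0.0/24", "192.168.0.0/16"]:
    net = ipaddress.ip_network(cidr)
    print(cidr, "->", net.num_addresses, "addresses")
# /32 -> 1, /24 -> 256, /16 -> 65536
```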

Public vs Private

  • IANA via RFC 1918

  • Private IP can have the following values

    • 10.0.0.0 - 10.255.255.255 (10.0.0.0/8)

    • 172.16.0.0 - 172.31.255.255 (172.16.0.0/12) AWS default

    • 192.168.0.0 - 192.168.255.255 (192.168.0.0/16)

VPC in AWS - IPv4

  • Can have multiple VPCs per region (5 soft limit)

  • Max 5 CIDRs per VPC, each with the following size limits:

    • Min size /28 = 16 IP

    • Max size /16 = 65,536 IPs

  • Because VPC is private, only RFC1918 addresses

  • VPC CIDR should not overlap with your other networks

Subnets

  • AWS reserves 5 IPs (first 4 and last 1 of range) in each Subnet

  • They are not available for use

  • For CIDR 10.0.0.0/24:

    • 10.0.0.0: Network address

    • 10.0.0.1: Reserved by AWS for the VPC router

    • 10.0.0.2: Reserved by AWS for mapping to Amazon provided DNS

    • 10.0.0.3: Reserved for future use

    • 10.0.0.255: Network broadcast (assume not available for exam)

  • Exam Tip: If you need 29 IP addresses for EC2 you can't choose a /27 because it's only 32 IPs, need a /26 (64IP)

Internet Gateway

  • Helps the VPC connect to the internet

  • Scales horizontally, HA, and redundant

  • Must be created separately from VPC

  • One VPC per IGW, one IGW per VPC

  • IGW is also a NAT for the instances that have a public IPv4

  • Will not have internet access without Route Tables

NAT Instances (outdated)

  • Allow instances in the private subnet to connect to the internet

  • Must be launched in a public subnet

  • Must disable EC2: Source / Destination Check

  • Must have an Elastic IP (because route tables require fixed)

  • Route tables must be configured to route traffic from private subnets to the NAT instance

  • Pre-configured Amazon Linux AMI are available

  • Not highly available or resilient setup by default

  • Would need to create an ASG in Multi AZ + resilient user-data script

  • Internet traffic bandwidth depends on EC2 instance performance

  • Must manage security groups & rules

    • Inbound

      • Allow HTTP/S from private subnets

      • Allow SSH from home network (through IGW)

    • Outbound

      • Allow HTTP/S traffic to internet

      • Allow ICMP traffic to internet

NAT Gateway (new)

  • Only IPv4

  • AWS managed NAT, higher bandwidth, better availability, no admin

  • Pay by the hour for usage and bandwidth

  • NAT Gateway is created in a specific AZ, uses an EIP, and must be created in a public subnet

  • Cannot be used by an instance in that subnet (only from other subnets)

  • Requires an IGW (Private subnet -> NAT -> IGW)

  • 5 Gbps of bandwidth with auto-scaling up to 45 Gbps

  • No security groups required

* Know the differences between NAT Instances and the NAT Gateway

DNS Resolution in VPC

  • enableDnsSupport: (=Edit DNS Resolution Setting)

    • Default True

    • Decides if DNS resolution is supported for the VPC

    • If True, queries the AWS DNS server at 169.254.169.253

  • enableDnsHostname: (=Edit DNS Hostname setting)

    • False by default for newly created VPC, True by default for Default VPC

    • Won't do anything unless enableDnsSupport=True

    • If True, assigns a public hostname to an EC2 instance if it has a public IP

  • If you must use custom DNS domain names in a private zone in Route 53, you must have both as TRUE

NACL (Network ACL)

  • NACLs are like a firewall controlling traffic to and from a subnet

  • Default NACL allows everything inbound and outbound

  • One NACL per Subnet, new Subnets are assigned the Default NACL

  • Define NACL rules:

    • Rules have a number (1 - 32766); LOWER numbers have precedence (the first matching rule wins and later rules are ignored)

    • Last rule is an asterisk (*), and denies all in case of no match

    • AWS recommends adding rules by increment of 100

  • Newly created NACL will deny everything

  • NACLs are a great way of blocking a specific IP at the subnet level

  • Can be associated to multiple subnets

  • Remember ephemeral ports

Inbound

c9d8ee87beb33b3f0ced90abd3b7511a.png
  • SG is stateful: the response to an allowed inbound request is let out even if outbound rules say otherwise (SG evaluates all rules before deciding)

  • NACL is stateless: return traffic is re-evaluated against the outbound rules

Outbound

a84ab894d813b4282b36bb3a2e0ad1a1.png
  • SG is stateful: the response to an allowed outbound request is let back in even if inbound rules say otherwise

  • NACL is stateless: return traffic is re-evaluated against the inbound rules

6edae8ef528ffd134500da3ef66471aa.png

VPC Endpoints

  • Endpoints allow you to connect to AWS services using a private network instead of the public internet

  • They scale horizontally and are redundant

  • They remove the need for an IGW, NAT, etc. to access AWS services

  • Interface: provisions an ENI (private IP) as an entry point (select subnets, must attach a security group) - for most AWS services

  • Gateway: provisions a target that must be used in a route table associated with your subnets - only for S3 and DynamoDB

    • Specify the region on the CLI, because the CLI defaults to us-east-1 when it is unspecified

  • In case of issues:

    • Check DNS setting resolution in your VPC

    • Check Route Tables

VPC Peering

  • Connect two VPC privately using AWS' network

  • Make them behave as if they were in the same network

  • Must not have overlapping CIDR

  • VPC Peering connection is not transitive (must be established for each VPC that needs to communicate with another)

  • Can do between accounts and regions

  • You must update route tables in each VPC's subnets to ensure instances can communicate

Flow Logs

  • Capture information about IP traffic going to your interfaces:

    • VPC Flow Logs

    • Subnet Flow Logs

    • Elastic Network Interface (ENI) Flow Logs

  • For ACCEPT and REJECT traffic

  • Helps to monitor & troubleshoot connectivity issues

  • Flow logs data can go into S3 (Athena) / CloudWatch Logs (Insights)

  • Captures network information from AWS managed interfaces too: ELB, RDS, ElastiCache, Redshift, WorkSpaces

Flow Log Syntax

  • [version, accountid, interfaceid, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, logstatus]

  • 2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK

  • Query VPC flow logs using Athena on S3 or CloudWatch Logs Insights
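
A tiny sketch that splits the sample record above into its named fields:

```python
# Map a VPC Flow Log record onto the field names listed in the notes.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

record = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")

parsed = dict(zip(FIELDS, record.split()))
print(parsed["srcaddr"], "->", parsed["dstaddr"], parsed["action"])
```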

Bastion Hosts

  • Used to SSH into private instances

  • In the public subnet which is then connected to all private subnets

  • Bastion Host security must be tight

  • Exam tip: Make sure the bastion host only has port 22 from your ip, not even SG of your other instances

Site to Site VPN, Virtual Private Gateway, Customer Gateway

  • Virtual Private Gateway

    • VPN concentrator on the AWS side of the VPN connection

    • VGW is created and attached to the VPC from which you want to create the site-to-site VPN

    • Possibility to customize the ASN

  • Customer Gateway

    • Software application or physical device on customer side of the VPN connection

    • IP Address

      • Use the static, internet routeable, IP address of your customer gateway device

      • If the CGW is behind a NAT (with NAT-T), use the public address of the NAT

Direct Connect

  • Provides a dedicated private connection from a remote network to your VPC

  • Dedicated connection must be setup between your DC and AWS Direct Connect locations

  • You need to set up a Virtual Private Gateway on your VPC

  • Access public resources (S3) and private resources (EC2) over the same connection

  • Use cases:

    • Increase bandwidth throughput - working with large data sets - lower cost

    • More consistent network experience - application using real-time data feeds

    • Hybrid Environments

  • Supports both IPv4 and IPv6

b2ddd31533a2b7004550c7d36f225574.png

Direct Connect Gateway

  • If you want to set up a Direct Connect to one or more VPC in many different regions (no overlapping IPs)

0fa45a34dceb67f9dd79efc436f14259.png

Egress only IGW

  • Egress only IGW is for IPv6 only

  • Similar function as a NAT (GW), but a NAT is for IPv4

  • All IPv6 are public addresses

  • Therefore all instances are publicly accessible

  • An Egress-Only Internet Gateway gives your IPv6 instances access to the internet while keeping them unreachable from the public internet

  • After creating an Egress Only IGW edit the Route Tables

VPC Summary

dbfa4967ade731a65a24076a6da23f3c.png


Other Services

CI/CD

  • Code - CodeCommit, Build - CodeBuild, Test - CodeBuild, Deploy - Elastic Beanstalk or CodeDeploy -> EC2 Fleet, Provision

  • CodePipeline orchestrates it all

  • When deploying code directly onto EC2 instances or On Premise servers, CodeDeploy is the service to use. You can define the strategy (how fast the rollout of the new code should be)

Infrastructure as Code

  • CloudFormation - Declarative way of outlining Infrastructure (does ordering and orchestration for you)

    • Manual way: Edit templates in designer, use console to input parameters

    • Automated way: Edit YAML file, use CLI to deploy (recommended)

  • Template Components

    • Resources: Resources declared in template (mandatory)

    • Parameters: The dynamic inputs for your template

    • Mappings: Static variables for template

    • Outputs: References to what has been created

    • Conditionals: List of conditions to perform resource creation

    • Metadata

    • Template Helpers

      • References

      • Functions
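
Purely for illustration (the notes recommend YAML plus the CLI), a minimal template showing Parameters, Resources, and Outputs, deployed with boto3; the stack and bucket names are made up:

```python
import json
import boto3

# Minimal CloudFormation template expressed as JSON: one parameter, one
# resource, one output. Bucket naming is illustrative only.
template = {
    "Parameters": {
        "Env": {"Type": "String", "Default": "dev"},
    },
    "Resources": {
        "AppBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {"BucketName": {"Fn::Sub": "my-app-${Env}-bucket"}},
        },
    },
    "Outputs": {
        "BucketName": {"Value": {"Ref": "AppBucket"}},
    },
}

cfn = boto3.client("cloudformation")
cfn.create_stack(
    StackName="my-app-dev",
    TemplateBody=json.dumps(template),
    Parameters=[{"ParameterKey": "Env", "ParameterValue": "dev"}],
)
```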

ECS

  • Container orchestration service

  • Made of:

    • Core, running ECS on user-provisioned EC2 instances

    • Fargate: serverless

    • EKS: K8s on managed EC2

    • ECR: Registry

  • ECS

    • ECS Cluster: set of EC2 instances

    • ECS Service: Application definitions running on Cluster

    • ECS Tasks + definitions: The containers running to create the application

    • ECS IAM roles: Roles assigned to tasks to interact with AWS

    • ALB has a direct integration with ECS called port mapping

      • Run multiple instances of the same application on the same machine

        • Increased resiliency even if running on one EC2 instance

        • Maximize CPU/Core utilization

        • Ability to perform rolling upgrades without impacting the application

    • ECS Setup and config file

      • Run an EC2 instance, install the ECS agent with ECS config file

      • Or use an ECS-ready Linux AMI (and modify the config file)

      • Config file is at: /etc/ecs/ecs.config

c42e568af7e1cc91cf5d491031d5c7f2.png
  • ECR Registry

    • Store, manage, deploy your containers

    • Fully integrated with IAM & ECS

    • Sent over HTTPS, and encrypted at rest

Step Functions

  • Build Serverless visual workflow to orchestrate your Lambda functions

  • Represent flow as a JSON state machine, outputs a visual workflow graph, can see steps succeed / in progress / fail etc

  • Features: sequence, parallel, conditions, timeouts, error handling...

  • Maximum execution time of 1 year

  • Can implement human approval feature

  • Use cases: Order fulfillment, data processing, etc

SWF - Simple Workflow Service (older)

  • Coordinate work amongst applications (not serverless)

  • **Step Functions is recommended for all new apps, except:

    • If you need external signals to intervene in the process

    • If you need child processes that return values to parent process.**

AWS Glue

  • Fully managed ETL service

  • Move from data sources, transform, clean, change format and put somewhere

  • Automate time consuming steps of data preparation for analytics

  • Provisions Apache Spark

  • Crawls data sources and identifies data formats (schema inference)

  • Automated Code Generation to customize Spark code

  • Sources: Aurora, RDS, Redshift, & S3 (crawls tables etc and discovers all)

  • Sinks: S3, Redshift, etc

  • Glue Data Catalog: Metadata (definition & schema) of the Source Tables (to later use in your EMR)

Opsworks

  • Opsworks = Managed Chef & Puppet

  • Alternative to AWS SSM

  • Configuration as code

Elastic Transcoder

  • Convert media files (video & music) stored in S3 to various formats

  • Features: bit rate optimization, thumbnails, watermarks, captions, DRM, progressive download, encryption

  • Components:

    • Jobs: do the actual transcoding work

    • Pipeline: Queue that manages the transcoding job

    • Presets: Template for converting media from one format to another

    • Notifications: SNS, for example

  • Pay for what you use, fully managed

AWS Organizations

  • Global service

  • One master account - can't change it

  • Other accounts are member accounts, which can only be part of one org

  • Consolidated billing across all accounts

  • Pricing benefits from aggregated usage

  • API is available to automate account creation

  • Organize accounts in Organizational Units (OU)

    • Can be anything dev, test, prod, or hr, finance, IT

    • Can nest OU within OU

  • Apply Service Control Policies (SCPs) to OU

    • Permit / Deny access to AWS services

    • SCP has a similar syntax to IAM

    • It's a filter to IAM

  • Helpful for sandbox account creation

  • Helpful to separate dev and prod resources

  • Helpful to only allow approved services

3f46a4257c5eb54fdcec149a1977c597.png

AWS WorkSpaces

  • On demand Managed, Secure Cloud Desktop

  • Eliminates on-prem VDI

  • Secure, encrypted, network isolation

  • Integrates with AD

  • Windows and Linux

AppSync

  • Store and sync data across mobile and web-apps in real-time

  • Makes use of GraphQL (from Facebook)

  • Integrates with DynamoDB / Lambda

  • Offline data synchronization (alternative to Cognito, exam)

AWS Single Sign On

  • Centrally managed SSO across multiple AWS accounts and business applications (O365, Salesforce, Box, etc)

  • One login gets you access to everything securely

  • Integrated with MS AD

  • Reduces process of setting up SSO in a company

  • Only helpful for Web Browser, SAML 2.0 enabled applications

Here's a quick cheat-sheet to remember all these services:

CodeCommit: service where you can store your code. Similar service is GitHub

CodeBuild: build and testing service in your CICD pipelines

CodeDeploy: deploy the packaged code onto EC2 and AWS Lambda

CodePipeline: orchestrate the actions of your CICD pipelines (build stages, manual approvals, many deploys, etc)

CloudFormation: Infrastructure as Code for AWS. Declarative way to manage, create and update resources.

ECS (Elastic Container Service): Docker container management system on AWS. Helps with creating micro-services.

ECR (Elastic Container Registry): Docker images repository on AWS. Docker Images can be pushed and pulled from there

Step Functions: Orchestrate / Coordinate Lambda functions and ECS containers into a workflow

SWF (Simple Workflow Service): Old way of orchestrating a big workflow.

EMR (Elastic Map Reduce): Big Data / Hadoop / Spark clusters on AWS, deployed on EC2 for you

Glue: ETL (Extract Transform Load) service on AWS

OpsWorks: managed Chef & Puppet on AWS

ElasticTranscoder: managed media (video, music) converter service into various optimized formats

Organizations: hierarchy and centralized management of multiple AWS accounts

Workspaces: Virtual Desktop on Demand in the Cloud. Replaces traditional on-premise VDI infrastructure

AppSync: GraphQL as a service on AWS

SSO (Single Sign On): One login managed by AWS to log in to various business SAML 2.0-compatible applications (office 365 etc)


Whitepapers

Well Architected Framework + Tool

  • General Guiding Principles

    • Stop guessing capacity needs

    • Test systems at production scale

    • Automate to make architectural experimentation easier

    • Allow for evolutionary architectures

      • Design based on changing requirements

    • Drive architecture changes using data

    • Improve through game days

      • Simulate applications for flash sale days

5 Pillars

  • Operational Excellence

    • The ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures

    • Design Principles

      • Perform operations as code - Infrastructure as Code

      • Annotate documentation - Automate the creation of annotated documentation after every build

      • Make frequent, small, reversible changes

      • Refine operations procedures frequently - And ensure that team members are familiar with it

      • Anticipate failure

      • Learn from all operation failures

    • Prepare

      • CloudFormation, AWS Config

    • Operate

      • CloudFormation, AWS Config, CloudTrail, CloudWatch, X-Ray

    • Evolve

      • CloudFormation, CodeBuild, CodeCommit, CodeDeploy, CodePipeline

  • Security

    • Includes the ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies

    • Design Principles

    • Implement a strong identity foundation - Centralize privilege management and reduce (or even eliminate) reliance on long term credentials - Principle of Least Privilege - IAM

    • Enable traceability - Integrate logs and metrics with systems to automatically respond and take action

    • Apply security at all layers - Edge Network, VPC, Subnet, Load balancer, each instance, OS, and application

    • Automate Security best practices

    • Protect data in transit and at rest - Encryption, tokenization, and access control

    • Keep people away from data - No direct or manual access

    • Prepare for security events - Run incident response simulations and use tools with automation to increase your speed of detection, investigation, and recovery

    • IAM

      • IAM, AWS-STS, MFA token, Organizations

    • Detective Controls

      • Config, CloudTrail, CloudWatch

    • Infrastructure Protection

      • CloudFront, VPC, Shield, WAF, Inspector

    • Data Protection

      • KMS, S3, ELB, EBS, RDS

    • Incident Response

      • IAM, CloudFormation, CloudWatch Events

  • Reliability

    • Ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues

    • Design Principles

      • Test recovery scenarios - Use automation to simulate different failures or to recreate scenarios that led to failures before

      • Automatically recover from failure - Anticipate and remediate failure before they occur

      • Scale horizontally to increase aggregate system availability - Distribute requests across multiple, smaller resources to ensure that they don't share a common point of failure

      • Stop guessing capacity - Maintain the optimal level to satisfy demand without over or under provisioning

      • Manage change via automation

    • Foundations

      • IAM, VPC, Service Limits, Trusted Advisor

    • Change management

      • Autoscaling, CloudWatch, CloudTrail, Config

    • Failure Management

      • Backups, CloudFormation, S3, S3 Glacier, Route 53

  • Performance Efficiency

    • Includes the ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve

    • Design Principles

      • Democratize advanced technologies - Advanced technologies become services and hence you can focus more on product development

      • Go global in minutes - Easy deployment in multiple regions

      • Use serverless architectures - Avoid the burden of managing servers

      • Experiment more often - Easy to carry out comparative testing

      • Mechanical sympathy - Be aware of all AWS services

    • Selection

      • Auto-Scaling, Lambda, EBS, S3, RDS

    • Review

      • CloudFormation

    • Monitoring

      • CloudWatch, Lambda

    • Tradeoffs

      • RDS, Elasticache, Snowball, Cloudfront (all have tradeoffs vs other solutions)

  • Cost Optimization

    • Includes the ability to run systems to deliver business value at the lowest price point

    • Design Principles

      • Adopt a consumption model - Pay only for what you use

      • Measure overall efficiency - Use CloudWatch

      • Stop spending money on data center operations - AWS does the infrastructure part and enables the customer to focus on organization projects

      • Analyze and attribute expenditure - Accurate identification of system usage and costs, helps measure return on investment. USE TAGS

      • Use managed and application-level services to reduce cost of ownership - As managed services operate at cloud scale, they can offer a lower cost per transaction or service

    • Expenditure Awareness

      • Budgets, Cost and Usage reports, Cost Explorer, Reserved Instance Reporting

    • Cost-effective resources

      • Spot instance, Reserved instances, Glacier

    • Matching supply and demand

      • Auto-Scaling, Lambda

    • Optimizing Over Time

      • Trusted Advisor, Cost and usage reports

  • Not tradeoffs, they're a synergy

Well Architected Tool

  • Define workload, track over time

  • Milestones, improvement plans, Risks

Trusted Advisor

  • Cost optimization, Performance, Security, Fault Tolerance, Service Limits

  • Get upgraded recommendations, more than for governance

  • Some paid

  • Can get weekly emails to different contact groups

Disaster Recovery

  • Any event that has a negative impact on a company's business continuity or finances is a disaster

  • DR is about preparing for and recovering from a disaster

  • What kind of DR?

    • On-premises -> On-premises (traditional, $$$$)

    • On-Premises -> AWS Cloud (hybrid recovery)

    • AWS Cloud Region A -> AWS Cloud Region B

Strategies

  • Backup and restore (Longest RTO, high RPO, not too expensive)

  • Pilot Light (2nd longest RTO; a small version of the app is always running in the cloud; similar to Backup and Restore, but the critical core is already up)

  • Warm Standby (3rd longest RTO, full system up and running but at minimum size, scale to production load)

  • Multi-Site (Shortest RTO, full prod at second site)

  • But all get increasingly more expensive
