Solutions Architect Associate - Study Notes

Exam Tips

  • Associate everything learned to a WAF pillar

  • If a solution seems feasible but highly complicated, it's probably wrong

  • Don't overthink it

  • 50% off the next exam if you pass

General Architecture

Regions and AZs

  • Region us-east-1

  • AZ us-east-1a-f

  • Consoles are region scoped (aside from IAM, S3, and Route 53)

  • See the Global Infrastructure page for AZ counts and definitions

Service Models

  • IaaS, PaaS, SaaS, FaaS (Function as a Service)

The service stack responsibility differs depending on the service model (Data centre, Network and Storage, Host/Servers, Virtualization, OS, Runtime, Application, Data)

High Availability (HA) vs. Fault Tolerance

  • HA - Hardware, Software, and configuration allowing a system to recover quickly in the event of a failure (minimize downtime, not to prevent the failure to begin with)

  • Fault Tolerance - System designed to operate through a failure with no user impact.

RPO vs. RTO

  • RPO - How much data a business can tolerate losing, expressed as the time between the last backup and the failure.

  • RTO - Maximum time a system can be down, time to recover.

Scaling

  • Vertical Scaling - Increase the size of the server; constrained by maximum machine size (technically or cost-wise)

  • Horizontal Scaling - Add machines to a pool of resources; requires application support.

Tiered Application Design

  • Presentation - interacts with customer

  • Logic - delivers application functionality

  • Data - data storage and retrieval

  • Monolithic applications require vertical scaling

Misc.

  • Cost efficient or cost effective - Implementing for as little initial and ongoing cost

  • Application Session State - represents what a customer is doing, has chosen, or has configured.

  • Undifferentiated Heavy Lifting - A part of an application, system, or platform that is not specific to your business.

Accounts

Budgets and Cost


Solution Architecture

Instantiating instances quickly

  • Golden AMI: Apps, dependencies, etc. done beforehand

  • User Data: For dynamic configuration (retrieving un/pw or something)

  • Hybrid: mix Golden and User Data (Elastic Beanstalk)

  • RDS: Restore from snapshot, DB will have schemas and data ready

  • EBS Volumes: restore from snapshot, will already be formatted and have data

Elastic Beanstalk

  • Single Instance deployment: Good for dev

  • LB + ASG: good for prod, pre-prod

  • ASG only: Good for non-web apps in production (workers etc.)

  • Three components

    • Application

    • Application version

    • Environment name

  • Can promote versions to next env

  • Rollback feature to previous version

  • Full control over lifecycle of envs

  • Support for most platforms (can write own custom platform too)


Well-Architected Framework (WAF)

  • Read WAF whitepaper

  • Re-read WAF notes from internal training

  • When going through course align everything with a WAF pillar

  • Pillars, Design Principles, Questions


Security

IAM

  • Global across all Regions

  • Account Aliases must be globally unique

Authentication and Authorization

  • Principal - Person or application that can make an authenticated or anonymous request to perform an action on a system

  • Authentication - Process of authenticating a principal against an identity

  • Identity - Objects that require authentication and are authorized to access resources

  • Authorization - Process of checking and allowing or denying access to a resource for an identity

Users

  • One user per physical person

  • chmod 0400 on .pem key file

    • (Windows 10 SSH) Properties - > Security - > (make self owner) - > remove Inheritance - > remove all other users - > ensure Full Control

Groups

Roles

  • Internal use, machine use only?

  • One role per application, no sharing

Policies

  • Written in JSON


Compute

EC2

  • Exam Tips

    • Billed by the second

    • Windows 10 can use SSH

    • SG can have IPs as rules, but also reference other SG for rules

  • Instance

    • Has a public IP by default, which will likely change on stop/start

  • User Data

    • Commands automatically run with sudo

    • Runs as root

    • Runs only on first boot

    • Gets base64 encoded and passed

  • AMI

    • Region specific (but can copy)

    • Cross account AMI copy

      • You can share an AMI with another AWS account

      • Sharing an AMI does not affect ownership of the AMI

      • If you copy an AMI that has been shared with your account, you are the owner of the target AMI in your account

      • To copy an AMI that was shared from another account the source owner must grant you read permissions for the storage that backs the AMI (EBS snapshot or S3 bucket for instance store backed)

      • Limits:

        • Can't copy encrypted shared AMI. If the underlying snapshot and encryption key were shared you can copy while re-encrypting it with own key. You own the copied snapshot and register it as new AMI.

        • Can't copy a shared AMI with an associated billingProduct code, including Windows and Marketplace AMIs. To copy launch an EC2 instance using the shared AMI then create an AMI from the instance.

    • Reside in S3 (cost based on storage used)

    • Use custom AMI for faster deploy in ASG

  • EC2 Instance Launch Types

    • On Demand Instances

      • For: Short-term uninterruptible workloads when you cannot predict application behaviour

      • Pay per use, billing per second after first minute

      • Highest cost, no upfront payment or commitment

    • Reserved Instances

      • For: Steady state usage (think database)

      • Up to 75% discount vs OD

      • Pay upfront for use, long term commitment, 1 or 3 years

      • Reserve specific instance type

      • Convertible Reserved Instance

        • Can change EC2 instance type

        • Up to 54% discount

      • Scheduled Reserved Instance

        • Launch within the time window you reserve (at regular interval)

    • Spot Instances

      • For: Batch jobs, Big Data analysis, failure resilient workloads

      • Discount up to 90% vs OD

      • Active as long as under bid price

      • Price varies on supply and demand

      • Reclaimed with 2 min warning when spot price goes above bid

    • Dedicated Instances

      • Hardware dedicated to you

      • May share hardware with other instances in same account that are not Dedicated Instances

      • No control over instance placement


  • Instance Types

    • R: RAM - ex: in-memory cache

    • C: CPU - ex: compute/database

    • M: Balanced (Medium)- ex: general/web app

    • I: I/O (instance storage) - ex: databases

    • G: GPU - ex: video rendering or machine learning

    • Burstable (T2/T3)

  • Placement Groups

    • Cluster - Low latency, single AZ

      • Same rack, same AZ, 10 Gbps network, same failure zone

    • Spread - Spreads across underlying hardware, and across AZs (max 7 instances per group, per AZ)(critical applications, maximum HA)

    • Partition - Spreads across many partitions (which rely on different racks) within an AZ. Scales to 100s of instances per group (ex: Hadoop, Cassandra, Kafka)

      • Partition is a set of racks, can create up to 7 partitions in PG

      • Each partition has many instances, partition is same failure zone

      • A partition failure will not affect the others

      • EC2 instances can get access to partition metadata

EC2 Instance Metadata

  • Lets an instance learn about itself without using an IAM role

  • URL is http://169.254.169.254/latest/meta-data

  • Can retrieve IAM Role name from metadata, but not the IAM Policy

  • When querying: curl http://169.254.169.254/latest/meta-data/iam/security-credentials/myfirstrole (see the sketch below)

    • Get AccessKeyId, SecretAccessKey, and Token, which is what the EC2 instance gets via the IAM Role to access whatever

    • Short lived
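
A minimal sketch of querying the metadata service from the instance itself, assuming IMDSv1 and the Python standard library (the role name returned depends on the role attached to the instance):

```python
# Query the EC2 instance metadata service (IMDSv1 style; IMDSv2 additionally
# requires a session token). Only works from inside an EC2 instance.
import json
import urllib.request

BASE = "http://169.254.169.254/latest/meta-data"

def get(path):
    with urllib.request.urlopen(f"{BASE}/{path}", timeout=2) as resp:
        return resp.read().decode()

print(get("instance-id"))
role_name = get("iam/security-credentials/")                  # name of the attached IAM role
creds = json.loads(get(f"iam/security-credentials/{role_name}"))
print(creds["AccessKeyId"], creds["Expiration"])              # short-lived credentials
```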


Storage

S3

  • Bucket names must be globally unique

    • Global at top menu, (but regional service)

  • Minimum of 3 and maximum of 63 characters - no uppercase or underscores

  • Must start with a lowercase letter or number and can’t be formatted as an IP address (1.1.1.1)

  • Soft limit of 100 buckets per account; can be raised to a hard limit of 1,000 via support request

  • Unlimited objects in buckets

  • Unlimited total capacity for a bucket

  • An object’s key is its name (FULL PATH including slashes and filename, but not bucket name)

  • An object’s value is its data (content)

  • An object’s size ranges from 0 bytes to 5TB (uploads larger than 5GB must use multi-part upload)

    • To upload a file larger than 160 GB, use the AWS CLI, AWS SDK, or Amazon S3 REST API.

  • Metadata (list of key/value pairs, system or user metadata)

  • Tags (Unicode key/value pair -max 10-), useful for security / lifecycle

  • Version ID (if versioning is enabled)

Versioning

  • Bucket level setting

  • If you overwrite a key/file you increment its version

  • Best practice to version your buckets

    • Protect against unintended deletes

    • Easy roll back to previous version

  • Any file that is not versioned prior to enabling versioning will have a version NULL

  • Deleting a file only adds a delete marker

S3 Websites

  • URL can be

    • <bucket-name>.s3-website-<region>.amazonaws.com

    • <bucket-name>.s3-website.<region>.amazonaws.com

S3 CORS

  • If you request data from another S3 bucket you need to enable CORS

  • Cross Origin Resource Sharing lets you limit which origins (websites) can request files from your S3 bucket (helps limit costs)

  • Access-Control-Allow-Origin:
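
A hedged sketch of enabling CORS on a bucket with boto3 (bucket name and origin are placeholders), matching the Access-Control-Allow-Origin idea above:

```python
# Allow cross-origin GETs only from one website.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_cors(
    Bucket="my-assets-bucket",
    CORSConfiguration={
        "CORSRules": [{
            "AllowedOrigins": ["https://www.example.com"],  # becomes Access-Control-Allow-Origin
            "AllowedMethods": ["GET"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,
        }]
    },
)
```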

S3 Consistency Model

  • Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD or GET request to the key name (to find if the object exists) before creating the object, Amazon S3 provides eventual consistency for read-after-write.

S3 Security

  • User based

    • IAM Policies - which API calls should be allowed for a specific user from IAM

  • Resource Based

    • Bucket Policies - bucket wide rules from the S3 console - allows cross account

    • Object ACLs - finer grain, not super popular

    • Bucket ACLs - less common

S3 Bucket Policies

  • Grant public access to the bucket

  • Force objects to be encrypted at upload

  • Grant access to another account (Cross account)

  • JSON based (4 components)

    • Resources: buckets and objects

    • Actions: Set of APIs to Allow or Deny

    • Effect: Allow or Deny

    • Principal: The account or user to apply the policy to

  • Networking: Supports VPC endpoints (for instances in VPC with no internet)

  • Logging and Auditing: S3 access logs can be stored in another bucket, API calls can be logged in CloudTrail

  • User Security: MFA can be required in versioned buckets to delete objects, Signed URLs = valid for a limited time (ex: premium video service for time)
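
A sketch of the four policy components in practice, as a bucket policy that forces encryption at upload (bucket name is a placeholder; applied here with boto3):

```python
# Deny any PutObject that does not carry the server-side-encryption header.
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyUnencryptedUploads",
        "Effect": "Deny",                                    # Effect
        "Principal": "*",                                    # Principal
        "Action": "s3:PutObject",                            # Action
        "Resource": "arn:aws:s3:::my-bucket/*",              # Resource
        "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
    }],
}
boto3.client("s3").put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))
```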

S3 Encryption for Objects

Can also set default encryption for bucket

SSE-S3

  • Keys handled and managed by AWS S3

  • Object is encrypted server side, sent via HTTP/S

  • AES-256

  • Must set header: "x-amz-server-side-encryption":"AES256"

  • S3 Managed Data Key + Object > Encrypted
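
A minimal SSE-S3 upload sketch with boto3 (bucket and key are placeholders); the SDK sets the x-amz-server-side-encryption header for you:

```python
import boto3

s3 = boto3.client("s3")
s3.put_object(
    Bucket="my-bucket",
    Key="reports/2019/q4.csv",
    Body=b"col1,col2\n1,2\n",
    ServerSideEncryption="AES256",   # SSE-S3; use "aws:kms" (+ SSEKMSKeyId) for SSE-KMS
)
```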

SSE-KMS

  • Keys handled and managed by KMS

  • Object is encrypted server side, sent via HTTP/S

  • KMS advantages: user control (rotation etc.) + audit trail

  • Must set header: "x-amz-server-side-encryption":"aws:kms"

  • KMS Customer Master Key (CMK) + Object > Encrypted

SSE-C

  • Server Side encryption using keys fully managed by customer outside AWS

  • S3 does not store the key

  • HTTPS must be used

  • Encryption key is provided (sent) in HTTP header, in every request

  • Client provided data key + Object > Encrypted, S3 throws away key

Client Side Encryption

  • Client library such as Amazon S3 Encryption Client

  • Clients must encrypt data themselves before sending to S3

  • Client must decrypt data themselves when retrieving from S3

  • Customer fully manages the keys and encryption cycle

Encryption in Transit

  • AWS S3 exposes both HTTP and HTTPS endpoints, HTTPS recommended

Default Encryption vs Bucket Policies

  • Old way was to use bucket policies to enable and to refuse any HTTP command without proper headers

  • New way is to click "default encryption" option in S3

  • Bucket Policies are evaluated before default encryption

  • Either SSE-S3 (AES-256) or SSE-KMS

S3 MFA Delete

  • To use MFA-Delete must enable Versioning on the S3 bucket

  • You need MFA to

    • permanently delete an object version

    • suspend versioning on the bucket

  • You won't need it for

    • enabling versioning

    • listing deleted versions

  • Only bucket owner (root account) can enable/disable MFA-delete

  • Can only be enabled using the CLI

S3 Access Logs

  • Any request made to S3 from any account, authorized or denied, will be logged to another S3 bucket

  • Can analyze using data analysis tools (Hive, Athena, etc.)

  • Log format in docs

S3 Cross Region Replication

  • Must enable versioning (source and destination)

  • Must be in different regions (duh)

  • Can be different accounts

  • Copying is asynchronous

  • Must give proper IAM permissions to S3, needs Role

  • For:

    • Compliance, lower latency access, cross account replication

  • Can do based on whole bucket, prefix, tags

  • Can replicate encrypted if other account has access to KMS key

  • Can change storage class or ownership

S3 Pre-signed URLs

  • Can create a pre-signed URL via CLI or SDK

    • For downloads CLI

    • For uploads SDK

  • Valid by default for 3600 seconds, change with --expires-in [TIME_BY_SECONDS]

  • Users who receive pre-signed URL inherit permissions of the generator for GET/PUT

  • aws configure set default.s3.signature_version s3v4

  • aws s3 presign s3://bucketname/file.jpg --expires-in 300 --region ca-central-1

  • Avoids direct access to the bucket from users
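
The same presign call via the SDK, as a hedged sketch with boto3 (bucket and key mirror the CLI example above):

```python
import boto3

s3 = boto3.client("s3")
# Download (GET) URL valid for 300 seconds; use generate_presigned_post for uploads.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "bucketname", "Key": "file.jpg"},
    ExpiresIn=300,
)
print(url)
```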

S3 Storage Tiers

  • S3 Standard - General Purpose

  • 99.999999999% Durability (10 mil objects 10k years, lose 1)

  • 99.99% availability

  • Can sustain 2 concurrent AZ loss

  • S3 Reduced Redundancy Storage (RRS)

    • Deprecated

    • 99.99% durability and availability

    • Can sustain loss of single AZ

    • Use for non-critical reproducible data

  • S3 Standard Infrequent Access (IA)

    • Suitable for data accessed less frequently but requiring rapid retrieval

    • Retrieval fee

    • 99.999999999% Durability (10 mil objects 10k years, lose 1)

    • 99.99% availability

    • Can sustain 2 concurrent AZ loss

    • For backups, DR, etc.

  • S3 One Zone Infrequent Access

    • Same as IA, but data is stored in a single AZ

    • Retrieval fee

    • 99.999999999% Durability; data is lost when AZ is destroyed

    • 99.95% availability

    • Lower cost by 20% than IA

    • For secondary backup data, or recreatable

  • S3 Intelligent Tiering

    • Small monthly auto-tiering fee

    • Moves objects between Standard and IA tiers based on access patterns

    • 99.999999999% Durability, 99.9% availability

    • Can sustain single AZ loss

  • S3 Glacier

    • Alternative to Tape (10's of years)

    • 99.999999999% Durability

    • Cost per GB stored per month ($0.004 / GB) + retrieval fee

    • Each item is called an "Archive", up to 40TB size

    • Archives are stored in "Vaults", similar to a bucket

    • Retrieval options:

      • Expedited (1-5 mins) - $0.03 / GB and $0.01 per request

      • Standard (3-5 hours) - $0.01 per GB and 0.05 per 1000 requests

      • Bulk (5-12 hours) - $0.0025 per GB and $0.025 per 1000 requests


S3 Lifecycle Rules

  • Transition Actions: Defines when objects are transitioned to another storage class

  • Expiration Actions: Objects expire and are deleted

  • Can be used to delete incomplete multi-part uploads

  • Limit to prefix or tag

  • Can do current or previous versions
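
A sketch of a lifecycle configuration combining a transition action and an expiration action, scoped to a prefix (bucket name, prefix, and day counts are placeholders):

```python
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-logs",
            "Filter": {"Prefix": "logs/"},              # limit the rule to a prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 730},                # expiration action: delete after 2 years
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }]
    },
)
```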

Snowball

  • Physically transport data in or out of AWS

  • TB or PB

  • Alternative to network fees

  • Secure, tamper resistant, uses KMS 256

  • Tracking using SNS and text messages, E-Ink shipping label

  • For: large data migrations, DC decommission, disaster recovery

  • If it takes more than a week via network use Snowball instead

  • Has client for copying files

Snowball Edge

  • Adds computational capability

  • 100TB capacity, either:

    • Storage Optimized - 24 vCPU

    • Compute Optimized - 52 vCPU & optional GPU

    • Supports a custom EC2 AMI so you can process while transferring

    • Supports custom Lambda functions

AWS Snowmobile

  • Transfer exabytes (1EB = 1000PB = 1000000TB)

  • Each has 100PB of capacity, can use multiple in parallel

  • Use if transferring more than 10PB

Storage Gateway

  • Expose S3 on-premises

  • File Gateway

    • S3 buckets via NFS and SMB (all S3 modes)

    • Bucket access using IAM roles for each File Gateway

    • Recently used data is cached

    • Can be mounted on many servers

  • Volume Gateway

    • Block storage using iSCSI backed by S3

    • ^ Backed by EBS snapshots

    • Cached volumes: low latency access to most recent data

    • Stored volumes: entire dataset is on-premises, scheduled backups to S3

  • Tape Gateway

    • VTL Virtual Tape Library backed by S3 and Glacier

    • Back up data using existing tape based processes (and iSCSI interface)

    • Works with most backup software

EBS

  • EBS volumes are AZ locked

  • Can migrate via snapshot and recreate

  • EBS backups use IO and shouldn't run during peak times

  • Root EBS volumes are deleted when the instance terminates, by default (can disable)

  • If disk I/O is high, increase the EBS volume size (for gp2, IOPS scale with size)

  • Size | Throughput | IOPS

  • GP2 (SSD): General purpose SSD (balance price/perf)

    • Boot volumes, virtual desktops, low-latency interactive apps, development and test

    • 1GB-16TB

    • Small GP2 can burst IOPS to 3000 (anything under 3k can burst to 3k)

    • Max IOPS is 16000

    • 3 IOPS per GB, which means max IOPS is reached at 5,334 GB

  • IO1 (SSD): Highest-perf, low latency or high-throughput

    • Critical business apps that require sustained IOPS, or more than 16000

    • Mongo, Cassandra, MSSQL, MySQL, Oracle

    • 4GB-16TB

    • IOPS is provisioned 100-64000 (64k for Nitro only) else 100-32000

    • Maximum ratio of provisioned IOPS to volume GB size = 50:1

  • ST1 (HDD): Low cost for frequently accessed, throughput-intensive workloads (big data)

    • Streaming workloads requiring consistent, fast throughput at low price

    • Big Data, DW, log processing, Kafka

    • Cannot be boot volume

    • 500GB - 16TB

    • Max IOPS is 500

    • Max throughput of 500 MB/s, can burst

  • SC1 (HDD): Lowest cost for less frequently accessed workloads

    • Throughput oriented for large volumes of data infrequently accessed

    • Where lowest cost is important

    • Cannot be a boot volume

    • 500GB - 16TB

    • Max IOPS is 250

    • Max throughput of 250 MB/s, can burst

  • Only GP2 and IO1 can be boot volumes

  • EC2 machine loses its root volume when terminated

  • Store non-ephemeral data on EBS volume, network drive (not physical) you can attach or detach while running. Persist data.

  • Locked to AZ

    • Can move via snapshot

  • Have a provisioned capacity (billed for all capacity)

  • Can dynamically increase capacity over time, start small

EBS Snapshots

  • Incremental - only changed blocks

  • EBS backups use IO, should not run them during peak times

  • Snapshots are stored in S3 (but you won't see them)

  • Don't have to detach volume but recommended

  • Max 100000 snapshots

  • Can copy across AZ or Region

  • Can make AMI from Snapshot

  • EBS volumes restored by snapshots need to be pre-warmed (using fio or dd to read entire volume)

  • Can be automated using Amazon Data Lifecycle Manager

EBS Migration

  • Volumes locked to AZ

  • To migrate, snapshot, (optional) copy volume to different region

  • Create a volume from the snapshot in the AZ of your choice

EBS Encryption

  • When you encrypt an EBS volume you get:

    • Data at rest is encrypted inside the volume

    • Data in flight between instance and the volume is encrypted

    • Snapshots are encrypted

    • As are volumes created from the snapshot

  • Encryption and decryption are transparent

  • Minimal impact on latency

  • EBS Encryption leverages keys from KMS (AES-256)

  • Copying an unencrypted snapshot allows encryption

  • Snapshots of encrypted volumes are encrypted

  • Encrypting an unencrypted EBS volume

    • Create an EBS snapshot of the volume

    • Encrypt the snapshot using copy

    • Create a new volume from the snapshot

    • Attach encrypted volume to original instance
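
The four steps above as a hedged boto3 sketch (region, volume, instance, and device names are placeholders):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# 1. Snapshot the unencrypted volume
snap = ec2.create_snapshot(VolumeId="vol-0123456789abcdef0")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2. Copy the snapshot with encryption enabled (default EBS KMS key unless KmsKeyId is given)
enc = ec2.copy_snapshot(
    SourceSnapshotId=snap["SnapshotId"],
    SourceRegion="us-east-1",
    Encrypted=True,
)
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[enc["SnapshotId"]])

# 3. Create a new (encrypted) volume from the encrypted snapshot, in the instance's AZ
vol = ec2.create_volume(SnapshotId=enc["SnapshotId"], AvailabilityZone="us-east-1a")
ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])

# 4. Attach the encrypted volume to the original instance
ec2.attach_volume(VolumeId=vol["VolumeId"], InstanceId="i-0123456789abcdef0", Device="/dev/sdf")
```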

EBS RAID

  • EBS is already redundant (replicated within an AZ)

  • But for increase of IOPS past max

  • Must do in OS not AWS

  • Or mirror EBS volumes

    • RAID 0 (Perf, get combined disk space, IO, throughput, not fault tolerant)

    • RAID 1 (mirror, sends data to two volumes at the same time, 2x network traffic)

    • RAID 5, 6 (Not recommended for EBS)

EFS

  • Managed NFS

  • EFS works with EC2 instances multi-AZ

  • Highly available, scalable, expensive (3xGP2), pay per use

  • For: content management, web serving, data sharing, WordPress

  • NFS v4.1

  • Use security groups to control access

  • Compatible with Linux based AMI (not windows)

  • Performance mode: General purpose (default), Max IO (used when 1000's of EC2 are using the EFS)

  • Has bursting or provisioned modes for IO

  • "EFS file sync" to sync from on-prem fs to EFS

  • Backup EFS-to-EFS (incremental, can choose frequency)

  • Encryption at rest using KMS

  • EFS now has lifecycle mgmt. to tier to EFS IA

Instance store

  • Some instance do not come with root EBS

  • Ephemeral

  • Physically attached to your instance

  • Pros

    • Better I/O perf

    • Good for buffer / cache / scratch data / temporary content

    • Data survives reboot

  • On stop or termination instance store is lost

  • Can't resize the instance store

  • Backups must be operated by the user


Networking

  • Elastic IP: a public static IPv4 address, attachable to one instance at a time

  • Horizontal scalability = elasticity

  • Vertical scalability (RDS, Elasticache)

  • HA means running your application in 2 DC/AZ

Load Balancing

  • Health Checks

    • Done on port and route

  • Any LB has a static hostname, use it and not IP

  • LB can scale, not instant, contact AWS for a warm-up

  • 4xx errors are client induced errors

  • 5xx errors are application induced errors

  • LB 503 errors mean the LB is at capacity or has no registered target

  • If LB can't connect to app, check SG!

  • Seamlessly handle failures of downstream instances

  • Health checks (200 OK = healthy, otherwise not - CLB?)

  • CLB + ALB support SSL Certificates and provide SSL termination for websites (NLB can terminate, Jan 2019)

  • Enforce stickiness

  • HA across AZs

  • Separate public traffic from private traffic

  • Exposes single point of access (DNS) to your app

  • Network Load Balancers expose a public static IP, whereas an Application or Classic Load Balancer exposes a static DNS (URL)

  • ELB - Managed load balancer

    • Classic LB (v1, 2009)

      • Deprecated

    • Application Load Balancer (v2, 2016)

      • Layer 7 (HTTP/S, WebSockets)

      • LB to multiple applications on same machine

      • LB to target group based on route in URL

      • LB to target group based on hostname in URL

      • LB to target group based on client IP

      • Supports dynamic host port mapping with ECS (redirect to same machine)

      • Before would have had to have one CLB per app

      • Stickiness at target group level (same instance)

        • Cookie generated by ALB

      • App server does not see IP of client directly, inserted in X-Forwarded-For

        • Also port via X-Forwarded-Port, and proto via X-Forwarded-Proto

      • The ALB terminates the connection to do this

      • Great fit for ECS/Containers

    • Network Load Balancer (v2, 2017)

      • TCP (Layer 4)

      • High perf, millions of requests per sec

      • Support static / elastic IP (per AZ), public must be elastic (can help whitelist by clients), private facing will get random private IP based on free ones at the time

      • Has cross zone balancing

      • Has SSL termination (Jan 2019)

      • Less latency ~100ms (vs 400ms for ALB)

      • Only for extreme perf, not default

      • NLB see client IP

    • Can have internal or external ELB

LB Stickiness, enabled in Target Groups

  • Stickiness works for CLB and ALB

  • Works with cookies, has an expiration date

  • Make sure user doesn't lose session data

  • Can bring imbalance over backend instances

    • Exam can ask if one instance is 80% and one 20% why that would be

  • Stickiness duration can be 1 sec to 7 days

LB SSL Certificates

  • LB uses x.509 certificate (SSL/TLS server cert) loaded on LB

  • Can manage certificates using ACM (AWS Certificate Manager)

  • Can create or upload your own certs alternatively

  • HTTPS listener

    • Must specify default certificate

    • Can add an optional list of certs to support multiple domains

    • SNI (Server Name Indication) is a feature allowing you to expose multiple SSL certs if the client supports it.

Auto-Scaling Groups (ASG)

  • A launch configuration

    • AMI + Instance Type

    • EC2 User Data

    • EBS Volumes

    • Security Groups

    • SSH Key Pair

  • Min/Max/Initial Capacity size

  • Network + Subnet information

  • Load Balancer Information

  • Scaling Policies (triggers)

  • Possible to scale in/out based on CloudWatch alarm

    • Alarm monitors a metric

    • Metrics are computed for the overall ASG instances

      • ex: Target average CPU

      • ex: Average network in or out

  • Can scale on custom metric (ex: connected users)

    • Send custom metric from app on EC2 to CloudWatch (PutMetricData API; see the sketch after this list)

    • Create alarm to react based on low / high values

    • Use the alarm as scaling policy for ASG

  • IAM roles attached to an ASG will get assigned to EC2 instances

  • ASG are free, pay only for instances

  • ASG can terminate instances marked unhealthy by a LB and replace them

  • Available Metrics:

    • ASGAverageCPUUtilization—Average CPU utilization of the Auto Scaling group.

    • ASGAverageNetworkIn—Average number of bytes received on all network interfaces by the Auto Scaling group.

    • ASGAverageNetworkOut—Average number of bytes sent out on all network interfaces by the Auto Scaling group.

    • ALBRequestCountPerTarget—Number of requests completed per target in an Application Load Balancer target group.

  • Default Termination Policy for ASG. It tries to balance across AZ first, and then delete based on the age of the launch configuration.

  • Scaling Cooldown, makes sure doesn't get out of control, no other scaling takes effect until cooldown is over. Can override default cooldown.

  • Can have default cooldown, but also policy specific to simple scaling policy. Good for scale-in that terminates instances, doesn't take much time.

  • Reduce costs by lowering cooldown from ex: 300 to 180.

  • If your app is scaling multiple times per hour, modify ASG cool-down timer and the CloudWatch Alarm Period that triggers the scale-in
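
A hedged sketch of the custom-metric path mentioned above: the app publishes its own metric with PutMetricData, then a CloudWatch alarm on that metric drives the ASG scaling policy (namespace, metric, and ASG names are made up):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_data(
    Namespace="MyApp",
    MetricData=[{
        "MetricName": "ConnectedUsers",                   # custom, app-level metric
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": "my-asg"}],
        "Value": 542,
        "Unit": "Count",
    }],
)
# A CloudWatch alarm on MyApp/ConnectedUsers is then used as the ASG scaling policy trigger.
```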

Security Groups

  • Inbound traffic is blocked by default, outbound is authorised

  • Can be attached to multiple instances, and instances can have multiple security groups

  • Locked to a region/VPC combination

  • Best practice use one just for SSH

  • If your application times out, it's likely the SG

  • Can reference security group for access


Databases

RDS

  • Postgres

  • Oracle

  • MySQL

  • MariaDB

  • MS SQL

  • Aurora (proprietary)

  • DB Identifier (name) must be unique across region

  • Your responsibility

    • Check IP / Port / SG inbound rules

    • In-database user creation and permissions

    • Creating database with or without public access

    • Ensure parameter groups or DB is configured to only allow SSL

  • AWS Responsibility

    • No SSH access

    • No manual DB patching

    • No Manual OS patching

    • No way to audit underlying instance

For SAs

  • Read replicas can only do SELECT

  • RDS supports Transparent Data Encryption for Oracle or SQL Server

    • Is on top of KMS, may affect performance

  • IAM Authentication vs un/pw for MySQL and PostgreSQL

    • Lifespan of an IAM authentication token is 15 mins (short-lived), better security

    • Tokens are generated by IAM credentials

    • SSL must be used (or connection refused)

    • Easy to use EC2 Instance Roles to connect to the RDS DB (no need to store DB credentials on the instance)

  • Managed Service =

    • OS patching

    • Point in Time Restore backups

    • Monitoring dashboards

    • Read replicas for read perf

    • Multi AZ set for DR

    • Maintenance windows for upgrades

    • Scaling (vert and horiz)

    • BUT no SSH
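
A sketch of RDS IAM authentication with boto3 (hostname, port, and user are placeholders); the token replaces the password and expires after about 15 minutes:

```python
import boto3

rds = boto3.client("rds", region_name="eu-west-1")
token = rds.generate_db_auth_token(
    DBHostname="mydb.abc123.eu-west-1.rds.amazonaws.com",
    Port=3306,
    DBUsername="app_user",
)
# Pass `token` as the password when opening the connection; SSL must be enabled on the client.
```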

RDS Read Replicas for scalability

  • Up to 5 Read Replicas

  • Within AZ, Cross AZ, or Cross Region

  • Replication is ASYNC (eventually consistent)

  • Replicas can be promoted to their own DB

  • Applications must update their connection string to leverage read replicas

    • One string for master, 1 for each replica

Can combo Read Replicas and DR Multi AZ

RDS Multi AZ (Disaster Recovery)

  • SYNC replication

  • One DNS name for auto failover to standby

  • Increases availability (duh)

  • For AZ loss

  • No manual intervention

  • Not for scaling

RDS Backups

  • Automatically enabled

  • Automated Backups

    • Daily full snapshot of DB

    • Captures transaction logs in real time

      • Ability to restore to any point in time

    • 7 days retention (can increase to 35) (can lower as well)

  • DB Snapshots (can be manually triggered)

    • Retention for as long as you want (keep specific state, or long term)

RDS Encryption

  • Encryption at rest with KMS (AES-256)

    • Only at creation

    • or: snapshot, copy as encrypted, create DB from snapshot (same as EBS)

  • SSL certificates to encrypt data in flight

  • To enforce SSL:

    • PostgreSQL: rds.force_ssl=1 in the AWS RDS console (parameter groups)

    • MySQL: Within the DB: GRANT USAGE ON *.* TO 'mysqluser'@'%' REQUIRE SSL;

  • To connect using SSL:

    • Provide SSL Trust certificate (can be downloaded from AWS)

    • Provide SSL options when connecting to DB

RDS Security

  • RDS DB are usually deployed in private subnet

  • Security works by leveraging security groups for who can communicate with it

  • IAM policies help control who can manage RDS

  • Traditional username and password to log into DB itself

  • IAM authentication now works with Aurora/MySQL

RDS vs. Aurora

  • Proprietary

  • Postgres and MySQL drivers supported

  • Cloud optimized - 5x perf for MySQL, 3x perf for Postgres

  • Automatically grows in increments of 10GB up to 64TB

  • Aurora can have 15 replicas, MySQL only 5, and replication is faster (sub 10ms lag)

  • Failover in Aurora is instantaneous, HA native.

  • Aurora costs 20% more than RDS, but is more efficient.

Aurora

  • Automatic failover

  • Backup and recovery

  • Isolation and security

  • Industry compliance

  • Push-button scaling

  • Automated patching with zero downtime

  • Advanced monitoring

  • Routine maintenance

  • Backtrack: restore data at any point in time without backups

  • HA and Read Scaling

    • 6 Copies of data across 3 AZ

      • 4 copies out of 6 needed for writes

      • 3 copies out of 6 needed for reads

      • Self healing with peer-to-peer replication (for corrupted data)

      • Storage is striped across 100's of volumes

    • One Aurora instance takes writes, Master

    • Automated failover for master in less than 30 secs

    • Master + up to 15 Read Replicas serve reads (any replica can become master)

    • Support for Cross Region Replication

  • Shared logical storage volume across AZs for Replication + Self-Healing + Auto Expanding

  • Master is only writer

    • Writer Endpoint (DNS name) always points to current master, for failover

    • Read Replicas can do auto-scaling

      • Reader Endpoint: connection load balancing for reads across all scaled instances. Happens at the connection level, not the statement level.


Aurora Security

  • Encryption at rest using KMS

  • Automated backups, snapshots and replicas are also encrypted

  • Encryption in flight using SSL (same process as MySQL or Postgres)

  • Authentication using IAM

  • You are responsible for protecting via SG

  • No SSH

Aurora Serverless

  • No need to choose an instance size

  • Only supports MySQL 5.6 & Postgres in beta

  • Helpful when you can't predict workload

  • DB cluster starts, shuts down, and scales automatically based on CPU / connections

  • Can migrate from Aurora Cluster to Serverless and vice versa

  • Serverless usage is measured in ACU (Aurora Capacity Units)

  • Billed in 5 minute increments of ACU

  • Some features aren't supported in serverless, so check docs

Aurora for SAs

  • Can use IAM for Aurora

  • Aurora Global Databases span multiple regions and enable DR

    • One primary region

    • One DR Region

    • The DR region can be used for lower latency reads

    • < 1 sec replication lag on average

  • If not using Global Databases you can create cross region Read Replicas

    • FAQ recommends Global Databases instead

Elasticache

  • Managed in-memory DB, high perf, low latency.

  • Redis or Memcached

  • Reduce load on DB

  • Make app stateless (keep state in cache)

  • Write scaling using Sharding

  • Read scaling using Read Replicas

  • Multi AZ with Failover

  • AWS takes care of all normal stuff

  • App queries ElastiCache, either gets cache hit or cache miss, in case of miss it gets cached for hit next time

  • Cache must come with invalidation strategy for only most current data (app based)

  • User session store (keep it stateless)

    • Application writes session data into ElastiCache

    • User hits a different application instance

    • Instance retrieves the data from cache to keep session going

  • Redis

    • In-memory key-value store

    • Super low latency (sub ms)

    • Cache survives reboot by default (persistence)

    • Multi AZ with automatic failover for DR (if you want to keep cache data)

    • Support for Read Replicas and Cluster

    • Good for: User sessions, Leaderboard (has a sort), Distributed states, Relieve pressure on DB, Pub / Sub capability for messaging

  • Memcached

    • In-memory object store

    • Cache does not survive reboots

    • Good for: Quick object retrieval, cache often accessed objects

ElastiCache for SAs

  • Security

    • Redis supports RedisAUTH (un/pw)

    • SSL in-flight must be enabled and used

    • Memcached supports SASL

    • None support IAM

    • IAM policies are used only for AWS API level security

  • Patterns for ElastiCache

    • Lazy Loading: all read data is cached, can become stale

    • Write Through: Adds or updates data in the cache when written to DB (no stale data)

    • Session Store: stores temp session data (using TTL features maybe)
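
A lazy-loading sketch against a Redis-mode ElastiCache endpoint (assumes the redis-py client; the endpoint and the db.query_user helper are hypothetical):

```python
import json
import redis

cache = redis.Redis(host="my-cluster.xxxxxx.0001.euw1.cache.amazonaws.com", port=6379)

def get_user(user_id, db):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:                       # cache hit
        return json.loads(cached)
    user = db.query_user(user_id)                # cache miss: read from the database
    cache.setex(key, 3600, json.dumps(user))     # populate the cache with a TTL to limit staleness
    return user
```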

DynamoDB

  • Fully managed, Highly Available with replication across 3 AZs

  • Scales to massive workloads, distributed database

  • Millions of request per second, trillions of rows, 100s of TB of storage

  • Fast and consistent in performance (low retrieval latency)

  • Integrated with IAM for security, authorization, and administration

  • Enables event driven programming with DynamoDB Streams

  • Low cost and auto scaling

Basics

  • DynamoDB is made of tables

  • Each table has a primary key (must be decided at creation)

  • Each table can have an infinite number of items (=rows)

  • Each item has attributes (can be added over time, can be null, =columns)

  • Maximum item size = 400KB

  • Data types supported are:

    • Scalar types: String, Number, Binary, Boolean, Null

    • Document types: List, Map

    • Set Types: String Set, Number Set, Binary Set

  • Table must be provisioned read and write capacity units

  • Read Capacity Units (RCU): throughput for reads ($0.00013 per RCU)

    • 1 RCU = 1 strongly consistent read of 4KB per second

    • 1 RCU = 2 eventually consistent read of 4KB per second

  • Write Capacity Units (WCU): throughput for writes ($0.00065 per WCU)

    • 1 WCU = 1 write of 1KB per second

  • Option to set up auto-scaling of throughput to meet demand

  • Throughput can be exceeded temporarily using "burst credit"

  • If burst credits are empty you'll get a "ProvisionedThroughputExceeded" exception

  • Then do exponential back-off retry
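
A worked capacity calculation plus a provisioned table, as a sketch assuming boto3 (table and attribute names are placeholders); the arithmetic just applies the RCU/WCU definitions above:

```python
import math
import boto3

# 10 strongly consistent reads/s of 6 KB items, and 5 writes/s of 2.5 KB items:
rcu = 10 * math.ceil(6 / 4)    # 4 KB units per strongly consistent read -> 20 RCU
wcu = 5 * math.ceil(2.5 / 1)   # 1 KB units per write                    -> 15 WCU

boto3.client("dynamodb").create_table(
    TableName="Users",
    KeySchema=[{"AttributeName": "user_id", "KeyType": "HASH"}],   # primary key fixed at creation
    AttributeDefinitions=[{"AttributeName": "user_id", "AttributeType": "S"}],
    ProvisionedThroughput={"ReadCapacityUnits": rcu, "WriteCapacityUnits": wcu},
)
```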

DynamoDB - DAX

  • DynamoDB Accelerator

  • Seamless cache for DDB, no app re-write

  • Writes go through DAX to DynamoDB

  • Microsecond latency for cached reads and queries

  • Solves the Hot Key Problem (too many reads)

  • 5 minute default TTL for cache

  • Up to 10 nodes in the cluster

  • Multi AZ (3 nodes minimum for production recommended)

  • Secure (Encryption at rest with KMS, VPC integration, IAM, CloudTrail, etc)

DynamoDB Streams

  • Changes in DynamoDB (Create, Update, Delete) can end up in a DynamoDB Stream

  • This stream can then be read by Lambda, then we can:

    • React to changes in real time (welcome email to new users)

    • Analytics

    • Insert into ElasticSearch

    • etc

  • Could implement cross region replication using Streams

  • Stream has 24 hours of data retention


New Features

  • Transactions

    • All or nothing type operations

    • Coordinated Insert, Update, Delete across multiple tables (all work or nothing)

    • Include up to 10 unique items, or up to 4MB data

  • On Demand

    • No capacity planning needed (WCU/RCU) - scales automatically

    • 2.5x more expensive than provisioned

    • Helpful when spikes are un-predictable or the app is very low throughput

Security and Other

  • Security

    • VPC Endpoints, access without internet

    • Fully controlled by IAM

    • Encryption at rest with KMS, in transit with SSL/TLS

  • Backup and Restore available

    • Point in time like RDS

    • No performance impact

  • Global Tables (require Streams enabled)

    • Multi region, fully replicated, high performance

  • DMS can be used to migrate to DDB from Mongo, Oracle, S3, etc

  • Can launch local version of DDB for dev purposes


Athena

  • Serverless service to perform analytics directly against S3 files

  • Uses SQL to query

  • Has a JDBC / ODBC driver

  • Charged per query and amount of data scanned

  • Supports CSV, JSON, ORC, Avro, and Parquet

  • For: BI, analytics, reporting, analyzing VPC Flow Logs, ELB logs, CloudTrail trails, etc.


Route 53

  • Most common records

    • A: URL to IPv4

    • AAAA: URL to IPv6

    • CNAME: URL to URL (non root domain)

    • Alias: URL to AWS resource (root and non-root), free of charge, supports native health checks

  • Can use

    • Public domain names

    • Private domain names that can only be resolved by your VPC instances

  • $0.50 per hosted zone

  • Has

    • Load Balancing (through DNS, client LB)

    • Health checks (limited)

    • Routing policy: simple, failover, geolocation, latency, weighted, multi value

  • Simple Routing Policy

    • Maps a domain to one URL

    • Use when directing to a single resource

    • Cannot attach health checks

    • If multiple values are returned, a random one is chosen by client

  • Weighted Routing Policy

    • Control % of requests that go to specific endpoint (ex: 70, 20, 10. Sum does not have to be 100)

    • Create multiple record sets with weighted option

    • Helpful to test 1% of traffic on new app

    • Split traffic between regions

    • Can be associated with health checks

  • Latency Routing Policy

    • Redirect to server that has the least latency, close to request

    • Latency is evaluated from the user to the designated AWS region

    • Must specify region in latency record

    • Germany could be directed to US if lower latency
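
A sketch of weighted record sets with boto3 (hosted zone, record name, identifiers, and IPs are placeholders), illustrating a 70/30 split:

```python
import boto3

r53 = boto3.client("route53")
for identifier, weight, ip in [("app-v1", 70, "203.0.113.10"), ("app-v2", 30, "203.0.113.20")]:
    r53.change_resource_record_sets(
        HostedZoneId="Z1234567890ABC",
        ChangeBatch={"Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "app.example.com",
                "Type": "A",
                "SetIdentifier": identifier,   # required whenever a routing policy is used
                "Weight": weight,
                "TTL": 60,
                "ResourceRecords": [{"Value": ip}],
            },
        }]},
    )
```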

Route 53 Geolocation Policy

  • Different from latency based

  • Based on user location

  • Traffic from England should go to X

  • Must have a default policy if no other match exists

Multi Value Routing Policy

  • Use when routing traffic to multiple instances

  • When want to associate a Route 53 health check with records, removes unhealthy from returned values

  • Up to 8 healthy records are returned for each MultiValue query (even if you have 50)

  • MultiValue is not a substitute for using ELB

Route 53 Health Checks

  • Will not send traffic to if failed

  • Deemed unhealthy if checks fail 3 times

  • Deemed healthy if checks pass 3 times

  • Default interval 30 secs (can set fast health check at 10s, higher cost)

  • About 15 health checkers will launch to check endpoint health

    • one request every 2 secs on average

  • Can have HTTP, TCP, and HTTPS check (no SSL certificate verification)

  • Possibility of integrating health checks with CloudWatch

  • Health checks can be linked to Route 53 DNS record set

Route 53 as a Registrar

  • Offer both Registrar and DNS service


Developing on AWS

CLI

  • Never put personal credentials on EC2 machine, whole account compromised

  • Use Roles

Roles

  • Attached to EC2 instance

  • Come with a policy describing what the instance is authorized to do

  • Best practice

  • Instance can only have one role at a time

Policies

AWS SDK

  • AWS CLI is a wrapper around Python SDK (boto3)

  • If you don't specify a region defaults to us-east-1

  • Recommended to use default credential provider chain

    • Works with:

      • AWS credentials in .aws (local or on-prem)

      • Instance Profile Credentials using IAM Roles for EC2 machines etc.

      • Environment variables (AWS_ACCESS_KEY_ID, etc.), not often used

  • NEVER STORE CREDENTIALS IN YOUR CODE, abstract

  • Always use IAM Roles when working within AWS Services

  • Exponential Backoff

    • Any API call that fails because of too many calls needs to be retried with Exponential Backoff

    • These apply to rate limited APIs

    • The retry mechanism is included in SDK API calls

    • 1 ms, 2 ms, 4ms, 8ms
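
A minimal backoff sketch (plain Python, no SDK specifics; in practice the AWS SDKs already retry throttled calls for you):

```python
import random
import time

def call_with_backoff(fn, max_retries=5):
    """Retry a rate-limited call with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:                       # in practice, catch the SDK's throttling error
            delay = (2 ** attempt) * 0.001      # 1 ms, 2 ms, 4 ms, 8 ms, ...
            time.sleep(delay + random.uniform(0, delay))
    return fn()                                 # final attempt; let any error propagate
```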


CloudFormation

  • We didn’t specify a name in the json file for this bucket, so AWS names it with the [STACKNAME]-[LOGICAL_VOLUME_NAME]-[RANDOM_STRING] format.

  • What is logical volume name, based on resource in CFN?

  • Stacks have logical resources in them that create physical resources


CloudFront

  • Cached at edge locations

  • Popular with S3 but works with EC2 and LB as well

  • Helps with network attacks

  • Provides SSL (HTTPS) via ACM

  • Can use SSL (HTTPS) to talk internally to applications

  • Supports RTMP

  • Origin Access Identity

    • Limit S3 to be only accessed via this identity

CloudFront Signed URL / Signed Cookies

  • To distribute paid shared content which lives in S3

  • If S3 can only be accessed via CloudFront we can't use S3 pre-signed URLs

  • Can attach a policy with:

    • URL expiration

    • IP ranges for access

    • Trusted signers (which AWS Account can create signed URLs)

  • CloudFront signed URLs can only be created using the AWS SDK

  • Validity length?

    • Share content, movies etc, short = few minutes

    • Private content (to user) longer = years


CloudFront vs S3 Cross Region Replication

  • CloudFront

  • Global Edge network

  • Files are cached for a TTL (maybe a day)

  • Great for static content that must be available everywhere

  • S3 Cross Region Replication

    • Must be set up for each region

    • Files are updated near real-time

    • Read only

    • Great for dynamic content that needs low-latency in a few regions

CloudFront Geo Restriction

  • Restrict who can access your distribution

    • Whitelist by country

    • Blacklist by country

  • Country is determined by using a 3rd party Geo-IP database

  • Copyright law, etc.


Messaging

General

  • Two patterns of application communication

    • Synchronous (app to app)

      • Problematic if there are sudden spikes of traffic

    • Asynchronous / Event Based (Queue)

      • Better to decouple (SQS: Queue, SNS: Pub/Sub, Kinesis: real-time streaming)

SQS (Super important)

SQS Standard Queue

  • Publisher -> Queue -> Consumer

  • Fully managed

  • Scales from 1 message per second to 10000s per second

  • Default retention: 4 days, maximum 14 days

  • No limit to how many messages in queue

  • Low latency (<10ms on publish and receive)

  • Horizontal scaling in terms of number of consumers

  • Can have duplicate messages (at-least-once delivery, duplicates happen occasionally)

  • Can have out of order messages (best effort ordering)

  • Limitation of 256KB per message

SQS Delay Queue

  • Delay a message up to 15 minutes (consumers don't see it immediately)

  • Default is 0 seconds (available right away)

  • Can set a default at queue level

  • Can override the default using the DelaySeconds parameter, queue holds it

Producing Messages

  • Define Body (String up to 256KB)

  • Metadata, message attributes (optional) of Key Value pair, with Type

  • Provide Delay Delivery

  • Get Back

    • Message identifier

    • MD5 hash of the body

Consuming Messages

  • Poll SQS for messages (receive up to 10 at a time)

  • Process the message within the Visibility Timeout

  • Delete the message from the queue using the message ID and receipt handle

Visibility Timeout

  • When a consumer polls a message from a queue, the message is then "invisible" to other consumers for the defined Visibility Timeout period

    • Set between 0 seconds and 12 hours (default 30 secs)

    • If too high (15 mins) and consumer fails to process, you have to wait a long time before retry

    • If too low (30 secs) and consumer needs more time to process another consumer will receive the message and it will be processed more than once

  • ChangeMessageVisibility API to change the visibility while processing a message; the consumer can alert SQS that it needs more time

  • DeleteMessage API to tell SQS the message was successfully processed
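
A consumer-loop sketch with boto3 tying together long polling, the visibility timeout, and DeleteMessage (queue URL and the process() stub are placeholders):

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue"

def process(body):             # placeholder for real business logic
    print("processing", body)

while True:
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,    # receive up to 10 at a time
        WaitTimeSeconds=20,        # long polling
        VisibilityTimeout=60,      # hide messages from other consumers while we work
    )
    for msg in resp.get("Messages", []):
        process(msg["Body"])       # must finish within the visibility timeout (or extend it)
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```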

Dead Letter Queue

  • If a consumer fails to process a message within the Visibility Timeout it goes back to the queue

  • We can set a threshold of how many times a message can go back, it's called a redrive policy

  • After that threshold is exceeded the message goes into the Dead Letter Queue (DLQ)

  • We have to create a DLQ first, then designate it as a DLQ

  • We must make sure to process messages in the DLQ before they expire

Long Polling (Receive Message Wait Time)

  • When a consumer requests messages from the queue it can optionally "wait" for messages to arrive if there are none

  • LongPolling decreases the number of API calls made to SQS while increasing the efficiency and decreasing the latency of the app.

  • The wait time can be between 1 - 20 seconds, 20 preferable

  • Long Polling is preferred to Short Polling

  • Long Polling can be enabled at the queue level, or at the API level when making the poll via WaitTimeSeconds

FIFO Queue

  • Name of the queue must end in .fifo

  • Lower throughput (up to 3000 per sec with batching, 300/s without)

  • Messages are processed in order by the consumer

  • Messages are sent exactly once

  • No per message delay (only per queue delay)

  • Ability to do content-based de-duplication

  • 5 minute interval de-duplication using "Duplication ID"

  • Message Groups:

    • Possibility to group messages for FIFO ordering using "Message GroupID"

    • Only one worker can be assigned per message group, so messages are processed in order

    • Message group is just an extra tag on the message

SNS

  • Event producer only sends one message to the SNS topic

  • As many event receivers (subscriptions) as you want can listen to the SNS topic notifications

  • Each subscriber will get all the messages (new feature to filter messages)

  • Up to 10,000,000 subscriptions per topic

  • 100,000 topic limit

  • Subscribers can be:

    • SQS

    • HTTP/S (with delivery retries)

    • Lambda

    • Emails

    • SMS messages

    • Mobile notifications

SNS Integrations

  • Some services can send data directly to SNS for notifications

  • CloudWatch for alarms

  • Auto Scaling Groups notifications

  • Amazon S3 on bucket events

  • CloudFormation upon state changes

  • etc

How to publish

  • Messages must be processed right away; they are not stored in the SNS topic

  • Topic Publish (Within your AWS server using the SDK or CLI)

    • Create a topic

    • Create a subscription (or many)

    • Publish to the topic

  • Direct Publish (for mobile apps SDK) (Not on exam)

    • Create a platform application

    • Create a platform endpoint

    • Publish to the platform endpoint

    • Works with Google GCM, Apple APNS, Amazon ADM

SNS + SQS - Fan Out

  • Push once in SNS, receive in many SQS

  • Fully decoupled

  • No data loss

  • Ability to add receivers of data later, flexible

  • SQS allows for delayed processing and retries of work (implying SNS does not)

  • Can have many workers on one queue and one worker on the other, or whatever
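
A fan-out setup sketch with boto3 (topic and queue names are placeholders; the SQS access policy that allows SNS to SendMessage is omitted for brevity):

```python
import boto3

sns, sqs = boto3.client("sns"), boto3.client("sqs")

topic_arn = sns.create_topic(Name="orders")["TopicArn"]
queue_url = sqs.create_queue(QueueName="orders-analytics")["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Every queue subscribed this way gets its own copy of each message published to the topic.
sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)
sns.publish(TopicArn=topic_arn, Message='{"order_id": 42}')
```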


  • SNS Protocols

    • HTTP/S

    • Email

    • Email-JSON

    • Amazon SQS

    • AWS Lambda

Kinesis

  • Managed alternative to Kafka

  • Data is automatically replicated to 3 AZ

  • Great for application logs, metrics, IoT, clickstreams

  • Great for "real-time" big data

  • Great for real-time streaming processing frameworks (Spark, NiFi, etc)

  • Kinesis Streams (just plain Kinesis): low latency streaming ingest at scale

  • Kinesis Analytics: perform real-time analytics (filters, computations, alerting, etc) on streams using SQL

  • Kinesis Firehose: load streams into S3, Redshift, ElasticSearch, etc


Kinesis Streams (important)

  • Streams are divided in ordered Shards / Partitions

  • Data retention is 1 day by default, up to 7 days (24-168 hours)

  • Ability to reprocess / replay data (unlike SQS)

  • Multiple applications can consume the same stream (like SNS)

  • Real-time processing with scalable throughput (add more shards)

  • Once data is inserted into Kinesis it can't be deleted (immutability)

  • Think of a shard as a little queue

  • Kinesis is a highway, want to get the data to destination ASAP


Shards

  • One stream is made up of many different shards

  • Write: 1MB/s or 1,000 messages per second PER SHARD

  • Read: 2MB/s at read side PER SHARD

  • Billing is per shard provisioned, can have as many as you want

  • Batching available for message push or for message calls

  • The number of shards can evolve over time (reshard / merge, essentially autoscaling)

  • Records are ordered per shard (SQS standard is unordered, FIFO is one ordered queue, Kinesis is in-between)

Kinesis API - Put records

  • On producer side

  • PutRecord API + partition key (any string) that gets hashed to determine shard id

  • The key is a way to route data to a specific shard

  • The same key goes to the same partition (data only goes to one shard at a time)

  • Messages sent get a sequence number

  • Choose a partition key that is highly distributed (helps prevent a "hot partition", an overused shard)

    • Good user_id if many users

    • Bad country_id if most users are from same country

  • Use batching and PutRecords to reduce costs and increase throughput

  • ProvisionedThroughputExceeded if we go over the limits, then use Retries or ExponentialBackoff

  • Can use CLI, SDK, or producer libraries from various frameworks
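
A producer sketch with boto3 (stream name and payload are placeholders), showing the partition key choice discussed above:

```python
import json
import boto3

kinesis = boto3.client("kinesis")
event = {"user_id": "u-123", "action": "click"}

# The partition key is hashed to pick the shard; user_id spreads load well when users are many.
kinesis.put_record(
    StreamName="clickstream",
    Data=json.dumps(event).encode(),
    PartitionKey=event["user_id"],
)
```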

Kinesis API - Exceptions

  • ProvisionedThroughputExceeded exceptions

    • Happens when sending too much data

    • Make sure you don't have a hot shard

  • Solution

    • Retries with backoff

    • Increase shards (scaling)

    • Ensure your partition key is a good one

Kinesis API - Consumers

  • Can use a normal consumer (CLI, SDK, etc)

  • Can use Kinesis Client Library (in java, Node, Python, Ruby, .Net)

    • KCL uses DynamoDB to checkpoint offsets

    • KCL uses DynamoDB to track other workers and share the work amongst shards (to improve efficiency)

Kinesis Security

  • Control access and authorization via IAM policies

  • In-Flight using HTTPS endpoints

  • At rest with KMS

  • Can encrypt/decrypt client side (difficult)

  • VPC endpoints available for Kinesis to access within VPC (no internet access)

Kinesis Data Analytics

  • Perform real-time analytics on Streams using SQL

  • Kinesis Data Analytics

    • Autoscaling

    • Managed

    • Continuous (real-time, no delay)

  • Pay for actual consumption rate

  • Can create new streams out of the real-time queries

Kinesis Firehose

  • Fully managed, no administration

  • Near real-time (perhaps 60 secs)

  • Load data into Redshift, S3, ElasticSearch, Splunk (ETL)

  • Autoscaling

  • Support for many data formats (but pay for conversion)

  • Pay for data going through, consumption model

SQS vs SNS vs Kinesis

  • Only one consumer per shard for Kinesis


Amazon MQ

  • SQS and SNS are cloud-native, using proprietary protocols from AWS

  • Traditional on-premises apps may use open protocols like: MQTT, AMQP, STOMP, Openwire, WSS

  • When migrating to cloud instead of re-engineering we can use Amazon MQ

  • Amazon MQ = managed Apache ActiveMQ

  • Amazon MQ doesn't scale as much

  • Runs on a dedicated machine, can run in HA multi-AZ

  • Has both a Queue feature (SQS) and topic feature (SNS)


Serverless

  • Just deploy functions (FaaS)

  • Lambda & Step Functions

  • DynamoDB

  • Cognito

  • API Gateway

  • S3

  • SNS & SQS

  • Kinesis

  • Aurora Serverless

Lambda

  • Virtual functions

  • Limited by time - short executions, when done, done

  • Run on-demand (run in ms)

  • Scaling is automated

  • Easy pricing

    • Pay per request and compute time

    • Free tier has 1,000,000 requests and 400,000 GB-seconds of compute time

  • Integrated with whole AWS Stack

  • Integrated with many programming languages

  • Easy monitoring through AWS CloudWatch

  • Easy to get more resources for your functions (up to 3GB of ram)

  • Increasing RAM also improves CPU and network

  • Node.js (JavaScript), Python, Java (8 compatible), C# (.NET Core), Golang, PowerShell

  • Main integrations

    • API GW

    • Kinesis

    • DynamoDB

    • S3

    • IoT

    • CloudWatch Events and Logs

    • SNS

    • Cognito

    • SQS

Pricing

  • Pay per call

    • First 1,000,000 are free

    • $0.20 per 1 million thereafter

  • Pay per duration (100ms increments)

    • 400,000 GB-seconds of compute time free per month

    • == 400,000 seconds if function is 1GB RAM

    • == 3,200,000 seconds if function is 128MB RAM

    • After that, $1.00 per 600,000 GB-s

Lambda Configuration

  • Timeout: default of 3 secs, max of 900s (15min)

  • Environment variables

  • Allocated memory (128M to 3G)

  • Ability to deploy within a VPC and assign security groups

  • IAM execution role must be attached to the Lambda function

Limits (exam)

  • Execution

    • Memory allocation: 128MB - 3008 MB (in 64MB increments)

    • Maximum execution time: 300s (5 minutes), now 15 but exam assumes 5

    • Disk capacity in the "function container" (in /tmp): 512MB

    • Concurrency limit: 1000 (can be raised via service ticket)

  • Deployment:

    • Function deployment size (compressed .zip): 50MB

    • Uncompressed deployment (code+dependencies): 250MB

    • Can use /tmp dir to load other files at startup (for more than 250MB)

    • Size of environment variables: 4KB (therefore can't pass file)

Lambda @ Edge

  • Have a CloudFront CDN

  • @Edge allows you to run global Lambda functions alongside it

  • Or do request filtering before reaching application

  • Global as opposed to a region

  • More responsive apps

  • Customize CDN content

  • Pay per use

  • Use Lambda to change CloudFront requests and responses

    • After CloudFront receives a request from a viewer (viewer request)

    • Before CloudFront forwards the request to the origin (origin request)

    • After CloudFront receives the response from the origin (origin response)

    • Before CloudFront forwards the response to the viewer (viewer response)

![Screen Shot 2019-12-03 at 14.16.13.png](../../../../_resources/Screen Shot 2019-12-03 at 14.16.13.png)

  • You can also generate responses to viewers without ever sending the request to the origin

![Screen Shot 2019-12-03 at 14.19.09.png](../../../../_resources/Screen Shot 2019-12-03 at 14.19.09.png)

  • Use Cases

    • Website Security and Privacy

    • Dynamic Web Application at the Edge (see above pic)

    • SEO

    • Intelligently route across Origins and Data Centers

    • Bot mitigation at Edge

    • Real-time image transformation

    • A/B Testing

    • User authentication and authorization

    • User Prioritization

    • User Tracking and Analytics
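
A minimal viewer-request sketch following the event structure Lambda@Edge passes to Python handlers: it either forwards the request to CloudFront unchanged or generates a response at the edge without contacting the origin (the bot check is a made-up example):

```python
# Lambda@Edge viewer-request handler sketch: block a hypothetical bot
# user agent at the edge, otherwise pass the request through unchanged.
def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    headers = request["headers"]

    user_agent = headers.get("user-agent", [{"value": ""}])[0]["value"]
    if "BadBot" in user_agent:  # hypothetical bot signature
        # Returning a response here means the origin is never contacted
        return {
            "status": "403",
            "statusDescription": "Forbidden",
            "body": "Blocked at the edge",
        }
    # Returning the request forwards it to CloudFront / the origin
    return request
```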


API GW

  • AWS Lambda + API Gateway: No infra to manage

  • Handle API versioning (v1, v2, etc)

  • Handle different environments (dev, test, prod)

  • Handle security (Authentication and Authorization)

  • Create API keys, handle request throttling

  • Swagger / Open API import to quickly define APIs

  • Transform and validate requests and responses

  • Generate SDK and API specifications

  • Cache API responses

  • Stage variables allow you to modularize your stages, different for dev or prod for example

Integrations

  • Outside of VPC

    • Endpoints on EC2

    • Load Balancers

    • Any AWS service

    • External and publicly accessible HTTP endpoints

  • Inside of VPC

    • AWS Lambda in your VPC

    • EC2 endpoints in your VPC

Security (exam)

  • IAM Permissions

    • Create an IAM policy authorization and attach to application User/Role

    • API GW verifies IAM permissions passed by the calling application

    • Good to provide access within your own infra, but not for outside

    • Leverages Sig v4 capability where IAM credentials are in headers

  • Lambda/Custom Authorizer

    • Uses Lambda to validate the token passed in the header

    • Option to cache the results of authentication

    • Helps to use OAuth / SAML / 3rd party type of authentication

    • Lambda must return an IAM policy for the user

![Screen Shot 2019-12-04 at 12.32.53.png](../../../../_resources/Screen Shot 2019-12-04 at 12.32.53.png)
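
A minimal sketch of the custom (Lambda) authorizer described above: it checks the token from the request header and returns an IAM policy allowing or denying `execute-api:Invoke`; the token check is a placeholder for real OAuth / SAML / 3rd-party validation:

```python
# Lambda (custom) authorizer sketch for API Gateway (TOKEN type).
def handler(event, context):
    token = event.get("authorizationToken", "")
    effect = "Allow" if token == "valid-token" else "Deny"  # placeholder check
    return {
        "principalId": "example-user",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": event["methodArn"],
            }],
        },
    }
```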

  • Cognito User Pools

    • Cognito fully manages user lifecycle

    • API GW verifies identity automatically against AWS Cognito

    • No custom implementation required

    • Cognito only helps with authentication, not authorization

![Screen Shot 2019-12-04 at 12.37.55.png](../../../../_resources/Screen Shot 2019-12-04 at 12.37.55.png)

  • Summary

  • IAM

    • Great for users / roles already within your AWS account

    • Handle authentication + authorization

    • Leverages Sig v4

  • Custom Authorizer (Lambda)

    • Great for 3rd party tokens

    • Very flexible in terms of what IAM policy is returned

    • Handle authentication + authorization

    • Pay per Lambda invocation (but can cache to save calls)

  • Cognito User Pool

    • You manage your own user pool (non-IAM) (can be backed by Facebook, Google login, etc)

    • No need to write custom code

    • Must implement authorization on the backend

Cognito

  • Gives users an identity so that they can interact with our application

  • Cognito User Pools

    • Sign in functionality for app users

    • Integrate with API GW

  • Cognito Identity Pools (Federated Identity)

    • Provide AWS credentials to users so they can access AWS resources directly

    • Integrate with Cognito User Pools as an identity provider

  • Cognito Sync (being replaced by AppSync)

    • Synchronize data from device to Cognito

  • Cognito User Pools (CUP) (app authentication)

    • Create a serverless database of users for your mobile apps

    • Simple login: Username (or email) / password combination

    • Possibility to verify emails / phone number and add MFA

    • Can enable Federated Identities (Facebook, Google, SAML, etc)

    • Sends back a JSON Web Token (JWT)

    • Can be integrated with API GW for authentication

![Screen Shot 2019-12-04 at 12.49.24.png](../../../../_resources/Screen Shot 2019-12-04 at 12.49.24.png)

  • Cognito Federated Identity Pools (AWS IAM access)

    • Goal:

      • Provide direct access to AWS resources from the client side

    • How:

      • Log in to federated identity provider - or remain anonymous

      • Get temporary AWS credentials back from the Federated Identity Pool

      • These credentials come with a pre-defined IAM policy stating their permissions

    • Example:

      • Provide temporary access to write to an S3 bucket using Facebook Login

![Screen Shot 2019-12-04 at 12.53.33.png](../../../../_resources/Screen Shot 2019-12-04 at 12.53.33.png)
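
A sketch of the S3-via-Facebook example above using boto3; the identity pool ID, token, and bucket name are placeholders:

```python
import boto3

# Exchange a Facebook token for temporary AWS credentials via a Cognito
# Federated Identity Pool, then use those credentials to write to S3.
cognito = boto3.client("cognito-identity", region_name="eu-west-1")
fb_login = {"graph.facebook.com": "<facebook-access-token>"}  # placeholder

identity = cognito.get_id(
    IdentityPoolId="eu-west-1:00000000-0000-0000-0000-000000000000",
    Logins=fb_login,
)
creds = cognito.get_credentials_for_identity(
    IdentityId=identity["IdentityId"],
    Logins=fb_login,
)["Credentials"]

s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretKey"],
    aws_session_token=creds["SessionToken"],
)
s3.put_object(Bucket="my-app-uploads", Key="hello.txt", Body=b"hi")
```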

  • Cognito Sync (deprecated, now AppSync)

    • Store preferences, configuration, state of app

    • Cross device (any platform - iOS, Android, etc)

    • Offline capability (synchronization when back online)

    • Requires Federated Identity Pool in Cognito (not User Pool)

    • Store data in datasets (up to 1MB)

    • Up to 20 datasets to synchronize


Serverless Solution Architecture

Rewatch Section

  • S3 Transfer acceleration, upload hits CloudFront which puts to S3

![Screen Shot 2019-12-05 at 14.55.35.png](../../../../_resources/Screen Shot 2019-12-05 at 14.55.35.png)

![Screen Shot 2019-12-05 at 14.57.23.png](../../../../_resources/Screen Shot 2019-12-05 at 14.57.23.png)

  • Microservices

  • You are free to design each micro-service the way you want

  • Synchronous patterns: API GW, LB

  • Asynchronous patterns: SQS, Kinesis, SNS, Lambda triggers (S3)

  • Challenges with microservices

    • Repeated overhead for creating each new microservice

    • Issues with optimizing server density/utilization

    • Complexity of running multiple versions of multiple microservices simultaneously

    • Proliferation of client-side code requirements to integrate with many separate services

  • Some of the challenges are solved by Serverless patterns

    • API GW and Lambda scale automatically and you pay per usage

    • You can easily clone APIs to reproduce environments

    • Generated client SDK through Swagger integration for the API gateway

![Screen Shot 2019-12-05 at 15.20.53.png](../../../../_resources/Screen Shot 2019-12-05 at 15.20.53.png)

![Screen Shot 2019-12-05 at 15.27.44.png](../../../../_resources/Screen Shot 2019-12-05 at 15.27.44.png)


Database Comparison

  • Questions to choose the right database based on your architecture

    • Read heavy, write heavy, balanced workload? Throughput needs? Will it change, does it need to scale or fluctuate during the day?

    • How much data to store and for how long? Will it grow? Average object size?

    • Data durability (week, years)? Source of truth for the data?

    • Latency requirements? Concurrent users?

    • Data model? How will you query the data? Joins? Structured? Semi-structured?

    • Strong schema? More flexibility? Reporting? Search? RDBMS / NoSQL?

    • License costs? Switch to Cloud Native DB such as Aurora?

  • Database Types

    • RDBMS (= SQL/OLTP): RDS, Aurora - great for joins

    • NoSQL: DynamoDB (~JSON), ElastiCache (key/value pairs), Neptune (graphs) - no joins, no SQL

    • Object Store: S3 (for big objects), Glacier (backups /archives)

    • Data Warehouse (=SQL Analytics / BI): Redshift (OLAP), Athena

    • Search: ElasticSearch (JSON) - free text, unstructured searches

    • Graphs: Neptune - displays relationship between data

  • RDS Overview

    • Managed PostgreSQL / MySQL / Oracle / SQL server

    • Must provision an EC2 instance and EBS volume type and size

    • Support for Read Replicas and Multi AZ

    • Security through IAM, Security Groups, KMS, SSL in transit

    • Backup / Snapshot / Point in time restore

    • Managed and Scheduled maintenance

    • Monitoring through CloudWatch

    • Use Case: Store relational datasets (RDBMS / OLTP), perform SQL queries, transactional inserts / update / delete available

  • RDS for Solutions Architect (WAF)

    • Operations: small downtime on failover and maintenance; scaling read replicas / EC2 instance type / EBS volume requires manual intervention; application changes may be needed after a restore

    • Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in the DB, using SSL

    • Reliability: Multi AZ feature, failover in case of failures

    • Performance: depends on EC2 instance type, EBS volume type, ability to add Read Replicas. Doesn't auto-scale

    • Cost: Pay per hour based on provisioned EC2 and EBS

  • Aurora Overview

    • Compatible API for PostgreSQL and MySQL

    • Data is held in 6 replicas, across 3 AZ

    • Auto-healing capability

    • Multi-AZ, Auto-Scaling Read Replicas

    • Read Replicas can be Global

    • Aurora database can be Global for DR or latency purposes

    • Auto-scaling of storage from 10GB to 64TB

    • Define EC2 instance type for Aurora, but changeable

    • Same security / monitoring / maintenance features as RDS

    • "Aurora Serverless" option

    • Use case: Same as RDS but with less maintenance / more flexibility / more performance

  • Operations: less operations, auto-scaling storage

  • Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in DB, using SSL

  • Reliability: Multi AZ, HA, possibly more than RDS (6 data copies), Aurora Serverless option

  • Performance: 5x performance due to architectural optimizations, up to 15 read replicas (5 for RDS)

  • Cost: Pay per hour based on EC2 and storage usage. Possibly lower costs compared to things like Oracle

  • ElastiCache Overview

    • Managed Redis / Memcached (same offering as RDS, but for caches)

    • In-memory data store, sub-millisecond latency

    • Must provision an EC2 instance type

    • Support for Clustering (redis) and Multi AZ, Read Replicas (Sharding)

    • Security through IAM, Security Groups, KMS, Redis Auth

    • Backup, Snapshot, Point in time restore

    • Managed and scheduled maintenance

    • Monitoring through CloudWatch

    • Use case: Key/Value store, frequent reads, fewer writes, cache results of DB queries, store session data for websites; cannot use SQL (retrieve by key, not by query)

  • Operations: Same as RDS

  • Security: AWS responsible for OS security, we for KMS, security groups, users (Redis Auth), using SSL

  • Reliability: Clustering, Multi AZ

  • Performance: Sub-millisecond performance, in memory, read replicas for sharding

  • Cost: Pay per hour based on EC2 and storage usage

  • DynamoDB Overview

    • AWS proprietary technology, managed NoSQL

    • Serverless, provisioned capacity, auto-scaling, on demand capacity (Nov 2018)

    • Can replace ElastiCache as a key/value store (storing session data for ex)

    • HA, Multi AZ by default, Read and Writes are decoupled, DAX for read cache

    • Reads can be eventually consistent or strongly consistent

    • Security, Authentication, and Authorization is done through IAM

    • DynamoDB Streams to integrate with Lambda (on any DB change)

    • Backup / Restore feature, Point in Time (35 days), GlobalTable feature (requires DDB Streams enabled)

    • Monitoring through CloudWatch

    • **Can only query on primary key, sort key, or indexes** (see the query sketch after this list)

    • Use case: Serverless application development (small docs 100s KB), distributed serverless cache, doesn't have SQL query language available, has transactions capability from Nov 2018

  • Operations: No operations needed, auto-scaling capability, serverless

  • Security: Full security through IAM policies, KMS encryption, SSL in flight

  • Reliability: Multi AZ, Backups, Point in Time

  • Performance: Single-digit millisecond performance, DAX for read caching, performance doesn't degrade as the app scales

  • Cost: Pay per provisioned capacity and storage usage, no need to guess (can use auto-scaling)
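
The query sketch referenced above, using boto3; `UserSessions`, `user_id`, and `created_at` are hypothetical table and key names:

```python
import boto3
from boto3.dynamodb.conditions import Key

# Queries must target the partition key, optionally narrowed by the sort key
# (or an index). Scanning on arbitrary attributes is not a query.
table = boto3.resource("dynamodb").Table("UserSessions")

resp = table.query(
    KeyConditionExpression=Key("user_id").eq("user-123")
    & Key("created_at").begins_with("2019-12"),
    ConsistentRead=True,  # strongly consistent read (default is eventual)
)
for item in resp["Items"]:
    print(item)
```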

  • S3 Overview

    • S3 is a key / value store for objects

    • Great for big objects, not so great for small objects

    • Serverless, scales infinitely, max object size is 5TB

    • Eventually consistent for overwrites and deletes

    • Tiers: S3 Standard, S3 IA, S3 One Zone IA, Glacier for backups

    • Features: Versioning, Encryption, Cross Region Replication, etc...

    • Security: IAM, Bucket Policies, ACL

    • Encryption: SSE-S3, SSE-KMS, SSE-C, client-side encryption, SSL in transit

    • Use case: Static files, key value store for big files, website hosting

  • Operations: No operations

  • Security: IAM, Bucket Policies, ACL, Encryption, SSL

  • Reliability: 99.999999999% durability, 99.99% availability, Multi AZ, CRR

  • Performance: Scales to thousands of read / writes per second, transfer acceleration (CloudFront) / multi-part upload for big files

  • Cost: Pay per storage used, network cost, requests number

  • Athena

    • Fully serverless database with SQL capabilities

    • Used to query data in S3

    • Pay per query

    • Output results back to S3

    • Secured through IAM

  • Operations: No operations, serverless

  • Security: IAM + S3 security

  • Reliability: Managed service, uses Presto engine, HA

  • Performance: Queries scale based on data size

  • Cost: Pay per query / per TB of data scanned, serverless
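
A minimal sketch of running an Athena query against data in S3 with boto3; the database, table, and results bucket are placeholders:

```python
import boto3

# Run a serverless SQL query over data in S3; results are written back to S3
# and you pay per query / per data scanned.
athena = boto3.client("athena")

execution = athena.start_query_execution(
    QueryString="SELECT status, count(*) FROM access_logs GROUP BY status",
    QueryExecutionContext={"Database": "weblogs"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(execution["QueryExecutionId"])
```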

  • Redshift

    • Redshift is based on PostgreSQL, but it's not used for OLTP

    • It's OLAP - online analytical processing (analytics and data warehousing)

    • 10x better perf than other data warehouses, scale to PBs

    • Columnar storage of data (instead of row based)

    • Massively Parallel Query Execution (MPP), HA

    • Pay as you go based on the instances provisioned

    • Has a SQL interface for performing the queries

    • BI tools such as Quicksight or Tableau integrate with it

    • Data is loaded from S3, DynamoDB, DMS, other DBs...

    • From 1 to 128 nodes, up to 160GB of space per node

    • Leader node: for query planning, results aggregation

    • Compute node: for performing the queries, send the results to leader

    • Redshift Spectrum: perform queries directly against S3, no need to load

    • Backup & restore, Security VPC / IAM / KMS / Monitoring

    • Redshift Enhanced VPC Routing: COPY & UNLOAD go through the VPC, not the internet

  • Operations: Similar to RDS

  • Security: IAM, VPC, KMS, SSL (similar to RDS)

  • Reliability: HA (cluster), auto-healing features

  • Performance: 10x perf, compression

  • Cost: Pay per node provisioned, 1/10th cost of others

  • Neptune

    • Fully managed graph database

    • For:

      • High relationship data

      • Social networking

      • Knowledge graphs (Wikipedia)

    • Highly available across 3 AZ, with up to 15 read replicas

    • Point in time recovery, continuous backup to Amazon S3

    • Support for KMS and HTTPS

  • Operations: Similar to RDS (must provision instance)

  • Security: IAM, VPC, KMS, SSL, IAM Authentication

  • Reliability: Multi AZ, clustering

  • Performance: Best suited for graphs, clustering to improve perf

  • Cost: Pay per node provisioned

  • ElasticSearch

    • Example: in DynamoDB you can only find by primary key or an index created on top

    • With ElasticSearch you can search any field, even partial matches

    • It's common to use ElasticSearch as a complement to another DB (for website search as example)

    • ElasticSearch also has Big Data application usage

    • You can provision a cluster of instances

    • Built-in integrations for ingestion: Kinesis Firehose, IOT, Cloudwatch logs

    • Security through Cognito & IAM, KMS, SSL, VPC

    • Comes with Kibana (visualization) & Logstash (log ingestion) = ELK Stack

  • Operations: Similar to RDS

  • Security: Cognito, IAM, VPC, KMS, SSL

  • Reliability: Multi AZ, clustering

  • Performance: Petabyte scale

  • Cost: Pay per node provisioned

  • = Search / indexing


AWS Monitoring

CloudWatch

  • CloudWatch provides metrics for every service in AWS

  • Metric is a variable to monitor (CPUUtilization, NetworkIn, etc)

  • Metrics belong to namespaces

  • Dimension is an attribute of a metric (instance id, environment, etc)

  • Up to 10 dimensions per metric

  • Metrics have timestamps

  • Can create a CloudWatch dashboard of metrics

Detailed Monitoring

  • EC2 instances have metrics every 5 minutes by default

  • With detailed monitoring (for a cost) you get data every 1 minute

  • Use detailed monitoring for more effective ASG scaling

  • Free Tier allows up to 10 detailed monitoring metrics

  • EC2 memory usage is not pushed by default, must be pushed from inside the instance

CloudWatch Custom Metrics

  • Possibility to define and send your own custom metrics to CloudWatch

  • Ability to use dimensions (attributes) to segment metrics

    • Instance.id

    • Environment.name

  • Metric resolution:

    • Standard: 1 minute

    • High resolution: Down to 1 second (StorageResolution API parameter) - Higher Cost

    • Use API call PutMetricData

    • Use exponential back off in case of throttle errors

  • Available metrics

    • ASGAverageCPUUtilization—Average CPU utilization of the Auto Scaling group.

    • ASGAverageNetworkIn—Average number of bytes received on all network interfaces by the Auto Scaling group.

    • ASGAverageNetworkOut—Average number of bytes sent out on all network interfaces by the Auto Scaling group.

    • ALBRequestCountPerTarget—Number of requests completed per target in an Application Load Balancer target group.
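
A minimal sketch of pushing a custom metric with the PutMetricData API mentioned above, including a dimension and 1-second (high) resolution; the namespace and metric name are made up:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Publish one data point for a custom metric, segmented by a dimension.
cloudwatch.put_metric_data(
    Namespace="MyApp",
    MetricData=[{
        "MetricName": "ActiveSessions",
        "Dimensions": [{"Name": "Environment", "Value": "prod"}],
        "Value": 42,
        "Unit": "Count",
        "StorageResolution": 1,  # high resolution; standard is 60 seconds
    }],
)
```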

CloudWatch DashBoards

  • Great way to set up dashboards for quick access to key metrics

  • Dashboards are global: you may switch regions to add their metrics, but the dashboard is viewable from anywhere

  • Dashboards can include graphs from different regions

  • You can change the time zone & time range of the dashboards

  • You can set up automatic refresh (10s, 1m, 2m, 5m, 15m)

  • Pricing:

    • 3 Dashboards (up to 50 metrics) for free

    • $3/dashboard/month afterwards

CloudWatch Logs

  • Applications can send logs to CloudWatch via the SDK

  • CloudWatch can collect logs from:

    • Elastic Beanstalk: Collects from application

    • ECS: Collects from containers

    • Lambda: Collects from functions

    • VPC Flow Logs

    • API Gateway

    • CloudTrail based on filter

    • CloudWatch Logs Agents: For example on EC2 machines

    • Route53: Logs DNS queries

  • CloudWatch logs can go to:

    • Batch exporter to S3 for archival

    • Stream to ElasticSearch cluster for further analytics

Log storage architecture:

  • Log Groups: Arbitrary name, usually representing an application

  • Log Stream: instances within application / log files / containers (A log stream is a sequence of log events that share the same source)

  • Can define log expiration policies (never expire, 30 days, etc)

  • Using the CLI we can tail CloudWatch logs

  • To send logs to CloudWatch, make sure IAM permissions are correct!

  • Security: Encryption of logs using KMS at the Group level

CloudWatch Logs Metric Filter & Insights

  • CloudWatch Logs can use filter expressions

    • For example, find a specific IP inside a log

    • Metric filters can be used to trigger alarms (found specific IP, then alarm)

  • CloudWatch Logs Insights can be used to query logs and add queries to CloudWatch Dashboards (comes with some default queries)

CloudWatch Alarms

  • Alarms are used to trigger notifications for any metric

  • Alarms can go to Auto Scaling, EC2 Actions, SNS Notifications

  • Various options (sampling, %, max, min, etc)

  • Alarm States:

    • OK

    • INSUFFICIENT_DATA

    • ALARM

  • Period:

    • Length of time in seconds over which to evaluate the metric

    • High resolution custom metrics: can only choose 10 sec or 30 sec
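
A sketch of creating one such alarm with boto3: CPU above 80% for two 5-minute periods triggers an SNS action; the instance ID and topic ARN are placeholders:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm on EC2 CPUUtilization; when it fires, notify an SNS topic.
cloudwatch.put_metric_alarm(
    AlarmName="high-cpu-i-0123456789abcdef0",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,              # evaluate in 5-minute windows
    EvaluationPeriods=2,     # two consecutive breaching periods
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:eu-west-1:123456789012:ops-alerts"],
)
```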

CloudWatch Events

  • Schedule: Like a cron job (same format)

  • Event Pattern: Event rules to react to a service doing something (Ex: CodePipeline state changes)

  • Triggers to Lambda functions, SQS/SNS/Kinesis Messages

  • CloudWatch Event creates a small JSON document to give info on the change

CloudTrail

  • Provides governance, compliance, and audit for your account

  • Enabled by default

  • Get a history of events / API calls made within your account by:

    • Console

    • SDK

    • CLI

    • AWS Services

  • Can put logs from CloudTrail into CloudWatch logs

  • If a resource is deleted, look into CloudTrail first


Security

Encryption in Flight

  • Ensures no MITM (man-in-the-middle) attack

Encryption at Rest

  • Data is encrypted after being received by server

  • Data is decrypted before being sent

  • The encryption / decryption keys (data key) must be managed somewhere and the server must have access to it

Client Side encryption

  • Data is encrypted by client, never decrypted by server

  • Data will be decrypted by a receiving client

  • The server should not be able to decrypt the data

  • Could leverage Envelope Encryption

KMS (Key Management Service)

  • Fully integrated with IAM for authorization

  • Seamlessly integrated into most AWS services (EBS, S3, Redshift, SSM, etc)

  • But you can also use the CLI / SDK

  • Any time you need to share sensitive information, use KMS

    • DB PW

    • Credentials to an external service

    • Private Key of SSL certs

  • The Customer Master Key (CMK) used to encrypt data can never be retrieved from KMS by the user, and it can be rotated for extra security

  • Never store secrets in plaintext, especially in code

  • Encrypted secret can be stored in code / environment variables

  • KMS can only help in encrypting up to 4KB of data per call: PW, SSL cert, credentials, etc

  • If data > 4KB use envelope encryption

  • To grant KMS access to someone:

    • Make sure the Key Policy allows the user

    • Make sure the IAM Policy allows the API calls

  • KMS makes you able to fully manage the keys & policies: (although we cannot ever see the keys ourselves)

    • Create

    • Rotation policies

    • Disable

    • Enable

  • Able to audit key usage (using CloudTrail)

  • Three types of CMK

    • AWS Managed Service Default CMK: free

    • User Keys created in KMS: $1 / month

    • User Keys imported (must be 256-bit symmetric key): $1 / month

      • pay for API calls to KMS: $0.03 / 10000 calls

68ec868622f5589bbe9ba571f7d2eae7.png
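
A minimal sketch of encrypting a small secret (under the 4KB limit) with a CMK and decrypting it again via boto3; the key alias is a placeholder, and anything larger would need envelope encryption (GenerateDataKey):

```python
import boto3

kms = boto3.client("kms")

# Encrypt a small secret (< 4KB) with a CMK; the key itself never leaves KMS.
ciphertext = kms.encrypt(
    KeyId="alias/my-app-key",          # placeholder key alias
    Plaintext=b"db-password-123",
)["CiphertextBlob"]

# Decrypt later (KMS finds the key from metadata in the ciphertext blob).
plaintext = kms.decrypt(CiphertextBlob=ciphertext)["Plaintext"]
print(plaintext)
```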

Encryption in AWS Services

  • Requires migration (through Snapshot / Backup)

    • EBS Volumes

    • RDS databases

    • ElastiCache

    • EFS network file system

  • In-place encryption

    • S3

AWS Parameter Store

  • Secure storage for configuration and secrets

  • Optional seamless encryption using KMS

  • Serverless, scalable, durable, easy SDK, free

  • Version tracking of configurations / secrets

  • Configuration management using path and IAM

  • Notifications with CloudWatch Events

  • Integration with CloudFormation

  • Simplifies workflow vs KMS

4504084ac9c3e3234b963f97d8cb1c09.png

Parameter Store Hierarchy

  • /my-department/

    • my-app/

      • dev/

        • db-url

        • db-password

      • prod/

        • db-url

        • db-password

    • other-app/

  • /other-dept/

  • Can have encrypted or plaintext parameters

  • Found in Systems Manager - Application Management, or via the CLI

  • GetParameters API via Lambda/SDK function or

  • GetParametersByPath API
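
A sketch of reading the hierarchy above with GetParametersByPath via boto3, decrypting SecureString values with KMS:

```python
import boto3

ssm = boto3.client("ssm")

# Fetch every parameter under the dev path of the app, decrypting
# SecureString values via KMS.
resp = ssm.get_parameters_by_path(
    Path="/my-department/my-app/dev/",
    Recursive=True,
    WithDecryption=True,
)
for param in resp["Parameters"]:
    print(param["Name"], param["Value"])
```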

STS - Security Token Service

  • Allows granting limited and temporary access to AWS resources

  • Token is valid for up to 1 hour (must be refreshed)

  • Cross Account Access

    • Allows users from one AWS account access to resources in another

  • Federation (Active Directory)

    • Provides a non-AWS user with temporary AWS access by linking user's AD credentials

    • Uses SAML

    • Allows Single Sign On (SSO) which enables users to log in to AWS console without assigning IAM credentials

  • Federation with third party providers / Cognito

    • Used mainly in web and mobile apps

    • Makes use of FB/G/Amazon etc to federate them

Cross Account Access

  • Define an IAM Role for another account to access

  • Define which accounts can access this IAM Role

  • Use AWS STS to retrieve credentials and impersonate the IAM Role you have access to (AssumeRole API)

  • Temporary credentials can be valid between 15 minutes and 1 hour

4d8390f94e709be6c0ee5b94dc70bbe1.png
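
A minimal sketch of the AssumeRole flow with boto3; the role ARN and session name are placeholders, and the temporary credentials are then used for an S3 call:

```python
import boto3

sts = boto3.client("sts")

# Assume a role defined in another account and get temporary credentials.
assumed = sts.assume_role(
    RoleArn="arn:aws:iam::222222222222:role/ReadOnlyFromAccountA",
    RoleSessionName="cross-account-demo",
    DurationSeconds=900,  # 15 minutes, the minimum
)["Credentials"]

# Use the temporary credentials to act as the assumed role.
s3 = boto3.client(
    "s3",
    aws_access_key_id=assumed["AccessKeyId"],
    aws_secret_access_key=assumed["SecretAccessKey"],
    aws_session_token=assumed["SessionToken"],
)
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
```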

Identity Federation with AD and Cognito

  • Federation lets users outside of AWS assume a temporary role for accessing AWS resources

  • These users assume an identity provided access role

  • Federation assumes a form of 3rd party authentication

    • LDAP

    • MS AD (~=SAML)

    • Single Sign On

    • OpenID

    • Cognito

  • Using federation you don't need to create IAM users (user mgmt is outside AWS)

d56dceb5703eafcf69e8d16482acf1fe.png

SAML Federation (for Enterprise)

  • To integrate AD / ADFS with AWS (or any SAML 2.0)

  • Provides access to AWS Console or CLI (through temporary credentials)

  • No need to create an IAM user for each employee

4ab46ee392d4720a455849b221fbbab4.png

Custom Identity Broker App (for Enterprise) (no SAML 2.0)

  • Use only if the identity provider is not compatible with SAML 2.0

  • You must code your own identity broker which must determine the appropriate IAM policy

e70b7f8937aafeb558e4a34d66216743.png

Cognito - Federated Identity Pools (For Public Applications)

  • Goal:

    • Provide direct access to AWS Resources from the client side

  • How:

    • Log in to federated identity provider (or remain anonymous) (CUP, FB, G, OpenID, SAML, etc)

    • Get temporary AWS credentials back from the Federated Identity Pool (Cognito)

    • They come with a pre-defined IAM policy stating permissions

  • Example:

    • Provide (temporary) access to write to an S3 bucket using FB login

  • Note: Web Identity Federation is an alternative to using Cognito, but AWS recommends against it

a78475103b1e73f58790352aaf31897f.png

Shared Responsibility Model

a7e3ee17201a4d53a039540de21eec82.png

VPC

CIDR

  • Two components

    • Base IP (xx.xx.xx.xx)

    • Subnet mask (/32) (defines how many bits can change in an IP)

      • Can take two forms

        • /24

        • 255.255.255.0 (less common)

      • /32 = 1 IP = 2^0

      • /31 = 2 IP = 2^1

      • /30 = 4 IP = 2^2

      • /29 = 8 IP = 2^3

      • /24 = 256 IP = 2^8

      • etc

      • /16 = 65536 = 2^16

      • /0 = all = 2^32

      • /32 - no octet can change

      • /24 - the last octet (x) can change

      • /16 - the last two octets (x.x) can change

      • /8 - the last three octets (x.x.x) can change

      • /0 - all four octets (x.x.x.x) can change
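
A quick way to sanity-check these sizes is the Python standard library:

```python
import ipaddress

# Confirm how many addresses each prefix length covers.
for cidr in ["192.168.0.0/32", "192.168.0.0/24", "192.168.0.0/16"]:
    net = ipaddress.ip_network(cidr)
    print(cidr, "->", net.num_addresses, "addresses")
# /32 -> 1, /24 -> 256, /16 -> 65536
```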

Public vs Private

  • IANA via RFC 1918

  • Private IP can have the following values

    • 10.0.0.0 - 10.255.255.255 (10.0.0.0/8)

    • 172.16.0.0 - 172.31.255.255 (172.16.0.0/12) AWS default

    • 192.168.0.0 - 192.168.255.255 (192.168.0.0/16)

VPC in AWS - IPv4

  • Can have multiple VPCs per region (5 soft limit)

  • Max 5 CIDRs per VPC, each with the following size limits:

    • Min size /28 = 16 IP

    • Max size /16 = 65,536 IPs

  • Because VPC is private, only RFC1918 addresses

  • VPC CIDR should not overlap with your other networks

Subnets

  • AWS reserves 5 IPs (first 4 and last 1 of range) in each Subnet

  • They are not available for use

  • For CIDR 10.0.0.0/24:

    • 10.0.0.0: Network address

    • 10.0.0.1: Reserved by AWS for the VPC router

    • 10.0.0.2: Reserved by AWS for mapping to Amazon provided DNS

    • 10.0.0.3: Reserved for future use

    • 10.0.0.255: Network broadcast (assume not available for exam)

  • Exam Tip: If you need 29 IP addresses for EC2 you can't choose a /27 because it's only 32 IPs, need a /26 (64IP)

Internet Gateway

  • Helps the VPC connect to the internet

  • Scales horizontally, HA, and redundant

  • Must be created separately from VPC

  • One VPC per IGW, one IGW per VPC

  • IGW is also a NAT for the instances that have a public IPv4

  • Will not have internet access without Route Tables

NAT Instances (outdated)

  • Allow instances in the private subnet to connect to the internet

  • Must be launched in a public subnet

  • Must disable EC2: Source / Destination Check

  • Must have an Elastic IP (because route tables require fixed)

  • Route tables must be configured to route traffic from private subnets to the NAT instance

  • Pre-configured Amazon Linux AMI are available

  • Not highly available or resilient setup by default

  • Would need to create an ASG in Multi AZ + resilient user-data script

  • Internet traffic bandwidth depends on EC2 instance performance

  • Must manage security groups & rules

    • Inbound

      • Allow HTTP/S from private subnets

      • Allow SSH from home network (through IGW)

    • Outbound

      • Allow HTTP/S traffic to internet

      • Allow ICMP traffic to internet

NAT Gateway (new)

  • Only IPv4

  • AWS managed NAT, higher bandwidth, better availability, no admin

  • Pay by the hour for usage and bandwidth

  • NAT Gateway is created in a specific AZ, uses an EIP, and must be created in a public subnet

  • Cannot be used by an instance in that subnet (only from other subnets)

  • Requires an IGW (Private subnet -> NAT -> IGW)

  • 5 Gbps of bandwidth with auto-scaling up to 45 Gbps

  • No security groups required

* Know the differences between NAT Instances and the NAT Gateway

DNS Resolution in VPC

  • enableDnsSupport: (=Edit DNS Resolution Setting)

    • Default True

    • Decides if DNS resolution is supported for the VPC

    • If True, queries the AWS DNS server at 169.254.169.253

  • enableDnsHostname: (=Edit DNS Hostname setting)

    • False by default for newly created VPC, True by default for Default VPC

    • Won't do anything unless enableDnsSupport=True

    • If True, assigns a public hostname to an EC2 instance if it has a public IP

  • If you must use custom DNS domain names in a private zone in Route 53, you must have both as TRUE

NACL (Network ACL)

  • NACLs are like a firewall controlling traffic to and from a subnet

  • Default NACL allows everything inbound and outbound

  • One NACL per Subnet, new Subnets are assigned the Default NACL

  • Define NACL rules:

    • Rules have a number (1 - 32766); LOWER numbers have precedence (the first matching rule wins and later rules are ignored)

    • Last rule is an asterisk (*), and denies all in case of no match

    • AWS recommends adding rules by increment of 100

  • Newly created NACL will deny everything

  • NACLs are a great way of blocking a specific IP at the subnet level

  • Can be associated to multiple subnets

  • Remember ephemeral ports

Inbound

c9d8ee87beb33b3f0ced90abd3b7511a.png
  • SG is stateful: the response to an allowed inbound request is let out even if outbound rules say otherwise (SG evaluates all rules before deciding)

  • NACL is stateless: return traffic is re-evaluated against the outbound rules

Outbound

a84ab894d813b4282b36bb3a2e0ad1a1.png
  • SG is stateful: the response to an allowed outbound request is let back in even if inbound rules say otherwise

  • NACL is stateless: return traffic is re-evaluated against the inbound rules

6edae8ef528ffd134500da3ef66471aa.png

VPC Endpoints

  • Endpoints allow you to connect to AWS services using a private network instead of the public internet

  • They scale horizontally and are redundant

  • They remove the need for an IGW, NAT, etc. to access AWS services

  • Interface: provisions an ENI (private IP) as an entry point (select subnets, must attach a security group) - for most AWS services

  • Gateway: provisions a target that must be used in a route table associated with your subnets - only for S3 and DynamoDB

    • Specify the region on the CLI, because the CLI defaults to us-east-1 when it is unspecified

  • In case of issues:

    • Check DNS setting resolution in your VPC

    • Check Route Tables

VPC Peering

  • Connect two VPC privately using AWS' network

  • Make them behave as if they were in the same network

  • Must not have overlapping CIDR

  • VPC Peering connection is not transitive (must be established for each VPC that needs to communicate with another)

  • Can do between accounts and regions

  • You must update route tables in each VPC's subnets to ensure instances can communicate

Flow Logs

  • Capture information about IP traffic going to your interfaces:

    • VPC Flow Logs

    • Subnet Flow Logs

    • Elastic Network Interface (ENI) Flow Logs

  • For ACCEPT and REJECT traffic

  • Helps to monitor & troubleshoot connectivity issues

  • Flow logs data can go into S3 (Athena) / CloudWatch Logs (Insights)

  • Captures network information from AWS managed interfaces too: ELB, RDS, ElastiCache, Redshift, WorkSpaces

Flow Log Syntax

  • [version, accountid, interfaceid, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, logstatus]

  • 2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK

  • Query VPC flow logs using Athena on S3 or CloudWatch Logs Insights
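
A tiny sketch that splits the sample record above into its named fields:

```python
# Map a VPC Flow Log record onto the field names listed in the notes.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

record = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")

parsed = dict(zip(FIELDS, record.split()))
print(parsed["srcaddr"], "->", parsed["dstaddr"], parsed["action"])
```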

Bastion Hosts

  • Used to SSH into private instances

  • In the public subnet which is then connected to all private subnets

  • Bastion Host security must be tight

  • Exam tip: Make sure the bastion host only has port 22 from your ip, not even SG of your other instances

Site to Site VPN, Virtual Private Gateway, Customer Gateway

  • Virtual Private Gateway

    • VPN concentrator on the AWS side of the VPN connection

    • VGW is created and attached to the VPC from which you want to create the site-to-site VPN

    • Possibility to customize the ASN

  • Customer Gateway

    • Software application or physical device on customer side of the VPN connection

    • IP Address

      • Use the static, internet routeable, IP address of your customer gateway device

      • If the CGW is behind a NAT (with NAT-T), use the public address of the NAT

Direct Connect

  • Provides a dedicated private connection from a remote network to your VPC

  • Dedicated connection must be setup between your DC and AWS Direct Connect locations

  • You need to set up a Virtual Private Gateway on your VPC

  • Access public resources (S3) and private resources (EC2) over the same connection

  • Use cases:

    • Increase bandwidth throughput - working with large data sets - lower cost

    • More consistent network experience - application using real-time data feeds

    • Hybrid Environments

  • Supports both IPv4 and IPv6

b2ddd31533a2b7004550c7d36f225574.png

Direct Connect Gateway

  • If you want to set up a Direct Connect to one or more VPC in many different regions (no overlapping IPs)

0fa45a34dceb67f9dd79efc436f14259.png

Egress only IGW

  • Egress only IGW is for IPv6 only

  • Similar function as a NAT (GW), but a NAT is for IPv4

  • All IPv6 are public addresses

  • Therefore all instances are publicly accessible

  • An Egress-Only Internet Gateway gives your IPv6 instances access to the internet while keeping them unreachable from the public internet

  • After creating an Egress Only IGW edit the Route Tables

VPC Summary

dbfa4967ade731a65a24076a6da23f3c.png


Other Services

CI/CD

  • Code - CodeCommit, Build - CodeBuild, Test - CodeBuild, Deploy - Elastic Beanstalk or CodeDeploy -> EC2 Fleet, Provision

  • CodePipeline orchestrates it all

  • When deploying code directly onto EC2 instances or On Premise servers, CodeDeploy is the service to use. You can define the strategy (how fast the rollout of the new code should be)

Infrastructure as Code

  • CloudFormation - Declarative way of outlining Infrastructure (does ordering and orchestration for you)

    • Manual way: Edit templates in designer, use console to input parameters

    • Automated way: Edit YAML file, use CLI to deploy (recommended)

  • Template Components

    • Resources: Resources declared in template (mandatory)

    • Parameters: The dynamic inputs for your template

    • Mappings: Static variables for template

    • Outputs: References to what has been created

    • Conditionals: List of conditions to perform resource creation

    • Metadata

    • Template Helpers

      • References

      • Functions
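
Purely for illustration (the notes recommend YAML plus the CLI), a minimal template showing Parameters, Resources, and Outputs, deployed with boto3; the stack and bucket names are made up:

```python
import json
import boto3

# Minimal CloudFormation template expressed as JSON: one parameter, one
# resource, one output. Bucket naming is illustrative only.
template = {
    "Parameters": {
        "Env": {"Type": "String", "Default": "dev"},
    },
    "Resources": {
        "AppBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {"BucketName": {"Fn::Sub": "my-app-${Env}-bucket"}},
        },
    },
    "Outputs": {
        "BucketName": {"Value": {"Ref": "AppBucket"}},
    },
}

cfn = boto3.client("cloudformation")
cfn.create_stack(
    StackName="my-app-dev",
    TemplateBody=json.dumps(template),
    Parameters=[{"ParameterKey": "Env", "ParameterValue": "dev"}],
)
```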

ECS

  • Container orchestration service

  • Made of:

    • Core, running ECS on user-provisioned EC2 instances

    • Fargate: serverless

    • EKS: K8s on managed EC2

    • ECR: Registry

  • ECS

    • ECS Cluster: set of EC2 instances

    • ECS Service: Application definitions running on Cluster

    • ECS Tasks + definitions: The containers running to create the application

    • ECS IAM roles: Roles assigned to tasks to interact with AWS

    • ALB has a direct integration with ECS called port mapping

      • Run multiple instances of the same application on the same machine

        • Increased resiliency even if running on one EC2 instance

        • Maximize CPU/Core utilization

        • Ability to perform rolling upgrades without impacting the application

    • ECS Setup and config file

      • Run an EC2 instance, install the ECS agent with ECS config file

      • Or use an ECS-ready Linux AMI (and modify the config file)

      • Config file is at: /etc/ecs/ecs.config

c42e568af7e1cc91cf5d491031d5c7f2.png
  • ECR Registry

    • Store, manage, deploy your containers

    • Fully integrated with IAM & ECS

    • Sent over HTTPS, and encrypted at rest

Step Functions

  • Build Serverless visual workflow to orchestrate your Lambda functions

  • Represent flow as a JSON state machine, outputs a visual workflow graph, can see steps succeed / in progress / fail etc

  • Features: sequence, parallel, conditions, timeouts, error handling...

  • Maximum execution time of 1 year

  • Can implement human approval feature

  • Use cases: Order fulfillment, data processing, etc

SWF - Simple Workflow Service (older)

  • Coordinate work amongst applications (not serverless)

  • **Step Functions is recommended for all new apps, except:

    • If you need external signals to intervene in the process

    • If you need child processes that return values to parent process.**

AWS Glue

  • Fully managed ETL service

  • Move from data sources, transform, clean, change format and put somewhere

  • Automate time consuming steps of data preparation for analytics

  • Provisions Apache Spark

  • Crawls data sources and identifies data formats (schema inference)

  • Automated Code Generation to customize Spark code

  • Sources: Aurora, RDS, Redshift, & S3 (crawls tables etc and discovers all)

  • Sinks: S3, Redshift, etc

  • Glue Data Catalog: Metadata (definition & schema) of the Source Tables (to later use in your EMR)

Opsworks

  • Opsworks = Managed Chef & Puppet

  • Alternative to AWS SSM

  • Configuration as code

Elastic Transcoder

  • Convert media files (video & music) stored in S3 to various formats

  • Features: bit rate optimization, thumbnails, watermarks, captions, DRM, progressive download, encryption

  • Components:

    • Jobs: do the actual transcoding work

    • Pipeline: Queue that manages the transcoding job

    • Presets: Template for converting media from one format to another

    • Notifications: SNS, for example

  • Pay for what you use, fully managed

AWS Organizations

  • Global service

  • One master account - can't change it

  • Other accounts are member accounts, which can only be part of one org

  • Consolidated billing across all accounts

  • Pricing benefits from aggregated usage

  • API is available to automate account creation

  • Organize accounts in Organizational Units (OU)

    • Can be anything dev, test, prod, or hr, finance, IT

    • Can nest OU within OU

  • Apply Service Control Policies (SCPs) to OU

    • Permit / Deny access to AWS services

    • SCP has a similar syntax to IAM

    • It's a filter to IAM

  • Helpful for sandbox account creation

  • Helpful to separate dev and prod resources

  • Helpful to only allow approved services

3f46a4257c5eb54fdcec149a1977c597.png

AWS WorkSpaces

  • On demand Managed, Secure Cloud Desktop

  • Eliminates on-prem VDI

  • Secure, encrypted, network isolation

  • Integrates with AD

  • Windows and Linux

AppSync

  • Store and sync data across mobile and web-apps in real-time

  • Makes use of GraphQL (from Facebook)

  • Integrates with DynamoDB / Lambda

  • Offline data synchronization (alternative to Cognito, exam)

AWS Single Sign On

  • Centrally managed SSO across multiple AWS accounts and business applications (O365, Salesforce, Box, etc)

  • One login gets you access to everything securely

  • Integrated with MS AD

  • Reduces process of setting up SSO in a company

  • Only helpful for Web Browser, SAML 2.0 enabled applications

Here's a quick cheat-sheet to remember all these services:

CodeCommit: service where you can store your code. Similar service is GitHub

CodeBuild: build and testing service in your CICD pipelines

CodeDeploy: deploy the packaged code onto EC2 and AWS Lambda

CodePipeline: orchestrate the actions of your CICD pipelines (build stages, manual approvals, many deploys, etc)

CloudFormation: Infrastructure as Code for AWS. Declarative way to manage, create and update resources.

ECS (Elastic Container Service): Docker container management system on AWS. Helps with creating micro-services.

ECR (Elastic Container Registry): Docker images repository on AWS. Docker Images can be pushed and pulled from there

Step Functions: Orchestrate / Coordinate Lambda functions and ECS containers into a workflow

SWF (Simple Workflow Service): Old way of orchestrating a big workflow.

EMR (Elastic Map Reduce): Big Data / Hadoop / Spark clusters on AWS, deployed on EC2 for you

Glue: ETL (Extract Transform Load) service on AWS

OpsWorks: managed Chef & Puppet on AWS

ElasticTranscoder: managed media (video, music) converter service into various optimized formats

Organizations: hierarchy and centralized management of multiple AWS accounts

Workspaces: Virtual Desktop on Demand in the Cloud. Replaces traditional on-premise VDI infrastructure

AppSync: GraphQL as a service on AWS

SSO (Single Sign On): One login managed by AWS to log in to various business SAML 2.0-compatible applications (office 365 etc)


Whitepapers

Well Architected Framework + Tool

  • General Guiding Principles

    • Stop guessing capacity needs

    • Test systems at production scale

    • Automate to make architectural experimentation easier

    • Allow for evolutionary architectures

      • Design based on changing requirements

    • Drive architecture changes using data

    • Improve through game days

      • Simulate applications for flash sale days

5 Pillars

  • Operational Excellence

    • The ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures

    • Design Principles

      • Perform operations as code - Infrastructure as Code

      • Annotate documentation - Automate the creation of annotated documentation after every build

      • Make frequent, small, reversible changes

      • Refine operations procedures frequently - And ensure that team members are familiar with it

      • Anticipate failure

      • Learn from all operation failures

    • Prepare

      • CloudFormation, AWS Config

    • Operate

      • CloudFormation, AWS Config, CloudTrail, CloudWatch, X-Ray

    • Evolve

      • CloudFormation, CodeBuild, CodeCommit, CodeDeploy, CodePipeline

  • Security

    • Includes the ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies

    • Design Principles

    • Implement a strong identity foundation - Centralize privilege management and reduce (or even eliminate) reliance on long term credentials - Principle of Least Privilege - IAM

    • Enable traceability - Integrate logs and metrics with systems to automatically respond and take action

    • Apply security at all layers - Edge Network, VPC, Subnet, Load balancer, each instance, OS, and application

    • Automate Security best practices

    • Protect data in transit and at rest - Encryption, tokenization, and access control

    • Keep people away from data - No direct or manual access

    • Prepare for security events - Run incident response simulations and use tools with automation to increase your speed of detection, investigation, and recovery

    • IAM

      • IAM, AWS-STS, MFA token, Organizations

    • Detective Controls

      • Config, CloudTrail, CloudWatch

    • Infrastructure Protection

      • CloudFront, VPC, Shield, WAF, Inspector

    • Data Protection

      • KMS, S3, ELB, EBS, RDS

    • Incident Response

      • IAM, CloudFormation, CloudWatch Events

  • Reliability

    • Ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues

    • Design Principles

      • Test recovery scenarios - Use automation to simulate different failures or to recreate scenarios that led to failures before

      • Automatically recover from failure - Anticipate and remediate failure before they occur

      • Scale horizontally to increase aggregate system availability - Distribute requests across multiple, smaller resources to ensure that they don't share a common point of failure

      • Stop guessing capacity - Maintain the optimal level to satisfy demand without over or under provisioning

      • Manage change via automation

    • Foundations

      • IAM, VPC, Service Limits, Trusted Advisor

    • Change management

      • Autoscaling, CloudWatch, CloudTrail, Config

    • Failure Management

      • Backups, CloudFormation, S3, S3 Glacier, Route 53

  • Performance Efficiency

    • Includes the ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve

    • Design Principles

      • Democratize advanced technologies - Advanced technologies become services and hence you can focus more on product development

      • Go global in minutes - Easy deployment in multiple regions

      • Use serverless architectures - Avoid the burden of managing servers

      • Experiment more often - Easy to carry out comparative testing

      • Mechanical sympathy - Be aware of all AWS services

    • Selection

      • Auto-Scaling, Lambda, EBS, S3, RDS

    • Review

      • CloudFormation

    • Monitoring

      • CloudWatch, Lambda

    • Tradeoffs

      • RDS, Elasticache, Snowball, Cloudfront (all have tradeoffs vs other solutions)

  • Cost Optimization

    • Includes the ability to run systems to deliver business value at the lowest price point

    • Design Principles

      • Adopt a consumption model - Pay only for what you use

      • Measure overall efficiency - Use CloudWatch

      • Stop spending money on data center operations - AWS does the infrastructure part and enables the customer to focus on organization projects

      • Analyze and attribute expenditure - Accurate identification of system usage and costs, helps measure return on investment. USE TAGS

      • Use managed and application-level services to reduce cost of ownership - As managed services operate at cloud scale, they can offer a lower cost per transaction or service

    • Expenditure Awareness

      • Budgets, Cost and Usage reports, Cost Explorer, Reserved Instance Reporting

    • Cost-effective resources

      • Spot instance, Reserved instances, Glacier

    • Matching supply and demand

      • Auto-Scaling, Lambda

    • Optimizing Over Time

      • Trusted Advisor, Cost and usage reports

  • Not tradeoffs, they're a synergy

Well Architected Tool

  • Define workload, track over time

  • Milestones, improvement plans, Risks

Trusted Advisor

  • Cost optimization, Performance, Security, Fault Tolerance, Service Limits

  • Get upgraded recommendations, more than for governance

  • Some paid

  • Can get weekly emails to different contact groups

Disaster Recovery

  • Any event that has a negative impact on a company's business continuity or finances is a disaster

  • DR is about preparing for and recovering from a disaster

  • What kind of DR?

    • On-premises -> On-premises (traditional, $$$$)

    • On-Premises -> AWS Cloud (hybrid recovery)

    • AWS Cloud Region A -> AWS Cloud Region B

Strategies

  • Backup and restore (Longest RTO, high RPO, not too expensive)

  • Pilot Light (2nd longest RTO; a small version of the app is always running in the cloud; similar to Backup and Restore, but the critical core is already up)

  • Warm Standby (3rd longest RTO, full system up and running but at minimum size, scale to production load)

  • Multi-Site (Shortest RTO, full prod at second site)

  • But all get increasingly more expensive
