SysOps Associate - Study Notes

Start stop pending states diagram etc

EC2 For Sysops

  • EC2 Changing Instance Type

    • This only works for EBS backed instances

      • Stop the instance

      • Instance settings -> Change Instance Type

      • Start Instance

    • On a type change, some instances can switch to EBS-optimized; smaller instance types cannot

EC2 Placement Groups

  • Cluster - clusters instances into a low-latency group in a single AZ

    • Not available for T2 and other small instance types, but available for most larger instance sizes

  • Spread - spreads instances across underlying hardware (max 7 instances per group per AZ)

  • Partition - spreads instances across many different partitions (which rely on different sets of racks) within an AZ. Scales to 100s of EC2 instances per group (Hadoop, Cassandra, Kafka)

    • Up to 7 partitions per AZ

    • Instances in partition do not share racks with instances in the other partitions

    • EC2 instances get access to the partition information as metadata

    • Can specify the partition, or go for auto-spread

EC2 Shutdown Behavior & Termination Protection

  • Shutdown behaviour (when done using the OS itself)

    • Stopped: default

    • Terminated

    • Not applicable when done from AWS console or AWS API

    • CLI Attribute: InstanceInitiatedShutdownBehavior

  • Termination Protection

    • Enable termination protection: To protect against accidental termination in AWS Console or CLI

    • If shutdown behaviour = Terminate and termination protection is enabled, shutting down the instance from inside the OS will still terminate it
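
A minimal Python sketch of the rules above, to make the interaction easier to remember. The function name and the action labels ("os_shutdown", "api_terminate") are made up for illustration and are not AWS identifiers.

```python
def instance_outcome(action: str,
                     shutdown_behavior: str = "stop",
                     termination_protection: bool = False) -> str:
    """Return what happens to an instance under the rules in these notes.

    action is a hypothetical label: "os_shutdown" (shutdown run inside the OS)
    or "api_terminate" (terminate requested from the console, CLI, or API).
    """
    if action == "os_shutdown":
        # Shutdown behaviour decides the result; termination protection
        # is not consulted for an OS-initiated shutdown.
        return "terminated" if shutdown_behavior == "terminate" else "stopped"
    if action == "api_terminate":
        # Termination protection only guards console / CLI / API terminations.
        return "rejected" if termination_protection else "terminated"
    raise ValueError(f"unknown action: {action}")


print(instance_outcome("os_shutdown", "terminate", termination_protection=True))
print(instance_outcome("api_terminate", termination_protection=True))
```

The first call is the exam trap: protection is on, yet the instance is still terminated because the shutdown came from inside the OS.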

Troubleshooting EC2 Launch Issues

  • InstanceLimitExceeded error: reached max number of instances per region (check service limits)

    • Resolution: Launch in different region or request region limit increase

  • aws ec2 run-instances (now max is 32 vCPUs not 20 instances)

  • InsufficientInstanceCapacity: It means AWS does not have sufficient On-Demand capacity in the AZ

    • Wait, or try fewer instances, or a different instance type and upgrade later

  • If Instance Terminates Immediately (goes from pending to terminated)

    • Reached EBS volume limit

    • EBS snapshot is corrupt

    • The root EBS volume is encrypted and you do not have permissions to access the KMS key for decryption

    • The instance store backed AMI that you used to launch the instance is missing a required part (an image.part.xx file)

    • To find the exact reason, check Instance - Description tab, next to the State Transition reason label (must add the instance attribute column)

Troubleshooting EC2 SSH Issues

  • chmod 400

  • Make sure the right username for the OS is used, or get "Host key not found" error

  • Connection timeout

    • SG not properly configured

    • CPU load of instance is too high

EC2 Instance Launch Modes

  • Dedicated Hosts

    • Full control of EC2 instance placement

    • Visibility into underlying sockets / physical cores of the hardware

    • Allocated to your account for a 3 year period reservation

EC2 Instance Types Deep Dive

  • https://www.ec2instances.info

Cross Account AMI Copy

  • Why use a custom AMI

    • Pre-installed packages were needed

    • Faster boot time (no need for ec2 user data)

    • Machine comes configured with monitoring / enterprise software

    • Security concerns - control over the machines in the network

    • Control the maintenance and updates of AMIs over time

    • Active Directory Integration out of the box

    • Installing your app ahead of time (for faster deploys in auto-scaling)

    • Using someone else's AMI that is optimized for a specific app, DB, etc

  • Using public AMIs

    • Can pay for other's AMIs by the hour

      • Could have optimized the software

      • Easy to run and configure

      • Essentially "rent" expertise from the AMI creator

    • They can be found and published on the Amazon Marketplace

AMI Storage

  • They live in S3, and are charged accordingly (but are not visible)

  • By default AMIs are private, and locked for your account / region

  • You can make AMIs public and share them with other AWS accounts or sell them on the Marketplace

Cross account AMI Copy

  • Can share AMIs with another account

  • Sharing does not affect the ownership of the AMI

  • If you copy an AMI that has been shared with your account, you are the owner of the target AMI in your account

  • To copy, the owner of the source AMI must grant you read permissions for the storage that backs the AMI (either the associated EBS snapshot for an Amazon EBS-backed AMI, or an associated S3 bucket for an instance store-backed AMI)

  • You can't copy an encrypted AMI that was shared with you from another account. If the underlying snapshot and encryption key were shared with you, you can copy while re-encrypting with your own key, then register as a new AMI.

  • AMIs are based on Amazon Elastic Block Store (Amazon EBS) snapshots. For large file systems without a previous snapshot, creating an AMI can take several hours. To decrease the AMI creation time, first create an Amazon EBS snapshot before creating the AMI.

Elastic IPs

  • An Elastic IP is a public IPv4 address you own as long as you don't release it

  • You can attach it to one instance at a time

  • You can remap it across instances

  • You don't pay for it if it's attached to a server

  • You pay for it if it's not attached to a server

  • Use to mask the failure of an instance or software by rapidly remapping the address to another instance in your account

  • You can only have 5 Elastic IPs (increasable)

  • Overall try to avoid using

    • You could use a random public IP and register a DNS name to it

    • Or use a load balancer with a static hostname

CloudWatch

CloudWatch metrics for EC2 (exam)

  • AWS Provider metrics (AWS pushes)

    • Basic monitoring (default): 5 minutes interval

    • Detailed monitoring (paid): 1 minute interval

    • Include CPU, Network, Disk, Status Check metrics

      • CPU: CPU utilization + T2/3 Credit usage / balance

      • Network: Network In / Out

      • Status Checks

        • Instance status: the EC2 VM (0 or 1 status)

        • System status: the underlying hardware (0 or 1 status)

      • Disk: Read / Write for Operations / Bytes (only for instance store)

      • RAM not included by default

  • Custom Metrics (you push)

    • Basic resolution: 1 minute resolution

    • High resolution: Down to 1 second resolution

    • Include RAM, application level metrics

    • Make sure the IAM permissions on the EC2 instance role are correct

      • Can now attach Roles while instance is running, but still only 1 role per instance

      • cloudwatch:PutMetricData

      • cloudwatch:GetMetricStatistics

      • cloudwatch:ListMetrics

      • ec2:DescribeTags

Custom Metrics

  • Sample custom metrics

    • RAM (cloudwatch monitoring scripts available)

    • Swap usage

    • App metrics (request per sec, etc)
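
A rough sketch of what a custom RAM metric could look like as a PutMetricData request body. The namespace, metric name, and helper function are hypothetical; the dictionary keys follow the PutMetricData parameters as I understand them, with StorageResolution 60 for standard (1 minute) and 1 for high resolution (1 second). Sending it would still need an SDK client and the IAM permissions listed above.

```python
def build_memory_metric(instance_id: str, used_percent: float,
                        high_resolution: bool = False) -> dict:
    """Build a PutMetricData-style payload for a custom memory metric."""
    return {
        "Namespace": "Custom/System",  # hypothetical namespace
        "MetricData": [
            {
                "MetricName": "MemoryUtilization",  # hypothetical metric name
                "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
                "Value": used_percent,
                "Unit": "Percent",
                # 60 = standard 1 minute resolution, 1 = high resolution
                "StorageResolution": 1 if high_resolution else 60,
            }
        ],
    }


payload = build_memory_metric("i-0123456789abcdef0", 42.5, high_resolution=True)
print(payload["MetricData"][0]["StorageResolution"])
```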

CloudWatch Logs for EC2 instances

  • By default no logs from your EC2 instances will go to CloudWatch

  • You need to run the CloudWatch agent on the EC2 server to push the log files you want

  • Make sure the IAM permissions are correct

    • Configure Role

  • The CloudWatch log agent can be set up on-premises too

  • There is a new unified CloudWatch agent (check it out)

  • /etc/awslogs/awscli.conf

  • sudo start awslogsd

EC2 at Scale

  • Systems Manager (GO PLAY)

    • Manage EC2 and on-premises at scale

    • Get operational insights about the state of infrastructure

    • Easily detect problems

    • Patching automation for enhanced compliance

    • Works for Windows and Linux

    • Integrated with CloudWatch metrics / dashboards

    • Integrated with AWS Config

    • Free service

    • Remember for exam:

      • Parameter Store

      • Run Command

      • Patch Manager

    • Must install SSM agent onto systems we wish to control

    • Installed by default on the Amazon Linux AMI and some Ubuntu AMIs

    • If the instance can't be controlled via SSM it's probably an issue with the agent

    • Make sure the EC2 instances have a proper IAM role to allow SSM actions (talk to SSM)

AWS Tags

  • Free naming, common tags are: Name, Environment, Team, Layer, etc

  • Used for:

    • Resource grouping

    • Automation

    • Cost allocation

Systems Manager Resource Groups

  • Create, view, or manage logical groups of resources with tags

  • Allows creation of logical groups of resources such as

    • Applications

    • Different layers of an application stack

    • Production versus dev environments

  • Regional Service

  • Works with EC2, S3, DynamoDB, Lambda, etc

SSM Documents

  • Documents can be in JSON or YAML

    • Command

    • Policy

    • Automation

  • You define parameters

  • You define actions

  • Many documents already exist in AWS

  • They can act on State Manager, Patch Mgr, Automation, Run Command, and reference Parameter Store

  • Can execute them easily through Automation menu

SSM Run Command

  • Execute a document (script) or just run a command

  • Run command across multiple instances (using resource groups)

  • Rate control / Error control

  • Integrated with IAM & CloudTrail

  • No need for SSH

  • Results in the console

Using SSM to PATCH

  • Inventory -> List software on instance

  • Inventory + Run Command -> Patch Software

  • Patch Manager + Maintenance Window -> Patch OS

  • Patch Manager -> Gives you compliance

  • State Manager -> Ensures instances are in a consistent state (compliance)

SSM Session Manager

  • Allows you to start a secure shell on your VM

  • Does not use SSH access and bastion hosts

  • Only EC2 for now, but On-prem eventually

  • Log actions done through secure shells to S3 and CloudWatch Logs

  • IAM permissions: access SSM + write to S3 + write to CloudWatch

  • CloudTrail can intercept StartSession events

  • ssm-user not ec2-user

  • AWS secure shell vs. SSH

    • No need to open port 22 at all

    • No need for bastion hosts

    • All commands are logged to S3 / CloudWatch (auditing)

    • Access done through User IAM not SSH keys

Lost SSH Key

  • Traditional Method for EBS backed

    • Stop, detach root volume, attach to another instance

    • Modify the ~/.ssh/authorized_keys to append your new key, reattach

  • New Method for EBS backed

    • Run the AWSSupport-ResetAccess automation document in SSM

  • Instance store backed EC2

    • You can't stop the instance, or data is lost. AWS recommends just terminating and creating a new one

    • Pro-tip: Use Session Manager to open a secure shell and edit the ~/.ssh/authorized_keys file directly

Parameter Store

  • Secure storage for configuration and secrets

  • Optional seamless encryption using KMS

  • Serverless, scalable, durable, easy SDK, free

  • Version tracking of configurations / secrets

  • Configuration management using path & IAM

  • Notifications with CloudWatch Events

  • Integration with CloudFormation

  • In a tree hierarchy

    • Plaintext or Encrypted, uses KMS to decrypt

    • GetParameters or GetParametersByPath API

    • aws ssm get-parameters --names xxxx
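
To picture the tree hierarchy, here is a toy in-memory emulation of a path lookup in Python. It only mimics the idea behind GetParametersByPath (direct children by default, the whole subtree when recursive); the parameter names and values are invented and no AWS call is made.

```python
def get_parameters_by_path(store: dict, path: str, recursive: bool = False) -> dict:
    """Return parameters under `path` from a dict standing in for Parameter Store."""
    prefix = path.rstrip("/") + "/"
    result = {}
    for name, value in store.items():
        if not name.startswith(prefix):
            continue
        remainder = name[len(prefix):]
        # Without recursion only the level directly under the path is returned.
        if recursive or "/" not in remainder:
            result[name] = value
    return result


# Hypothetical hierarchy: /<app>/<environment>/<setting>
store = {
    "/my-app/dev/db-url": "dev-url",
    "/my-app/dev/db-password": "dev-secret",
    "/my-app/prod/db-url": "prod-url",
    "/other-app/dev/db-url": "other-url",
}

print(sorted(get_parameters_by_path(store, "/my-app/dev")))
print(sorted(get_parameters_by_path(store, "/my-app", recursive=True)))
```

Organising names this way is what makes path-based IAM policies practical: access can be granted per application or per environment prefix.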

Load Balancing

  • Any LB (CLB, ALB, NLB) has a static hostname, use it and not underlying IP.

  • LB can scale, but not instantaneously, contact AWS for a "warm-up"

  • NLB directly sees the client IP

  • 4xx are client induced errors

  • 5xx are application induced errors

    • Error 503 means at capacity or no registered target

  • ALB does not support static IP

  • NLB gets 1 static IP per subnet (to get a static IP for an ALB, chain it behind an NLB)

  • NLB does not need pre-warming

  • NLB doesn't do SSL termination (except it does now)

  • Error Codes

    • Unsuccessful at client side: 4xx

      • Error 400: Bad request

      • Error 401: Unauthorized

      • Error 403: Forbidden

      • Error 460: Client Closed connection

      • Error 463: X-Forwarded-For header had more than 30 IPs (similar to malformed request)

    • Unsuccessful at server side: 5xx

      • Error 500: Internal server error on ELB

      • Error 502: Bad Gateway

      • Error 503: Service unavailable

      • Error 504: Gateway timeout

      • Error : Unauthorized

  • Supporting SSL for Old Browsers (such as TLS 1.0)

    • Change the policy to allow a weaker cipher

      • ELBSecurityPolicy-TLS-1-0-2015-04, there are others, note this one

  • Enable Deletion Protection
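
A small Python lookup for the error codes listed above, handy as a flash-card helper. The table only contains the codes from these notes and the function names are my own.

```python
# Status codes and meanings exactly as listed in the notes above.
ELB_STATUS = {
    400: "Bad request",
    401: "Unauthorized",
    403: "Forbidden",
    460: "Client closed connection",
    463: "X-Forwarded-For header had more than 30 IPs",
    500: "Internal server error on ELB",
    502: "Bad Gateway",
    503: "Service unavailable",
    504: "Gateway timeout",
}


def error_side(code: int) -> str:
    """4xx is induced by the client, 5xx by the application / server side."""
    if 400 <= code <= 499:
        return "client"
    if 500 <= code <= 599:
        return "server"
    return "other"


def describe(code: int) -> str:
    meaning = ELB_STATUS.get(code, "not covered in these notes")
    return f"{code} ({error_side(code)} side): {meaning}"


print(describe(503))
print(describe(460))
```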

Load Balancers Monitoring

  • All LB metrics are directly pushed to CloudWatch metrics

    • BackendConnectionErrors

    • Healthy/UnhealthyHostCount

    • HTTPCode_Backend_2xx: successful count, 3xx redirected count, 4xx client error codes, 5xx server error codes generated by LB

    • Latency

    • RequestCount

    • SurgeQueueLength: the total number of requests that are pending routing to a healthy instance, max value 1024.

    • SpilloverCount: the total number of requests that were rejected because the surge queue is full
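
The relationship between SurgeQueueLength and SpilloverCount can be sketched with a toy model in Python. This is only an illustration of the definitions above, assuming a burst of requests arriving at once and a hypothetical `capacity` of requests that healthy instances can take immediately; it is not how the load balancer is implemented.

```python
def route_burst(incoming: int, capacity: int, queue_limit: int = 1024) -> dict:
    """Split a burst of requests into routed, queued, and rejected counts."""
    routed = min(incoming, capacity)     # served straight away
    waiting = incoming - routed          # need to wait for a healthy instance
    queued = min(waiting, queue_limit)   # surge queue holds at most 1024
    spilled = waiting - queued           # rejected because the queue is full
    return {
        "routed": routed,
        "SurgeQueueLength": queued,
        "SpilloverCount": spilled,
    }


print(route_burst(incoming=1500, capacity=100))
```

A non-zero SpilloverCount means clients are already seeing rejected requests, so it is usually the more urgent of the two metrics to alarm on.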

Load Balancers Access Logs

  • Access logs for LB can be enabled in attributes and stored in S3, they contain:

    • Time

    • Client IP

    • Latencies

    • Request paths

    • Server response

    • Trace Id

  • Only pay for S3 storage

  • Helpful for compliance

  • Helpful for keeping access data even after ELB or EC2 instances are terminated

  • Access logs are already encrypted

Application LB Request Tracing

  • Request tracing - Each HTTP request has added a custom header X-Amzn-Trace-Id

  • Example: X-Amzn-Trace-Id: Root=1-74628i123-asdberwer01234568123123

  • Useful in logs / distributed tracing platform to track a single request

  • Not yet integrated with X-Ray
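
For log processing, the header value can be split into its fields with a few lines of Python. This sketch assumes the value is a semicolon-separated list of key=value pairs (for example a Root field, optionally preceded by a Self field); the function name is made up.

```python
def parse_trace_header(value: str) -> dict:
    """Split an X-Amzn-Trace-Id value into a dict of its fields."""
    fields = {}
    for part in value.split(";"):
        key, sep, val = part.strip().partition("=")
        if sep:  # ignore fragments that are not key=value
            fields[key] = val
    return fields


# The example value from the notes above.
print(parse_trace_header("Root=1-74628i123-asdberwer01234568123123"))
```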

AutoScaling and Group

  • Exam question: ASG is healthy, but EC2 instance behind ALB is not, change the ASG health check type from EC2/Instance to ELB so it picks up on its health checks

  • Look into CLI:

    • set-instance-health in asg (to run tests)

    • terminate-instance-in-autoscaling-group

  • Health Checks:

    • EC2 Status checks

    • ELB Health checks

  • Will not reboot unhealthy instances

Scaling Processes in ASG

  • Launch: Add a new EC2 to the group

  • Terminate: Remove an EC2 from the group

  • HealthCheck: Checks the health of instances

  • ReplaceUnhealthy: Terminate the unhealthy instances and recreate

  • AZRebalance: Balance the number of EC2 instances across AZs

    • launch new instances and then terminate old

    • If Terminate is suspended, the ASG can grow up to 10% above its size and may stay there because it can't terminate old instances

  • AlarmNotification: Accept notification from CloudWatch

  • ScheduledActions: Performs scheduled actions that you create

  • AddToLoadBalancer: Adds instances to the load balancer or target group

  • We can suspend these processes so that they cannot be used

Troubleshooting ASG

  • instances are already running. Launching EC2 instance failed.

    • ASG has reached DesiredCapacity parameter limit, update it.

  • Launching EC2 instances is failing:

    • The SG does not exist, may have been deleted.

    • The key pair does not exist, may have been deleted.

  • If the ASG fails to launch an instance for over 24h, it will automatically suspend all the processes (administrative suspension)

CloudWatch for ASG

  • Available for ASG (opt-in)

    • GroupMinSize

    • GroupMaxSize

    • GroupDesiredCapacity

    • GroupInServiceInstances

    • GroupPendingInstances

    • GroupStandbyInstances

    • GroupTerminatingInstances

    • GroupTotalInstances

  • You must enable metric collection to see these metrics

  • Metrics are collected every 1 minute

Monitoring the underlying EC2 via ASG

  • Basic monitoring: 5 minutes granularity

  • Detailed Monitoring: 1 minute granularity (paid)

Elastic Beanstalk

  • BeanStalk is free, only pay for underlying instances

  • Managed Service

    • Instance config / OS is handled by Beanstalk

    • Deployment strategy is configurable but performed by BeanStalk

  • Only responsible for code

  • Three Architecture models

    • Single Instance: good for Dev

    • LB + ASG: Great for production or pre-prod web applications

    • ASG only: great for non-web in production (workers etc)

  • Has three components

    • Application

    • Application version: each deployment gets assigned a version

    • Environment name: Dev, test, prod, free naming

  • Deploy application versions to environments and can promote application versions to next environment

  • Rollback feature to previous version

  • Full control over lifecycle of environments

Deployment Options for Updates

  • All at once (deploy all in one go): Fastest, but instances aren't available to serve traffic during the deployment (downtime)

  • Rolling: Update a few instances at a time (bucket), and then move on to the next bucket once the first bucket is healthy

  • Rolling with additional batches: Like rolling, but spins up new instances to host the new batch (so the environment always stays at full capacity)

  • Immutable: spins up new instances in a new ASG, deploys version to them, and then swaps all the instances when everything is healthy. Highest cost, quick rollback.
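
The difference between plain rolling and rolling with additional batches comes down to how many instances keep serving during an update. A small Python sketch of the batching idea, using invented instance IDs and helper names (this is not Beanstalk's own logic):

```python
def rolling_batches(instances: list, batch_size: int) -> list:
    """Split instances into the buckets a rolling deployment updates one by one."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [instances[i:i + batch_size]
            for i in range(0, len(instances), batch_size)]


def serving_during_update(total: int, batch_size: int, additional_batch: bool) -> int:
    """Instances still serving traffic while one batch is being updated."""
    if additional_batch:
        return total  # extra instances are launched to absorb the batch
    return total - min(batch_size, total)


fleet = ["i-1", "i-2", "i-3", "i-4", "i-5"]
print(rolling_batches(fleet, 2))
print(serving_during_update(len(fleet), 2, additional_batch=False))
print(serving_during_update(len(fleet), 2, additional_batch=True))
```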

Blue/Green Deployment

  • Create a new "stage" env, deploy v2 there

  • New env (green) can be fully validated and roll back if issues

  • Route 53 can be set up using weighted policies to redirect traffic bit by bit to the new env

  • Using Beanstalk use "swap URLs" when done with env test

Beanstalk for SysOps

  • Beanstalk can put application logs directly into CloudWatch Logs

  • Can use custom domain: Route 53 Alias or CNAME on top of Beanstalk URL

  • Not responsible for patching the runtimes

  • On update of app resolving dependencies can take a long time, use Golden AMI (especially in combo with B/G for speed)

    • Package OS dependencies

    • Package company-wide software

Troubleshooting Beanstalk

  • If the health of your environment changes to red:

    • Review environment events

    • Pull logs to view recent log file entries

    • Roll back to a previous, working version of the app

  • When accessing external resources, make sure the security groups are correctly configured

  • In case of command timeouts you can increase the deployment timeout value

CloudFormation

Update

  • Add, Modify Actions

  • Replacement = True (or not)

Mappings

  • Great when you know in advance all the values that can be taken, and they can be deduced from variables such as Region, AZ, Account, Env (dev vs prod), etc

  • Allow safer control over the template

  • Use parameters when the values are really user specific

  • Use Fn::FindInMap to return a named value from a specific key

  • !FindInMap [MapName, TopLevelKey, SecondLevelKey]
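
Conceptually a mapping is a two-level dictionary and FindInMap is a nested lookup. The Python below only emulates that idea; the map name, keys, and AMI IDs are placeholders.

```python
# Hypothetical Mappings section expressed as nested dicts.
MAPPINGS = {
    "RegionMap": {
        "us-east-1": {"AMI": "ami-11111111"},
        "eu-west-1": {"AMI": "ami-22222222"},
    }
}


def find_in_map(mappings: dict, map_name: str,
                top_level_key: str, second_level_key: str):
    """Emulate !FindInMap [MapName, TopLevelKey, SecondLevelKey]."""
    return mappings[map_name][top_level_key][second_level_key]


# In a template the top-level key would typically be the current region.
print(find_in_map(MAPPINGS, "RegionMap", "eu-west-1", "AMI"))
```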

Outputs

  • Best way to perform some cross stack collaboration, let each expert handle their own part of stack

    • Fn::ImportValue the exported value (must have certain level of uniqueness)

  • You can't delete a CloudFormation Stack if its outputs are being referenced by another CloudFormation stack

Conditions (!Equals [ !Ref EnvType, prod])

  • Fn::And

  • Fn::Equals

  • Fn::If

  • Fn::Not

  • Fn::Or

Intrinsic Functions

  • Ref = !Ref

    • Parameters -> returns the value of the parameter

    • Resources -> returns the physical ID of the underlying resource

  • Fn::GetAtt = !GetAtt

    • Attributes can be attached to any resource you create (see docs)

  • Fn::FindInMap = !FindInMap

    • !FindInMap [ MapName, TopLevelKey, SecondLevelKey]

  • Fn::ImportValue = !ImportValue

    • Import values that have been exported from other templates

  • Fn::Join

    • Join values with a delimiter

    • !Join [ delimiter, [ comma-delimited list of values ] ]

    • "a:b:c" = !Join [ ":", [ a, b, c ] ]

  • Fn::Sub = !Sub

    • Substitute variables in a text, can combine with References or pseudo variables. Must contain ${VariableName}

    • !Sub

    • -- String

    • -- {var1name: var1value, var2name: var2value }

  • Condition Functions (if not equals or and)
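
To make Join and Sub concrete, here is a rough Python emulation of what the two functions return. It is a simplified sketch with made-up helper names: real Fn::Sub can also resolve resource references and pseudo parameters from the template, which this version does not attempt.

```python
import re


def fn_join(delimiter: str, values: list) -> str:
    """Emulate !Join [ delimiter, [ values ] ]."""
    return delimiter.join(str(v) for v in values)


def fn_sub(template: str, variables: dict) -> str:
    """Emulate !Sub with an explicit variable map: replace each ${Name}."""
    def replace(match):
        return str(variables[match.group(1)])
    return re.sub(r"\$\{([^}]+)\}", replace, template)


print(fn_join(":", ["a", "b", "c"]))
print(fn_sub("https://${Host}/${Stage}", {"Host": "example.com", "Stage": "dev"}))
```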

Cloudformation for SysOps

User Data in EC2

  • We can have user data at EC2 instance launch in Cloudformation

  • The important thing is to pass the entire script through the function Fn::Base64

  • Use pipe before script so all is treated as one, with linebreaks

  • User data script log is in /var/log/cloud-init-output.log
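
What Fn::Base64 does to the script can be reproduced in Python with the standard library. The script content below is just an example, and the helper name is mine.

```python
import base64


def encode_user_data(script: str) -> str:
    """Base64-encode a user data script, as Fn::Base64 does in a template."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


# Example script; line breaks are preserved, like with the YAML pipe.
script = "#!/bin/bash\nyum update -y\nyum install -y httpd\n"
encoded = encode_user_data(script)
print(encoded)

# Decoding gives back the exact script, line breaks included.
print(base64.b64decode(encoded).decode("utf-8") == script)
```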

cfn-init

  • Alternate way to do User Data instance stuff

  • AWS::CloudFormation::Init must be in Metadata of a resource, defines in metadata what and how to install

  • With the cfn-init script, it helps make complex EC2 configurations readable

  • The EC2 instance will query the CloudFormation service to get the init data

  • Logs go to /var/log/cfn-init.log

cfn-signal & wait conditions

  • We still can't tell CloudFormation that the EC2 instance was properly configured after a cfn-init

  • For this we use a cfn-signal script

    • We run cfn-signal right after cfn-init

    • Tell CloudFormation service to keep on going or fail

  • We need to define a WaitCondition in resource (polled by cfn-signal) (AWS::CloudFormation::WaitCondition)

    • Block the template until it receives a signal from cfn-signal

    • We attach a CreationPolicy (works on EC2 and ASG)

Wait condition didn't receive the required number of signals from EC2 instance

  • Ensure the AMI has CloudFormation helper scripts (can DL)

  • Verify that cfn-init & cfn-signal command ran successfully, view logs /var/log/cloud-init.log or /var/log/cfn-init.log

  • Can retrieve logs by logging onto the instance, but must disable rollback on failure or else the instance is deleted

  • Verify instance has a connection to internet (public IGW or NAT) otherwise can't connect to CloudFormation

    • Can test with curl -I http://aws.amazon.com

Rollback on failures

  • Stack Creation fails: (CreateStack API) - Stack Creation Options

    • Default: everything rolls back (gets deleted)

      • OnFailure=ROLLBACK

    • Troubleshoot: Option to disable rollback to manually troubleshoot

      • OnFailure=DO_NOTHING

    • Delete: Get rid of stack entirely, don't keep anything

      • OnFailure=DELETE

  • Stack Update Fails: (UpdateStack API)

    • The stack automatically rolls back to the previous known working state

    • Ability to see in the logs what happened

Nested Stacks

  • Nested stacks are stacks as part of other stacks

  • They allow you to isolate repeated patterns / common components in separate stacks and call them from other stacks

  • Considered best practice

  • To update a nested stack always update the parent (root stack)

  • Resource -> Type -> AWS::CloudFormation::Stack, TemplateURL

ChangeSets

  • When you update a stack you need to know what changes before it happens for greater confidence

  • ChangeSets won't say if the update will be successful though

  • Create Change set -> View Change set -> (optional) Create additional change sets -> Execute Change set

  • See changesets in Stacks menu on left or

  • Actions on stack create changeset

Retaining Data on Deletes

  • You can put a DeletionPolicy on any resource to control what happens when the CloudFormation stack is deleted (in resource definition)

  • DeletionPolicy=Retain

    • Specify on resources to preserve/backup in case of CloudFormation deletes

    • To keep a resource, specify Retain (works for any resource/nested stack)

  • DeletionPolicy=Snapshot

    • Will take a snapshot before deleting resource

    • EBS Volume, ElastiCache Cluster, ElastiCache ReplicationGroup

    • RDS DBInstance, RDS DBCluster, Redshift Cluster

  • DeletionPolicy=Delete (default)

    • Note: for AWS::RDS::DBCluster resources the default is snapshot

    • Note: to delete an S3 bucket you need to first empty the bucket

EFS & EBS

EBS Volume

  • EC2 loses root volume when manually terminated

  • Unexpected terminations might happen (AWS alerts via email)

  • EBS volume is a network drive you can attach to your instances while they run, to persist data

  • There can be latency because it goes over the network; it can be detached and attached quickly

  • Provisioned capacity GBs and IOPS

    • Billed for all provisioned capacity

    • Can increase drive over time, start small

  • Characterized in Size | Throughput | IOPS

  • Only GP2 and IO1 can be boot volumes

  • lsblk

EBS Volumes Types

  • GP2 (SSD): General purpose SSD (balance price/perf)

    • Boot volumes, virtual desktops, low-latency interactive apps, development and test

    • 1GB-16TB

    • Small GP2 can burst IOPS to 3000 (anything under 3k can burst to 3k)

    • Max IOPS is 16000

    • 3 IOPS per GB, which means max IOPS is reached at 5334 GB

  • IO1 (SSD): Highest-perf, low latency or high-throughput

    • Critical business apps that require sustained IOPS, or more than 16000

    • Mongo, Cassandra, MSSQL, MySQL, Oracle

    • 4GB-16TB

    • IOPS is provisioned 100-64000 (64k for Nitro only) else 100-32000

    • Maximum ratio of provisioned IOPS to volume GB size = 50:1

  • ST1 (HDD): Low cost for frequently accessed, throughput-intensive workloads (big data)

    • Streaming workloads requiring consistent, fast throughput at low price

    • Big Data, DW, log processing, Kafka

    • Cannot be boot volume

    • 500GB - 16TB

    • Max IOPS is 500

    • Max throughput of 500 MB/s, can burst

  • SC1 (HDD): Lowest cost for less frequently accessed workloads

    • Throughput oriented for large volumes of data infrequently accessed

    • Where lowest cost is important

    • Cannot be a boot volume

    • 500GB - 16TB

    • Max IOPS is 250

    • Max throughput of 250 MB/s, can burst

GP2 volumes I/O Burst

  • If your gp2 volume is less than 1000GB (IOPS less than 3000) it can burst to 3000 (no burst over 1000GB)

  • Accumulate burst credit over time

  • The bigger your volume, the faster you fill up your "burst credit balance"

  • What happens if I/O credit is empty?

    • The max I/O becomes the baseline you paid for

    • If you see the balance at 0 all the time you should increase your volume size or switch to IO1

    • Use CloudWatch to monitor the I/O credit balance

  • Burst also applies to ST1 or SC1 (for increase in throughput)
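
The gp2 numbers from the last two sections (3 IOPS per GB, 16000 cap, burst to 3000 below 1000GB) fit in a short Python sketch. The 100 IOPS floor for very small volumes is not in these notes; it is my recollection of the gp2 documentation and is marked as an assumption in the code.

```python
GP2_IOPS_PER_GB = 3
GP2_MAX_IOPS = 16000
GP2_BURST_IOPS = 3000
GP2_MIN_IOPS = 100  # assumption: floor for tiny volumes, not stated in the notes


def gp2_baseline_iops(size_gb: int) -> int:
    """Baseline IOPS you pay for: 3 per GB, capped at 16000."""
    return max(GP2_MIN_IOPS, min(GP2_IOPS_PER_GB * size_gb, GP2_MAX_IOPS))


def gp2_peak_iops(size_gb: int) -> int:
    """Peak IOPS while burst credits last: volumes under 1000GB burst to 3000."""
    baseline = gp2_baseline_iops(size_gb)
    if size_gb < 1000:
        return max(baseline, GP2_BURST_IOPS)
    return baseline


for size in (100, 1000, 5334):
    print(size, gp2_baseline_iops(size), gp2_peak_iops(size))
```

When the credit balance is empty, the volume drops back to `gp2_baseline_iops`, which is why a small, busy volume is the usual candidate for resizing.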

Computing MB/s based on IOPS

  • gp2

    • Throughput in MB/s = (Volume size in GB) * (IOPS per GB) * (I/O size in KB) / 1024

    • 100GB * 3 IOPS * 256KB per I/O operation = 75MB/s

    • Limit to a max of 250MB/s (means volume >= 334GB won't increase throughput)

  • IO1

    • Throughput in MB/s = (Provisioned IOPS) * (I/O size in KB) / 1024

    • 1000 IOPS * 256KB = 250MB/s

    • Throughput limit of IO1 is 256KB/s for each IOPS provisioned

    • Limit to a max of 500MB/s (at 32k IOPS) and 1000MB/s (at 64k IOPS)
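
The two worked examples above can be checked with a few lines of Python. This is a sketch of the arithmetic in these notes only: KB are converted to MB by dividing by 1024, and the cap is passed in as a parameter (250 for gp2, 500 by default for io1) rather than modelling every volume limit.

```python
def gp2_throughput_mbps(size_gb: float, io_size_kb: int = 256,
                        max_mbps: float = 250) -> float:
    """gp2: size * 3 IOPS per GB * I/O size, converted to MB/s and capped."""
    return min(size_gb * 3 * io_size_kb / 1024, max_mbps)


def io1_throughput_mbps(provisioned_iops: int, io_size_kb: int = 256,
                        max_mbps: float = 500) -> float:
    """io1: provisioned IOPS * I/O size, converted to MB/s and capped."""
    return min(provisioned_iops * io_size_kb / 1024, max_mbps)


print(gp2_throughput_mbps(100))    # the 100GB example
print(gp2_throughput_mbps(334))    # already at the gp2 cap
print(io1_throughput_mbps(1000))   # the 1000 IOPS example
```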

EBS Volume Resizing

  • Can do on the fly (no stop of instance)

  • Can only increase volume size (any volume type)

  • Can change volume type

  • Can increase IO1 IOPS

  • After resizing need to repartition your drive

  • After increasing the size, the volume stays in an "optimizing" phase for a while (in-use - modifying/optimizing); it remains usable but with reduced performance

EBS Snapshots

  • Incremental - only changed blocks

  • EBS backups use IO, should not run them during peak times

  • Snapshots are stored in S3 (but you won't see them)

  • Don't have to detach volume but recommended

  • Max 100000 snapshots

  • Can copy across AZ or Region

  • Can make AMI from Snapshot

  • EBS volumes restored by snapshots need to be pre-warmed (using fio or dd to read entire volume)

  • Can be automated using Amazon Data Lifecycle Manager

EBS Migration

  • Volumes locked to AZ

  • To migrate, snapshot the volume, (optional) copy the snapshot to a different region

  • Create a volume from the snapshot in the AZ of your choice

EBS Encryption

  • When you encrypt an EBS volume you get:

    • Data at rest is encrypted inside the volume

    • Data in flight between instance and the volume is encrypted

    • Snapshots are encrypted

    • As are volumes created from the snapshot

  • Encryption and decryption are transparent

  • Minimal impact on latency

  • EBS Encryption leverages keys from KMS (AES-256)

  • Copying an unencrypted snapshot allows encryption

  • Snapshots of encrypted volumes are encrypted

  • Encrypting an unencrypted EBS volume

    • Create an EBS snapshot of the volume

    • Encrypt the snapshot using copy

    • Create a new volume from the snapshot

    • Attach encrypted volume to original instance

EBS RAID

  • EBS is already redundant (replicated within an AZ)

  • But for increase of IOPS past max

  • Must do in OS not AWS

  • Or mirror EBS volumes

    • RAID 0 (Perf, get combined disk space, IO, throughput, not fault tolerant)

    • RAID 1 (mirror, send data to two* volumes at same time, 2x network traffic)

    • RAID 5, 6 (Not recommended for EBS)

EBS for SysOps

  • If you plan to use the root volume of an instance after it's terminated

    • Set the Delete on Termination flag to "no" (when creating the EC2 instance)

  • If you use EBS for high performance, use EBS-optimized instance types

  • If an EBS volume is unused you still pay for it

  • For cost savings over a longer period, snapshot the volume and restore it later when needed (3x savings)

EBS Troubleshooting

  • High wait time or slow response for SSD -> increase IOPS (or go with Provisioned IOPS on IO1)

  • EC2 won't start with EBS volume as root: make sure volume names are properly mapped (/dev/xvdb instead of /dev/xvda for example)

  • After increasing a volume size, you still need to repartition to use the additional storage (xfs_growfs for example)

CloudWatch and EBS

  • Important EBS Volume metrics

    • VolumeIdleTime: number of seconds when no read / write is submitted

    • VolumeQueueLength: number of operations waiting to be executed. High number means an IOPS or application issue

    • BurstBalance: if it becomes 0 we need a volume with more IOPS

  • GP2 volume types: 5 minute interval

  • IO1 volume types: 1 minute interval

  • EBS volumes have a status check:

    • Ok - volume is performing well

    • Warning - performance is below expected

    • Impaired - Stalled, performance severely degraded

    • Insufficient-data - metric data collection in progress

EFS

  • Managed NFS

  • EFS works with EC2 instances multi-AZ

  • Highly available, scalable, expensive (3xGP2), pay per use

  • For: content management, web serving, data sharing, WordPress

  • NFS v4.1

  • Use security groups to control access (on network drive)

  • Compatible with Linux based AMI (not windows)

  • Performance mode: General purpose (default), Max IO (used when 1000's of EC2 are using the EFS)

  • Has bursting or provisioned modes for IO

  • "EFS file sync" to sync from on-prem fs to EFS

  • Backup EFS-to-EFS (incremental, can choose frequency)

  • Encryption at rest using KMS

  • EFS now has lifecycle mgmt. to tier to EFS IA

  • Can use TLS for EFS

Instance store

  • Some instances do not come with a root EBS volume

  • Ephemeral

  • Physically attached to your instance

  • Pros

    • Better I/O perf

    • Good for buffer / cache / scratch data / temporary content

    • Data survives reboot

  • Cons

    • On stop or termination instance store is lost

    • Can't resize the instance store

    • Backups must be operated by the user


S3

  • Bucket names must be globally unique

    • Global at top menu, (but regional service)

  • Minimum of 3 and maximum of 63 characters - no uppercase or underscores

  • Must start with a lowercase letter or number and can’t be formatted as an IP address (1.1.1.1)

  • Default of 100 buckets per account, and hard 1000 bucket limit via support request

  • Unlimited objects in buckets

  • Unlimited total capacity for a bucket

  • An object’s key is its name (FULL PATH including slashes and filename, but not bucket name)

  • An object’s value is its data (content)

  • An object’s size is from 0kb to 5TB (more than 5GB must use multi-part upload)

    • To upload a file larger than 160 GB, use the AWS CLI, AWS SDK, or Amazon S3 REST API.

  • Metadata (list of key/value pairs, system or user metadata)

  • Tags (Unicode key/value pair -max 10-), useful for security / lifecycle

  • Version ID (if versioning is enabled)
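
The naming rules above can be sketched as a small Python check. This is a minimal sketch that encodes only the rules listed in these notes (length, no uppercase or underscores, first character, not an IP address); the function name is hypothetical and the real S3 rules cover a few more cases.

```python
import ipaddress
import re

# First character: lowercase letter or digit; 3 to 63 characters in total.
# Only lowercase letters, digits, dots and hyphens are accepted here.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{2,62}")


def is_valid_bucket_name(name: str) -> bool:
    """Check a name against the bucket naming rules listed in these notes."""
    if not _BUCKET_RE.fullmatch(name):
        return False
    try:
        # Names formatted like an IPv4 address (1.1.1.1) are rejected.
        ipaddress.IPv4Address(name)
        return False
    except ValueError:
        return True
```

Usage: is_valid_bucket_name("my-bucket-01") passes, while "My_Bucket" and "1.1.1.1" are rejected.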

Versioning

  • Bucket level setting

  • If you overwrite a key/file you increment its version

  • Best practice to version your buckets

    • Protect against unintended deletes

    • Easy roll back to previous version

  • Any file that is not versioned prior to enabling versioning will have a version NULL

  • Deleting a file only adds a delete marker
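
A toy in-memory model of the behaviour above (overwrite adds a version, delete only adds a marker). The class and the "v1", "v2" IDs are made up for illustration; real S3 version IDs are opaque strings and this is not how S3 is implemented.

```python
import itertools


class ToyVersionedBucket:
    """Toy model of a versioned bucket, for illustration only."""

    DELETE_MARKER = object()

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, value), oldest first
        self._ids = itertools.count(1)

    def put(self, key, value):
        # Overwriting a key never replaces data, it stacks a new version.
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, value))
        return version_id

    def delete(self, key):
        # A plain delete only adds a delete marker on top of the stack.
        return self.put(key, self.DELETE_MARKER)

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is not None:
            # Older versions stay readable by explicit version ID.
            for vid, value in versions:
                if vid == version_id and value is not self.DELETE_MARKER:
                    return value
            raise KeyError(key)
        if not versions or versions[-1][1] is self.DELETE_MARKER:
            raise KeyError(key)  # latest version is a delete marker
        return versions[-1][1]
```

After a delete, a plain get fails, but get with the old version ID still returns the data, which is what makes rollback possible.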

S3 Websites

  • URL can be

    • [BUCKET_NAME].s3-website-[AWS_REGION].amazonaws.com

    • [BUCKET_NAME].s3-website.[AWS_REGION].amazonaws.com

S3 CORS

  • If you request data from another S3 bucket you need to enable CORS

  • Cross Origin Resource Sharing allows you to limit the number of websites that can request files in your S3 (help limit costs)

  • Access-Control-Allow-Origin:

S3 Consistency Model

  • Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD or GET request to the key name (to find if the object exists) before creating the object, Amazon S3 provides eventual consistency for read-after-write.

S3 Security

  • User based

    • IAM Policies - which API calls should be allowed for a specific user from IAM

  • Resource Based

    • Bucket Policies - bucket wide rules from the S3 console - allows cross account

    • Object ACLs - finer grain, not super popular

    • Bucket ACLs - less common

S3 Bucket Policies

  • Grant public access to the bucket

  • Force objects to be encrypted at upload

  • Grant access to another account (Cross account)

  • JSON based (4 components)

    • Resources: buckets and objects

    • Actions: Set of APIs to Allow or Deny

    • Effect: Allow or Deny

    • Principal: The account or user to apply the policy to

  • Networking: Supports VPC endpoints (for instances in VPC with no internet)

  • Logging and Auditing: S3 access logs can be stored in another bucket, API calls can be logged in CloudTrail

  • User Security: MFA can be required in versioned buckets to delete objects, Signed URLs = valid for a limited time (ex: premium video service for time)
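
A sketch of the "force objects to be encrypted at upload" use case, showing the four components (Resource, Action, Effect, Principal) in one statement. The bucket name and Sid are hypothetical; the policy is built as a Python dict and serialized to JSON.

```python
import json

BUCKET = "example-bucket"  # hypothetical bucket name

# Deny any PutObject that does not carry the SSE-S3 encryption header.
force_encryption_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnencryptedUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            "Condition": {
                "StringNotEquals": {
                    "s3:x-amz-server-side-encryption": "AES256"
                }
            },
        }
    ],
}

policy_json = json.dumps(force_encryption_policy, indent=2)
```

The resulting policy_json string is what would be pasted into the bucket policy editor.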

S3 Encryption for Objects

Can also set default encryption for bucket

SSE-S3

  • Keys handled and managed by AWS S3

  • Object is encrypted server side, sent via HTTP/S

  • AES-256

  • Must set header: "x-amz-server-side-encryption":"AES256"

  • S3 Managed Data Key + Object > Encrypted

SSE-KMS

  • Keys handled and managed by KMS

  • Object is encrypted server side, sent via HTTP/S

  • KMS advantages: user control (rotation etc.) + audit trail

  • Must set header: "x-amz-server-side-encryption":"aws:kms"

  • KMS Customer Master Key (CMK) + Object > Encrypted

SSE-C

  • Server Side encryption using keys fully managed by customer outside AWS

  • S3 does not store the key

  • HTTPS must be used

  • Encryption key is provided (sent) in HTTP header, in every request

  • Client provided data key + Object > Encrypted, S3 throws away key
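
The request headers for the three server-side modes above can be sketched as one helper. The SSE-S3 and SSE-KMS values come from the notes; the three SSE-C header names are not in the notes and are written from memory of the S3 REST API, so check them against the docs. The function name is hypothetical.

```python
import base64
import hashlib


def sse_headers(mode, customer_key=None):
    """Return the encryption headers to send with an upload for a given mode."""
    if mode == "SSE-S3":
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        return {"x-amz-server-side-encryption": "aws:kms"}
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 32-byte (256-bit) key")
        # The key itself travels in the headers, which is why HTTPS is required.
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key":
                base64.b64encode(customer_key).decode("ascii"),
            "x-amz-server-side-encryption-customer-key-MD5":
                base64.b64encode(hashlib.md5(customer_key).digest()).decode("ascii"),
        }
    raise ValueError(f"unknown mode: {mode}")
```

With SSE-C the same headers have to be sent again on every GET, since S3 does not keep the key.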

Client Side Encryption

  • Client library such as Amazon S3 Encryption Client

  • Clients must encrypt data themselves before sending to S3

  • Client must decrypt data themselves when retrieving from S3

  • Customer fully manages the keys and encryption cycle

Encryption in Transit

  • AWS S3 exposes both HTTP and HTTPS endpoints, HTTPS recommended

Default Encryption vs Bucket Policies

  • Old way was to use bucket policies to enable and to refuse any HTTP command without proper headers

  • New way is to click "default encryption" option in S3

  • Bucket Policies are evaluated before default encryption

  • Either SSE-S3 (AES-256) or SSE-KMS

S3 MFA Delete

  • To use MFA-Delete must enable Versioning on the S3 bucket

  • You need MFA to

    • permanently delete an object version (can do marker without)

    • suspend versioning on the bucket

  • You won't need it for

    • enabling versioning

    • listing deleted versions

  • Only bucket owner (root account) can enable/disable MFA-delete

  • Can only be enabled using the CLI

S3 Access Logs

  • Any request made to S3 from any account, authorized or denied, will be logged to another S3 bucket

  • Can analyze using data analysis tools (Hive, Athena, etc.)

  • Log format in docs

S3 Cross Region Replication

  • Must enable versioning (source and destination)

  • Must be in different regions (duh)

  • Can be different accounts

  • Copying is asynchronous

  • Must give proper IAM permissions to S3, needs Role

  • For:

    • Compliance, lower latency access, cross account replication

  • Can do based on whole bucket, prefix, tags

  • Can replicate encrypted if other account has access to KMS key

  • Can change storage class or ownership

S3 Pre-signed URLs

  • Can create a pre-signed URL via CLI or SDK

    • For downloads CLI

    • For uploads SDK

  • Valid by default for 3600 seconds, change with --expires-in [TIME_BY_SECONDS]

  • Users who receive pre-signed URL inherit permissions of the generator for GET/PUT

  • aws configure set default.s3.signature_version s3v4 (make URL KMS compatible)

  • aws s3 presign s3://bucketname/file.jpg --expires-in 300 --region ca-central-1 (S3 is global but bucket is regional)

  • Avoids direct access to the bucket from users

S3 Inventory

  • S3 Inventory helps manage your storage

  • Audit and report on the replication and encryption status of your objects

  • Use: Business, compliance, regulatory needs

  • Query with Athena, Redshift, Presto, Hive, etc

  • Can set up multiple inventories

  • Data goes from a source bucket to a target bucket (need to set up policy to place data, done automatically)

S3 Storage Tiers

  • S3 Standard - General Purpose

  • 99.999999999% Durability (10 mil objects 10k years, lose 1)

  • 99.99% availability

  • Can sustain 2 concurrent AZ loss

  • S3 Reduced Redundancy Storage (RRS)

    • Deprecated

    • 99.99% durability and availability

    • Can sustain loss of single AZ

    • Use for non-critical reproducible data

  • S3 Standard Infrequent Access (IA)

    • Suitable for data that is less frequently accessed but requires rapid retrieval

    • Retrieval fee

    • 99.999999999% Durability (10 mil objects 10k years, lose 1)

    • 99.9% availability

    • Can sustain 2 concurrent AZ loss

    • For backups, DR, etc.

  • S3 One Zone Infrequent Access

    • Same as IA, but data is stored in a single AZ

    • Retrieval fee

    • 99.999999999% Durability; data is lost when AZ is destroyed

    • 99.5% availability

    • Lower cost by 20% than IA

    • For secondary backup data, or recreatable

  • S3 Intelligent Tiering

    • Small monthly auto-tiering fee

    • Move between S3 and IA based on access patterns

    • 99.999999999% Durability, 99.9% availability

    • Can sustain single AZ loss

  • S3 Glacier

    • Alternative to Tape (10's of years)

    • 99.999999999% Durability

    • Storage cost per month ($0.004 / GB) + Retrieval fee (-10x than S3)

    • Each item is called an "Archive", up to 40TB size

    • Archives are stored in "Vaults", similar to bucket

    • Retrieval options:

      • Expedited (1-5 mins) - $0.03 / GB and $0.01 per request (need to buy capacity units to use)

      • Standard (3-5 hours) - $0.01 per GB and $0.05 per 1000 requests

      • Bulk (5-12 hours) - $0.0025 per GB and $0.025 per 1000 requests
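
Worked arithmetic for the "10 mil objects, 10k years, lose 1" reading of 99.999999999% durability quoted in the tiers above. This is a rough sketch that treats durability as an independent per-object, per-year survival probability, which is an assumption; Decimal is used so the eleven 9s are not lost to float rounding.

```python
from decimal import Decimal


def expected_annual_loss(object_count, durability_percent):
    """Expected number of objects lost per year at a given durability."""
    loss_probability = Decimal(1) - Decimal(durability_percent) / Decimal(100)
    return Decimal(object_count) * loss_probability


# 10 million objects at eleven 9s of durability
loss_per_year = expected_annual_loss(10_000_000, "99.999999999")
years_per_lost_object = Decimal(1) / loss_per_year
```

loss_per_year comes out at 0.0001 objects, so on average one object is lost every 10,000 years.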

Glacier Operations

  • Upload - Single operation or by parts (MultiPart upload) for larger archives

  • Download - First initiate a retrieval job for the particular archive, Glacier then prepares it for download. User then has a limited time to download the data from staging server

  • Delete - Use Glacier REST API or AWS SDK by specifying archive ID

  • Restore links have an expiry date

Glacier Vault Policies & Vault Lock

  • Vault is a collection of archives

  • Each Vault has:

    • ONE Vault access policy

    • One Vault lock policy

  • Vault Policies are written in JSON

  • Vault Access Policy is similar to bucket policy (restrict user / account permissions)

  • Vault Lock Policy is a policy you lock, for regulatory and compliance requirements

    • The policy is immutable, it can never be changed

    • ex: forbid deleting an archive if less than 1 year old

    • ex: Implement WORM policy (write once, read many)
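
The "forbid deleting an archive if less than 1 year old" example could look roughly like this as a Vault Lock policy, built here as a Python dict. The vault ARN, account ID and Sid are hypothetical placeholders; the condition key is written as I recall it from the Glacier docs, so verify before locking anything, since a locked policy cannot be changed.

```python
import json

# Hypothetical vault ARN (region, account ID and vault name are placeholders).
VAULT_ARN = "arn:aws:glacier:us-east-1:123456789012:vaults/examplevault"

vault_lock_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "deny-delete-under-one-year",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "glacier:DeleteArchive",
            "Resource": VAULT_ARN,
            "Condition": {
                # Deny applies while the archive is younger than 365 days.
                "NumericLessThan": {"glacier:ArchiveAgeInDays": "365"}
            },
        }
    ],
}

vault_lock_policy_json = json.dumps(vault_lock_policy, indent=2)
```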

S3 Lifecycle Rules

  • Transition Actions: Defines when objects are transitioned to another storage class

  • Expiration Actions: Objects expire and are deleted

  • Can be used to delete incomplete multi-part uploads

  • Limit to prefix or tag

  • Can do current or previous versions
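
A sketch of one lifecycle rule covering the points above (transition, expiration, incomplete multi-part cleanup, prefix filter, previous versions). It is written as the dict shape I believe the S3 lifecycle API expects; the rule ID, prefix and day counts are made up.

```python
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",       # hypothetical rule name
            "Filter": {"Prefix": "logs/"},          # limit the rule to a prefix
            "Status": "Enabled",
            # Transition actions: move to cheaper classes as objects age.
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            # Expiration action: delete current versions after a year.
            "Expiration": {"Days": 365},
            # Previous (noncurrent) versions get their own expiry.
            "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            # Clean up multi-part uploads that never completed.
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }
    ]
}

# Sanity check: transitions should be ordered by increasing age.
transition_days = [
    t["Days"] for t in lifecycle_configuration["Rules"][0]["Transitions"]
]
transitions_are_ordered = transition_days == sorted(transition_days)
```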

S3 Analytics - Storage Class Analysis (in s3 mgmt)

  • You can set up analytics to help determine when to transition objects from Standard to Standard_IA

  • Does not work for ONEZONE_IA or GLACIER

  • Report is updated on daily basis

  • Takes about 24h to 48h to first start

  • Help you put together efficient Lifecycle Rules

Glacier

Snowball

  • Physically transport data in or out of AWS

  • How much usable

  • TB or PB

  • Alternative to network fees

  • Secure, tamper resistant, uses KMS 256

  • Tracking using SNS and text messages, E-Ink shipping label

  • For: large data migrations, DC decommission, disaster recovery

  • If it takes more than a week via network use Snowball instead

  • Has client for copying files
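
The "more than a week via network" rule of thumb above is simple arithmetic. A minimal sketch, assuming decimal units (1 TB = 10^12 bytes) and a link that can be used at a fixed utilization; both function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def network_transfer_days(terabytes, megabits_per_second, utilization=1.0):
    """Days needed to push the data over the network link."""
    bits = terabytes * 1e12 * 8
    effective_bits_per_second = megabits_per_second * 1e6 * utilization
    return bits / effective_bits_per_second / SECONDS_PER_DAY


def prefer_snowball(terabytes, megabits_per_second, utilization=1.0):
    """Apply the rule of thumb: over a week on the wire means ship a Snowball."""
    return network_transfer_days(terabytes, megabits_per_second, utilization) > 7
```

For example, 100 TB over a fully used 1 Gbps link takes about 9.3 days, so Snowball wins; 1 TB over the same link takes a couple of hours.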

Snowball Edge

  • Adds computational capability

  • 100TB capacity, either:

    • Storage Optimized - 24 vCPU

    • Compute Optimized - 52 vCPU & optional GPU

    • Supports a custom EC2 AMI so you can process while transferring

    • Supports custom Lambda functions

AWS Snowmobile

  • Transfer exabytes (1EB = 1000PB = 1000000TB)

  • Each has 100PB of capacity, can use multiple in parallel

  • Use if transferring more than 10PB

Storage Gateway

  • Expose S3 on-premises

  • File Gateway

    • S3 buckets via NFS and SMB (all S3 modes)

    • Bucket access using IAM roles for each File Gateway

    • Recently used data is cached

    • Can be mounted on many servers

  • Volume Gateway

    • Block storage using iSCSI backed by S3

    • ^ Backed by EBS snapshots

    • Cached volumes: low latency access to most recent data

    • Stored volumes: entire dataset is on-premises, scheduled backups to S3

  • Tape Gateway

    • VTL Virtual Tape Library backed by S3 and Glacier

    • Back up data using existing tape based processes (and iSCSI interface)

    • Works with most backup software

S3 For SysOps

S3 Versioning

  • S3 Versioning creates a new version each time you change a file

  • That includes when you encrypt a file (good against crypto-ransom)

  • Deleting a file in the S3 bucket just adds a delete marker on the versioning (delete marker has 0 size)

  • To delete a bucket you need to remove all the file versions within it

CloudFront

CloudFormation

  • We didn’t specify a name in the json file for this bucket, so AWS names it with the [STACKNAME]-[LOGICAL_VOLUME_NAME]-[RANDOM_STRING] format.

  • What is logical volume name, based on resource in CFN?

  • Stacks have logical resources in them that create physical resources


CloudFront

  • Cached at edge locations

  • Popular with S3 but works with EC2 and LB as well

  • Helps with network attacks

  • Provides SSL (HTTPS) via ACM

  • Can use SSL (HTTPS) to talk internally to applications

  • Supports RTMP

  • Origin Access Identity

    • Limit S3 to be only accessed via this identity

CloudFront Access Logs

  • Logs every request made to CloudFront into a logging S3 Bucket

  • Can generate reports on:

    • Cache Stats

    • Popular Objects

    • Top Referrers

    • Usage Reports

    • Viewers Reports

  • These reports are based on data from the Access Logs but you don't need to enable logs to get the reports

CloudFront Troubleshooting

  • CloudFront caches HTTP 4xx and 5xx status codes returned by our S3 (or the origin server)

  • 5xx error indicates Gateway issues

// May not be on exam

CloudFront Signed URL / Signed Cookies

  • To distribute paid shared content which lives in S3

  • If S3 can only be accessed via CloudFront we can't use S3 pre-signed URLs

  • Can attach a policy with:

    • URL expiration

    • IP ranges for access

    • Trusted signers (which AWS Account can create signed URLs)

  • CloudFront signed URLs can only be created using the AWS SDK

  • Validity length?

    • Share content, movies etc, short = few minutes

    • Private content (to user) longer = years
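
The policy attached to a signed URL (expiration plus IP range) is a small JSON document. This sketch only builds that policy; the signing step with the trusted signer's private key is left out, and the function name and example URL are hypothetical.

```python
import json


def custom_policy(resource_url, expires_epoch, source_cidr=None):
    """Build the JSON policy for a CloudFront signed URL or signed cookie."""
    condition = {"DateLessThan": {"AWS:EpochTime": int(expires_epoch)}}
    if source_cidr is not None:
        # Optional restriction on the client IP range.
        condition["IpAddress"] = {"AWS:SourceIp": source_cidr}
    policy = {"Statement": [{"Resource": resource_url, "Condition": condition}]}
    # Compact separators: the policy is embedded in the URL, so no whitespace.
    return json.dumps(policy, separators=(",", ":"))
```

A short-lived policy for shared content would use an expiry a few minutes ahead (for example int(time.time()) + 300); private per-user content can use a much later timestamp.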

CloudFront vs S3 Cross Region Replication

  • CloudFront

    • Global Edge network

    • Files are cached for a TTL (maybe a day)

    • Great for static content that must be available everywhere

  • S3 Cross Region Replication

    • Must be set up for each region

    • Files are updated near real-time

    • Read only

    • Great for dynamic content that needs low-latency in a few regions

CloudFront Geo Restriction

  • Restrict who can access your distribution

    • Whitelist by country

    • Blacklist by country

  • Country is determined by using 3rd party Geo-IP database

  • Copyright law etc.

done //

Athena

  • Serverless service to perform analytics directly against S3 files

  • Uses SQL to query

  • Has a JDBC / ODBC driver

  • Charged per query and amount of data scanned

  • Supports CSV, JSON, ORC, Avro, and Parquet

  • For: BI, analytics, reporting, analyze VPC flow logs, ELB logs, CloudTrail trails, etc.

Databases

RDS

  • Postgres

  • Oracle

  • MySQL

  • MariaDB

  • MS SQL

  • Aurora (proprietary)

  • DB Identifier (name) must be unique across region

  • Your responsibility

    • Check IP / Port / SG inbound rules

    • In-database user creation and permissions

    • Creating database with or without public access

    • Ensure parameter groups or DB is configured to only allow SSL

  • AWS Responsibility

    • No SSH access

    • No manual DB patching

    • No Manual OS patching

    • No way to audit underlying instance

// Not on Exam

For SAs

  • Read replicas can only do SELECT

  • RDS supports Transparent Data Encryption for Oracle or SQL Server

    • Is on top of KMS, may affect performance

  • IAM Authentication vs un/pw for MySQL and PostgreSQL

    • Lifespan of an IAM authentication token is 15 mins (short-lived), better security

    • Tokens are generated by IAM credentials

    • SSL must be used (or connection refused)

    • Easy to use EC2 Instance Roles to connect to RDS DB (so don't need DB credentials in actual instance for non IAM)

Done //

  • Managed Service =

    • OS patching

    • Point in Time Restore backups

    • Monitoring dashboards

    • Read replicas for read perf

    • Multi AZ set for DR

    • Maintenance windows for upgrades

    • Scaling (vert and horiz)

    • BUT no SSH

    • No audit of underlying instance

RDS Read Replicas for scalability

  • Up to 5 Read Replicas

  • Within AZ, Cross AZ, or Cross Region

  • Replication is ASYNC (eventually consistent)

  • Replicas can be promoted to their own DB

  • Applications must update connection string to leverage read replicas

    • One string for master, 1 for each replica
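
Since each replica has its own connection string, the application has to pick an endpoint per query. A minimal sketch of that routing, assuming reads are plain SELECT statements and using made-up endpoint names; the class is hypothetical.

```python
import itertools


class EndpointRouter:
    """Send writes to the master and spread SELECTs across read replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # Round-robin over replicas; fall back to the master if there are none.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.master
```

Because replication is ASYNC, a SELECT routed to a replica right after a write may not see that write yet; reads that need the latest data should go to the master.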

Can combo Read Replicas and DR Multi AZ

RDS Multi AZ (Disaster Recovery)

  • SYNC replication

  • One DNS name for auto failover to standby

  • Increases availability (duh)

  • For AZ loss (not cross region)

  • No manual intervention

  • Not for scaling

RDS Multi AZ vs Read Replicas

  • Multi AZ

    • Multi AZ is not used to support reads

    • The failover happens only in the following conditions

      • The primary DB instance fails

      • An AZ outage

      • The DB instance server type is changed

      • The OS of the DB instance is undergoing software patching

      • A manual failover of the DB instance was initiated using Reboot with failover

    • No failover for DB operations: long-running queries, deadlocks, or DB corruption errors

    • Endpoint is the same after failover (no URL change in app)

    • Lower maintenance impact. AWS does maintenance on the standby, which is then promoted to Master

    • Backups are created from the standby (less impact, normally done on master)

    • Only within a single region, region outage impacts availability

  • Read Replicas

    • Help scaling read traffic

    • A Read Replica can be promoted as a standalone database (manually)

    • Read Replicas can be within AZ, Cross AZ, or Cross Region

    • Each Read Replica has its own DNS endpoint

    • You can have Read Replicas of Read Replicas

    • Read Replicas can be Multi-AZ

    • Read replicas help with DR by using Cross Region RR

    • Read Replicas are not supported for Oracle

    • Read Replicas can be used to run BI/Analytics reports etc

DB Parameter Groups

  • You can configure the DB engine using Parameter Groups

  • Dynamic Parameters are applied immediately

  • Static parameters are applied after instance reboot

  • You can modify the parameter group associated with a DB (replace with your own custom) (must reboot)

  • Must know

    • PostgreSQL / SQL Server: rds.force_ssl=1 -> force SSL connections

    • MySQL / MariaDB: GRANT SELECT ON mydatabase.* TO 'myuser'@'%' IDENTIFIED BY '...' REQUIRE SSL;

RDS Backups

  • Automatically enabled

  • Automated Backups

    • Daily full snapshot of DB

    • Captures transaction logs in real time

      • Ability to restore to any point in time

    • 7 days retention (can increase to 35) (can lower as well)

  • DB Snapshots (can be manually triggered)

    • Retention for as long as you want (keep specific state, or long term)

Backup vs Snapshots

Backups

  • Backups are "continuous" and allow point in time recovery

  • backups happen during maintenance windows

  • When you delete a DB instance, you can retain automated backups

  • Backups have a retention period you set between 0 and 35 days (so they're all time limited)

Snapshots

  • Snapshots use IO operations and stop the DB from seconds to minutes

  • Snapshots taken on a Multi AZ DB don't impact master, just the standby

  • Snapshots are incremental after the first snapshot (which is full)

  • You can copy & share snapshots

  • Manual snapshots don't expire

  • You can take a "final snapshot" when you delete your DB

RDS Encryption

  • Encryption at rest with AES KMS - AES256 encryption

    • Only at creation

    • or: snapshot, copy as encrypted, create DB from snapshot (same as EBS)

  • SSL certificates to encrypt data in flight

  • To enforce SSL:

    • PostgreSQL: rds.force_ssl=1 in the AWS RDS console (parameter groups)

    • MySQL: Within the DB: GRANT USAGE ON *.* TO 'mysqluser'@'%' REQUIRE SSL;

  • To connect using SSL:

    • Provide SSL Trust certificate (can be downloaded from AWS)

    • Provide SSL options when connecting to DB

RDS Security

  • Encryption is done at DB creation, or afterwards via snapshot -> copy as encrypted -> create DB from the encrypted copy

  • RDS DB are usually deployed in private subnet

  • Security works by leveraging security groups for who can communicate with it

  • IAM policies help control who can manage RDS

  • Traditional username and password to log into DB itself

  • IAM users now work with Aurora/MySQL

RDS API for SysOps

  • DescribeDBInstances API

    • Helps to get a list of all DB instances, including Read Replicas

    • Helps to get DB version

  • CreateDBSnapshot API - Make a snapshot

  • DescribeEvents API - Helps to return information about events related to your DB instance

  • RebootDBInstance API - Helps to initiate a "forced" failover by rebooting DB instance

RDS with CloudWatch

  • CloudWatch Metrics associated with RDS (gathered from hypervisor)

    • DatabaseConnections

    • SwapUsage

    • ReadIOPS / WriteIOPS

    • ReadLatency / WriteLatency

    • ReadThroughput / WriteThroughput

    • DiskQueueDepth

    • FreeStorageSpace

  • Enhanced Monitoring (gathered from agent on DB instance)

    • Useful when you need to see how many different processes or threads use the CPU

    • Access to over 50 new CPU, memory, file system, and disk I/O metrics

    • 1-60 secs granularity

RDS Performance Insights

  • Visualize your DB performance and analyze any issues that affect it

  • With Perf Insights dashboard you can visualize the DB load and filter load by:

    • By Waits -> find the resource that is the bottleneck (CPU, IO, lock, etc)

    • By SQL statements -> find the SQL statement that is the problem

    • By Hosts -> find the server that is using the DB the most

    • By Users -> find the user that is using the DB the most

  • DBLoad - the number of active sessions for the DB engine

  • SQL queries that are putting load on your DB (its own category in dashboard)

  • Not supported on T2 instances

RDS vs. Aurora

  • Proprietary

  • Postgres and MySQL drivers supported

  • Cloud optimized - 5x perf for MySQL, 3x perf for Postgres

  • Automatically grows in increments of 10GB up to 64TB

  • Aurora can have 15 replicas, MySQL only 5, and replication is faster (sub 10ms lag)

  • Failover in Aurora is instantaneous, HA native.

  • Aurora costs 20% more than RDS, but is more efficient.

Aurora

  • Automatic failover

  • Backup and recovery

  • Isolation and security

  • Industry compliance

  • Push-button scaling

  • Automated patching with zero downtime

  • Advanced monitoring

  • Routine maintenance

  • Backtrack: restore data at any point in time without backups

  • HA and Read Scaling

    • 6 Copies of data across 3 AZ

      • 4 copies out of 6 needed for writes

      • 3 copies out of 6 needed for reads

      • Self healing with peer-to-peer replication (for corrupted data)

      • Storage is striped across 100's of volumes

    • One Aurora instance takes writes, Master

    • Automated failover for master in less than 30 secs

    • Master + up to 15 Read Replicas serve reads (any replica can become master)

    • Support for Cross Region Replication

  • Shared logical storage volume across AZs for Replication + Self-Healing + Auto Expanding

  • Master is only writer

    • Writer Endpoint (DNS name) always points to current master, for failover

    • Read Replicas can do auto-scaling

      • Reader Endpoint Connection load balancing for reads, across all scaled instances. Happens at connection level not statement level.

      • ![Screen Shot 2019-11-18 at 14.10.27.png](../../../../_resources/Screen Shot 2019-11-18 at 14.10.27.png)
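
The 4-of-6 write and 3-of-6 read quorum from the HA notes above can be checked with a few lines. A minimal sketch, assuming the 6 copies are spread evenly as 2 per AZ (implied by "6 copies across 3 AZ" but not stated outright); all names are hypothetical.

```python
TOTAL_COPIES = 6
COPIES_PER_AZ = 2      # assumption: 6 copies spread evenly over 3 AZs
WRITE_QUORUM = 4
READ_QUORUM = 3


def can_write(healthy_copies):
    return healthy_copies >= WRITE_QUORUM


def can_read(healthy_copies):
    return healthy_copies >= READ_QUORUM


# Losing a whole AZ leaves 4 copies: reads and writes both still work.
after_az_loss = TOTAL_COPIES - COPIES_PER_AZ
# Losing an AZ plus one more copy leaves 3: reads still work, writes do not.
after_az_loss_plus_one = after_az_loss - 1
```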

Aurora Security

  • Encryption at rest using KMS

  • Automated backups, snapshots and replicas are also encrypted

  • Encryption in flight using SSL (same process as MySQL or Postgres)

  • Authentication using IAM

  • You are responsible for protecting via SG

  • No SSH

Aurora Serverless

  • No need to choose an instance size

  • Only supports MySQL 5.6 & Postgres in beta

  • Helpful when you can't predict workload

  • DB cluster starts, shuts down, and scales automatically based on CPU / connections

  • Can migrate from Aurora Cluster to Serverless and vice versa

  • Serverless usage is measured in ACU (Aurora Capacity Units)

  • Billed in 5 minute increments of ACU

  • Some features aren't supported in serverless, so check docs

Aurora for SAs

  • Can use IAM for Aurora

  • Aurora Global Databases span multiple regions and enable DR

    • One primary region

    • One DR Region

    • The DR region can be used for lower latency reads

    • < 1 sec replication lag on average

  • If not using Global Databases you can create cross region Read Replicas

    • FAQ recommends Global Databases instead

ElastiCache

  • Managed in-memory DB, high perf, low latency.

  • Redis or Memcached

  • Reduce load on DB

  • Make app stateless (keep state in cache)

  • Write scaling using Sharding

  • Read scaling using Read Replicas

  • Multi AZ with Failover

  • AWS takes care of all normal stuff

  • App queries ElastiCache, either gets cache hit or cache miss, in case of miss it gets cached for hit next time (by application)

  • Cache must come with invalidation strategy for only most current data (app based)

  • User session store (keep it stateless)

    • Application writes session data into ElastiCache

    • User hits a different application instance

    • Instance retrieves the data from cache to keep session going

  • Redis

    • In-memory key-value store

    • Super low latency (sub ms)

    • Cache survives reboot by default (persistence)

    • Multi AZ with automatic failover for DR (if you want to keep cache data)

    • Support for Read Replicas and Cluster

    • Good for: User sessions, Leaderboard (has a sort), Distributed states, Relieve pressure on DB, Pub / Sub capability for messaging

  • Memcached

    • In-memory object store

    • Cache does not survive reboots

    • Good for: Quick object retrieval, cache often accessed objects

ElastiCache for SAs

  • Security

    • Redis supports RedisAUTH (un/pw)

    • SSL in-flight must be enabled and used

    • Memcached supports SASL

    • None support IAM

    • IAM policies are used only for AWS API level security

  • Patterns for ElastiCache

    • Lazy Loading: all read data is cached, can become stale

    • Write Through: Adds or updates data in the cache when written to DB (no stale data)

    • Session Store: stores temp session data (using TTL features maybe)
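
Lazy Loading and Write Through from the patterns above, sketched with plain dicts standing in for the database and for ElastiCache. The class and method names are hypothetical; a real setup would use a Redis or Memcached client and a TTL.

```python
class CachedStore:
    """Toy model: one dict plays the database, another plays the cache."""

    def __init__(self, database):
        self.database = database
        self.cache = {}
        self.db_reads = 0

    def read_lazy(self, key):
        # Lazy Loading: only data that is actually read ends up in the cache.
        if key in self.cache:
            return self.cache[key]          # cache hit
        self.db_reads += 1                  # cache miss: go to the database
        value = self.database[key]
        self.cache[key] = value             # the application fills the cache
        return value

    def write_through(self, key, value):
        # Write Through: update the cache on every DB write, so no stale data.
        self.database[key] = value
        self.cache[key] = value

    def write_db_only(self, key, value):
        # A write that skips the cache leaves any cached copy stale.
        self.database[key] = value
```

With lazy loading alone, a read after write_db_only still returns the old cached value; write_through is what keeps the cache current.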

Monitoring, Audit, and Performance

CloudWatch

  • CloudWatch provides metrics for every service in AWS

  • Metric is a variable to monitor (CPUUtilization, NetworkIn, etc)

  • Metrics belong to namespaces

  • Dimension is an attribute of a metric (instance id, environment, etc)

  • Up to 10 dimensions per metric

  • Metrics have timestamps

  • Can create a CloudWatch dashboard of metrics

Detailed Monitoring

  • EC2 instance metrics are reported every 5 minutes by default

  • With detailed monitoring (for a cost) you get data every 1 minute

  • Use detailed monitoring for more effective ASG scaling

  • Free Tier allows up to 10 detailed monitoring metrics

  • EC2 Memory usage is not pushed by default, must be pushed from inside the instance

CloudWatch Custom Metrics

  • Possibility to define and send your own custom metrics to CloudWatch

  • Ability to use dimensions (attributes) to segment metrics

    • Instance.id

    • Environment.name

  • Metric resolution:

    • Standard: 1 minute

    • High resolution: Down to 1 second (StorageResolution API parameter) - Higher Cost

    • Use API call PutMetricData

    • Use exponential back off in case of throttle errors

  • Available metrics

    • ASGAverageCPUUtilization—Average CPU utilization of the Auto Scaling group.

    • ASGAverageNetworkIn—Average number of bytes received on all network interfaces by the Auto Scaling group.

    • ASGAverageNetworkOut—Average number of bytes sent out on all network interfaces by the Auto Scaling group.

    • ALBRequestCountPerTarget—Number of requests completed per target in an Application Load Balancer target group.
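
The exponential back-off advice for throttled PutMetricData calls above can be sketched as a generic retry wrapper. ThrottledError is a hypothetical stand-in for whatever throttling exception the SDK raises, and the sleep function is injectable so the logic can be exercised without waiting; production code would usually add random jitter as well.

```python
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error returned by the API."""


def call_with_backoff(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation(), doubling the wait after each throttled attempt."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Usage: wrap the call that sends the metric, for example call_with_backoff(lambda: send_metric(payload)), where send_metric is whatever function performs the PutMetricData request.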

CloudWatch DashBoards (exam)

  • Great way to set up dashboards for quick access to key metrics

  • Dashboards are global, go to each region to set up, but see anywhere

  • Dashboards can include graphs from different regions

  • You can change the time zone & time range of the dashboards

  • You can set up automatic refresh (10s, 1m, 2m, 5m, 15m)

  • Pricing:

    • 3 Dashboards (up to 50 metrics) for free

    • $3/dashboard/month afterwards

CloudWatch Logs

  • Applications can send logs to CloudWatch via the SDK

  • CloudWatch can collect logs from:

    • Elastic Beanstalk: Collects from application

    • ECS: Collects from containers

    • Lambda: Collects from functions

    • VPC Flow Logs

    • API Gateway

    • CloudTrail based on filter

    • CloudWatch Logs Agents: For example on EC2 machines

    • Route53: Logs DNS queries

  • CloudWatch logs can go to:

    • Batch exporter to S3 for archival

    • Stream to ElasticSearch cluster for further analytics

Log storage architecture:

  • Log Groups: Arbitrary name, usually representing an application

  • Log Stream: instances within application / log files / containers (A log stream is a sequence of log events that share the same source)

  • Can define log expiration policies (never expire, 30 days, etc)

  • Using the CLI we can tail CloudWatch logs

  • To send logs to CloudWatch, make sure IAM permissions are correct!

  • Security: Encryption of logs using KMS at the Group level

CloudWatch Logs Metric Filter & Insights

  • CloudWatch Logs can use filter expressions

    • For example, find a specific IP inside a log

    • Metric filters can be used to trigger alarms (found specific IP, then alarm)

      • Create your own metrics based on these filters, and then alarms

  • CloudWatch Logs Insights can be used to query logs, and add queries to CloudWatch Dashboards (comes with some default)
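
The "find a specific IP, then alarm" flow above boils down to: filter log events, count matches as a metric, compare the metric to a threshold. A toy sketch with made-up log lines; the function names are hypothetical, and real metric filters use their own pattern syntax rather than a plain substring test.

```python
def metric_filter(log_events, term):
    """Count log events that contain the term (the metric value)."""
    return sum(1 for line in log_events if term in line)


def alarm_state(metric_value, threshold):
    """Go to ALARM when the metric reaches the threshold."""
    return "ALARM" if metric_value >= threshold else "OK"


# Made-up access log lines for one evaluation period.
log_events = [
    "203.0.113.7 GET /index.html 200",
    "198.51.100.9 GET /login 403",
    "203.0.113.7 POST /login 200",
]

hits = metric_filter(log_events, "203.0.113.7")
state = alarm_state(hits, threshold=1)
```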

CloudWatch Alarms

  • Alarms are used to trigger notifications for any metric

  • Alarms can go to Auto Scaling, EC2 Actions, SNS Notifications

  • Various options (sampling, %, max, min, etc)

  • Alarm States:

    • OK

    • INSUFFICIENT_DATA

    • ALARM

  • Period:

    • Length of time in seconds to evaluate the metric

    • High resolution custom metrics: can only choose 10 sec or 30 sec

  • Alarm Targets (exam)

    • Stop, Terminate, Reboot, or Recover an EC2 instance

    • Trigger autoscaling action

    • Send notification to SNS (from which you can do almost anything)

  • Good to know

    • Alarms can be created based on CloudWatch Logs Metrics Filters

    • CloudWatch doesn't test or validate the actions that are assigned

    • To test alarms and notifications, set the alarm state to Alarm using CLI

      • aws cloudwatch set-alarm-state --alarm-name "myalarm" --state-value ALARM --state-reason "testing purposes"

CloudWatch Events

  • Source + Rule -> Target

  • Schedule: Like a cron job (same format)

  • Event Pattern: Event rules to react to a service doing something (Ex: CodePipeline state changes)

  • Triggers to Lambda functions, SQS/SNS/Kinesis Messages

  • CloudWatch Event creates a small JSON document to give info on the change
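
An event pattern for the CodePipeline example above, plus a simplified matcher to show how a rule decides whether an event fires. The matcher is an illustrative sketch that handles only exact values and nested objects, not the full matching rules; the sample event is trimmed and the pipeline name is made up.

```python
def matches(pattern, event):
    """Simplified matching: every pattern key must be present and allowed."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested object in the pattern: recurse into the event.
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            # Leaf values in a pattern are lists of accepted values.
            return False
    return True


pattern = {
    "source": ["aws.codepipeline"],
    "detail-type": ["CodePipeline Pipeline Execution State Change"],
    "detail": {"state": ["FAILED"]},
}

# Trimmed example of the small JSON document an event carries.
event = {
    "source": "aws.codepipeline",
    "detail-type": "CodePipeline Pipeline Execution State Change",
    "detail": {"pipeline": "my-pipeline", "state": "FAILED"},
}

rule_fires = matches(pattern, event)
```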

CloudTrail

  • Provides governance, compliance, and audit for your account

  • Enabled by default

  • Get a history of events / API calls made within your account by:

    • Console

    • SDK

    • CLI

    • AWS Services

  • Can put logs from CloudTrail into CloudWatch logs

  • If a resource is deleted, look into CloudTrail first

  • Shows past 90 days of activity (store elsewhere after, CloudWatch in Trail config etc)

  • The default UI only shows Create, Modify, or Delete events

  • CloudTrail Trail

    • Get a detailed list of all the events you choose

    • Ability to store these events in S3 for further analysis

    • Can be region specific or global

  • CloudTrail Logs have SSE-S3 encryption by default when placed in S3

  • Control access to S3 using IAM, Bucket Policy, etc

AWS Config

  • Helps with auditing and compliance of your AWS resources

  • Helps record configurations and changes over time

  • Helps record compliance over time

  • Possibility of storing AWS Config data into S3 (to be queried by Athena)

  • Questions that can be solved by AWS Config

    • Is there unrestricted SSH access to my security groups

    • Do my buckets have any public access

    • How has my ALB confgiuration changed over time

  • You can receive alerts (SNS notifications) for any changes

  • AWS Config is a per-region service

  • Can be aggregated across regions and accounts

Config Rules

  • Can use AWS managed config rules (over 75)

  • Can make custom config rules (must be defined in AWS Lambda)

    • Evaluate if each EBS disk is of type gp2

    • Evaluate if each EC2 instance is t2.micro

  • Rules can be evaluated by triggers

    • For each config change

    • And / or at regular intervals

  • Pricing - No Free Tier, $2USD per active rule per month (decreases after 10 rules)

  • AWS Config Resource

    • View compliance of a resource over time

    • View configuration of a resource over time

    • View CloudTrail API calls if enabled

CloudWatch vs CloudTrail vs Config

  • CloudWatch

    • Performance Monitoring (metrics, CPU, network, etc) & dashboards

    • Events & Alerting

    • Log Aggregation & Analysis

  • CloudTrail

    • Record API calls made within your Account by everyone

    • Can define trails for specific resources

    • Global Service

  • Config

    • Record configuration changes

    • Evaluate against compliance rules

    • Get timeline of changes and compliance


AWS Account Management

AWS Status - Service Health Dashboard

  • Shows all regions, all services health

  • Shows historical information for each day

  • Has an RSS feed you can subscribe to

  • https://status.aws.amazon.com

AWS Personal Health Dashboard

  • Global Service

  • Show how AWS outages directly impact you

  • Shows impact on your resources

  • Lists issues and actions you can take to remediate them

  • https://phd.aws.amazon.com

AWS Organizations

  • Global Service

  • Allows to manage multiple AWS accounts

  • The main account is the master account - can't change it

  • Other accounts are member accounts

  • Member accounts can only be part of one organization

  • Consolidated Billing across all accounts - single payment method

  • Pricing benefits from aggregated usage (volume discount)

  • API is available to automate AWS account creation

OU & Service Control Policies (SCPs)

  • Organize accounts in Organizational Units (OU)

    • Can be anything: dev/test/prod or Finance/HR/IT

    • Can nest OU within OU

  • Apply SCP to OU

    • Permit / Deny access to AWS services

    • SCP has a similar syntax to IAM

    • It's a filter to IAM

    • Policies are inherited down the OU hierarchy (an SCP on a parent OU applies to its child OUs and accounts)

  • Helpful to create sandbox accounts

  • Helpful to separate dev and prod resources

  • Helpful to only allow approved services
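
The "filter to IAM" point can be sketched as a set intersection: an action is usable only if both the IAM policy and the SCP allow it. This is a deliberate simplification for these notes (it ignores explicit denies, wildcards and conditions), and the action lists are illustrative.

```python
def effective_permissions(iam_allowed, scp_allowed):
    """Actions usable in a member account: allowed by IAM AND by the SCP."""
    return iam_allowed & scp_allowed


# What the user's IAM policy grants.
iam_allowed = {"s3:GetObject", "ec2:RunInstances", "redshift:CreateCluster"}

# What the SCP attached to the OU permits.
scp_allowed = {"s3:GetObject", "s3:PutObject", "ec2:RunInstances"}

print(sorted(effective_permissions(iam_allowed, scp_allowed)))
```

Note that s3:PutObject is not in the result: an SCP never grants permissions by itself, it only limits what IAM can grant.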

AWS Service Catalog

  • Users who are new to AWS have too many options and may create stacks that are not compliant / in line with the rest of the organization

  • Some users just want a quick self-service portal to launch a set of authorized products pre-defined by admins

  • Such as: virtual machines, databases, storage options, etc...

  • Admins create CloudFormation templates -> products, collection of Products is a Portfolio, user gets product list

AWS Cost Explorer

  • A graphical tool to view and analyze your costs and usage, trends

  • Review charges and cost associated with your AWS account or org

  • Forecast spending for next 3 months

  • Get recommendations/insight for which EC2 Reserved Instances to purchase

    • View Reservation Summary, and net savings from them (EC2, RDS, etc)

  • Access to default reports

  • API to build custom cost management applications

AWS Budgets

  • Create a Budget and send alarms when costs exceed the budget

  • 3 types of budgets: Usage, Cost, Reservation

  • For Reserved Instances (RI)

    • Track utilization

    • Supports EC2, ElastiCache, RDS, Redshift

  • Up to 5 SNS notifications per budget

  • Can filter by: Service, Linked Account, Tag, Purchase Option, Instance Type, Region, AZ, API Operation, etc

  • Same options as AWS Cost Explorer

  • 2 Budgets are free, then $0.02/day per budget

AWS Billing Alarms

  • Different than Budget Alerts, almost same as Cost budget

  • Billing data metrics are stored in CloudWatch us-east-1

  • Billing data are for overall worldwide AWS costs

  • It's for actual costs, not projected costs

AWS Cost Allocation Tags

  • With Tags we can track resources that relate to each other

  • With Cost Allocation Tags we can enable detailed costing reports

  • Just like Tags, but they show up as columns in reports

  • AWS Generated Cost Allocation Tags

    • Automatically applied to the resource you create

    • Starts with Prefix aws: (eg aws:createdBy)

    • They're not applied to resources created before the activation

  • User tags

    • Defined by the user

    • Starts with Prefix user:

  • Cost Allocation Tags automatically appear in the Billing Console

  • Takes up to 24h for the tags to show up in report


Security and Compliance

DDoS Protection on AWS

  • AWS Shield Standard: protects against DDoS attacks for your website and applications, no additional cost

  • AWS Shield Advanced: 24/7 premium DDoS protection

  • AWS WAF: Filter specific requests based on rules

  • Cloudfront and Route 53

    • Availability protection using global edge network

    • Combined with AWS Shield, provide attack mitigation at edge

  • Be ready to Scale - leverage AutoScaling

  • Separate static resources (S3 / CloudFront) from dynamic ones (EC2/ALB)

AWS Shield

  • AWS Shield Standard

    • Free Service protects against attacks such as SYN/UDP floods, Reflection attacks, and other layer 3/4

  • AWS Shield Advanced

    • Optional DDoS mitigation service ($3000 per month)

    • Protects against more sophisticated attacks on CloudFront, Route 53, Classic, Application & Network Load Balancers, EIP, EC2

    • 24/7 access to AWS DDoS response team (DRT)

    • If you do get higher fees due to scaling, fees are covered

WAF

  • Protects application from common web exploits

  • Define customizable web security rules:

    • Control which traffic to allow or block to your web applications

    • Rules can include: IP addresses, HTTP headers, HTTP body, or URI strings

    • Protects against common attacks - SQL injection, Cross site scripting

    • Protects against bots, bad user agents, etc

    • Size constraints

    • Geo match

  • Deploy on CloudFront, Application Load Balancer, or API GW

  • Leverage existing marketplace of rules

Penetration Testing on AWS

  • Permission is required (not any more though)

  • Request permissions with AWS root credentials

  • No 3rd party testing

  • For EC2, ELB, RDS, Aurora, CloudFront, API GW, Lambda, Lightsail

  • Cannot test against nano / micro / small instances

  • Takes 2 business days to be approved

AWS Inspector

  • Only for EC2 instances

  • Analyze against known vulnerabilities

    • Common Vulnerabilities and Exposures (CVE)

    • Center for Internet Security (CIS) Benchmarks

    • Security Best Practices

    • Runtime behaviour Analysis

  • Analyze against unintended network accessibility

    • Network reachability

  • AWS Inspector Agent must be installed on OS in EC2 instances

  • Define template (rules package, duration, attributes, SNS topics)

  • No custom rules possible, only AWS managed

  • Afterwards you get a report with a list of vulnerabilities

  • Use SSM instead of manual install

  • Does have a CPU impact

Logging in AWS

  • CloudTrail Trails - Trace all API calls

  • Config Rules - For config & compliance over time

  • CloudWatch Logs - For full data retention

  • VPC Flow Logs - IP traffic within your VPC

  • ELB Access Logs - Metadata of requests made to your load balancers

  • CloudFront Access Logs - Web Distribution access logs

  • WAF Logs - Full logging of all requests analyzed by WAF

  • Logs can be analyzed using Athena if they're stored in S3

  • Should encrypt logs in S3, control access using IAM & Bucket Policies, MFA

GuardDuty

  • Intelligent threat discovery to protect AWS Account

  • Uses Machine Learning algorithms, anomaly detection, 3rd party data

  • One click to enable (30 day trial), no need to install software

  • Input data includes

    • CloudTrail Logs: Unusual API calls, unauthorized deployments

    • VPC Flow Logs: Unusual internal traffic, unusual IP addresses

    • DNS Logs: Compromised EC2 instances sending encoded data within DNS queries

  • Notifies you in case of findings

  • Integration with AWS Lambda

Trusted Advisor

  • No need to install anything - high level AWS Account assessment

  • Analyzes your AWS accounts and provides recommendations:

    • Cost optimization

    • Performance

    • Security

    • Fault Tolerance

    • Service Limits (eg when you are getting close to a limit)

  • Core Checks and Recommendations - all customers

  • Can enable weekly email notifications from the console

  • Full Trusted Advisor - Available for Business & Enterprise support plans

    • Ability to set CloudWatch alarms when reaching limits

KMS Overview + Encryption In Place

  • Any time you need to share sensitive information use KMS

    • DB passwords

    • Credentials to external service

    • Private Key of SSL certificates

  • The value in KMS is that the CMK used to encrypt data can never be retrieved by the user, and the CMK can be rotated for extra security

  • Never store secrets in plaintext, especially in code!

  • Encrypted secrets can be stored in the code / environment variables

  • KMS can only help in encrypting up to 4KB of data per call

  • If data > 4KB, use envelope encryption

  • To give access to KMS to someone:

    • Make sure the Key Policy allows the user

    • Make sure the IAM Policy allows the API call

  • Able to fully manage the keys and policies:

    • Create, Disable, Enable, Rotation policies

  • Able to audit key usage (using CloudTrail)

  • Three types of Customer Master Keys:

    • AWS Managed Service Default CMK: free

    • User Keys created in KMS: $1 / Month

    • User Keys imported (must be symmetric 256-bit key): $1 / Month

    • pay for API calls to KMS ($0.03 / 10000 calls)

Encryption in AWS Services

  • Requires migration (through Snapshot/backup)

    • EBS Volumes

    • RDS databases

    • ElastiCache

    • EFS network file system

  • In-place encryption

    • S3

Cloud HSM Overview

  • KMS -> AWS manages the software for encryption

  • CloudHSM -> AWS provisions encryption hardware

  • Dedicated Hardware (HSM = Hardware Security Module)

  • You entirely manage your own encryption keys (not AWS)

  • The CloudHSM hardware device is tamper resistant

  • FIPS 140-2 Level 3 Compliance

  • CloudHSM clusters are spread across multi AZ

  • Supports both symmetric and asymmetric encryption (ie SSL/TLS keys), KMS does only symmetric

  • No free tier

  • Has Cryptographic Acceleration (SSL/TLS, Oracle TDE)

  • Must use the CloudHSM Client Software, no API

MFA + IAM Credentials Report

  • AWS MFA accepts both virtual and hardware MFA devices

  • MFA for root user can be configured from the IAM dashboard

  • MFA can also be configured from the CLI

  • Can set up MFA for individual users

  • Credentials Report

    • A CSV report file on all the IAM users and credentials

    • Shows which users have MFA enabled

IAM PassRole Action (exam)

  • In order to assign a role to an EC2 instance you need iam:PassRole

  • Can be used for any service where we assign roles, not just EC2

Security Token Service (STS) & Cross Account Access

  • Allows you to grant limited and temporary access to AWS resources

  • Token is valid for up to one hour (must be refreshed)

  • Cross Account Access

    • Allows users from one AWS account to access resources in another

      • Define an IAM Role for another account to access

      • Define which accounts can access this IAM Role

      • Use AWS STS to retrieve credentials and impersonate the IAM Role you have access to (AssumeRole API)

      • Temporary credentials can be valid between 15 minutes to 1 hour

  • Federation (AD)

    • Provides a non-AWS user with temporary AWS access by linking users Active Directory credentials

    • Uses SAML

    • Allows Single Sign On (SSO) which enables users to log in to AWS console without assigning IAM credentials

  • Federation with third party providers / Cognito

    • Used mainly in web and mobile applications

    • Uses Facebook, Google, Amazon, etc as identity providers to federate users

Identity Federation with SAML & Cognito

  • Federation lets users outside of AWS to assume a temporary role for accessing AWS resources

  • These users assume an identity provided access role

  • Federation assumes a form of 3rd party authentication

    • LDAP, MS AD (~SAML), SSO, OpenID, Cognito

  • Using federation you don't need to create IAM users, user mgmt is outside AWS

SAML Federation for Enterprises

  • To integrate AD / ADFS with AWS (or any SAML 2.0)

  • Provides access to AWS Console or CLI (through temp creds)

  • No need to create an IAM user for each of your employees

Custom Identity Broker Application for Enterprises

  • Use only if identity provider is not compatible with SAML 2.0

  • Must write own broker

  • The identity broker must determine the appropriate IAM Policy

AWS Cognito - Federated Identity Pools For Public Applications

  • Goal: Provide direct access to AWS resources from the client side

  • How:

    • Log in to federated identity provider - or remain anonymous

    • Get temporary AWS credentials back from the Federated Identity Pool

    • The credentials come with a pre-defined IAM policy stating their permissions

AWS Artifact

  • Portal that provides customers with on-demand access to AWS compliance documentation and AWS agreements

  • Can be used to support internal audit or compliance

Security and Compliance Section Summary

  • AWS Shield: Automatic DDoS Protection + 24/7 support for advanced

  • AWS WAF: Firewall to filter incoming requests based on rules

  • AWS Inspector: For EC2 only, install agents and find vulnerabilities

  • AWS GuardDuty: Find malicious behaviour with VPC, DNS, and CloudTrail Logs

  • AWS Trusted Advisor: Analyze AWS account and get recommendations

  • AWS KMS: Encryption keys managed by AWS

  • AWS CloudHSM: Hardware encryption, we manage keys, supports asymmetrical

  • AWS STS: Generate security token

  • Identity Federation: SAML 2.0 or Custom for Enterprise, Cognito for Apps

  • AWS Artifact: Get access to compliance reports such as PCI, ISO, etc

  • AWS Config: Track config changes and compliance against rules (over time)

  • AWS CloudTrail: Track API calls made by users within an account

Route 53

  • Most common records

    • A: hostname to IPv4

    • AAAA: hostname to IPv6

    • CNAME: hostname to hostname (non-root domain only)

    • Alias: hostname to AWS resource (root and non-root), free of charge, supports native health checks

  • Can use

    • Public domain names

    • Private domain names that can only be resolved by your VPC instances

  • $0.50 per hosted zone

  • Has

    • Load Balancing (through DNS, client LB)

    • Health checks (limited)

    • Routing policy: simple, failover, geolocation, latency, weighted, multi value

  • Simple Routing Policy

    • Maps a domain to one URL

    • Use when directing to a single resource

    • Cannot attach health checks

    • If multiple values are returned, a random one is chosen by client

  • Weighted Routing Policy

    • Control % of requests that go to specific endpoint (ex: 70, 20, 10. Sum does not have to be 100)

    • Create multiple record sets with weighted option

    • Helpful to test 1% of traffic on new app

    • Split traffic between regions

    • Can be associated with health checks

  • Latency Routing Policy

    • Redirect to server that has the least latency, close to request

    • Evaluated in terms of user to designated AWS region

    • Must specify region in latency record

    • Germany could be directed to US if lower latency
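
For the Weighted Routing Policy above, the share of traffic each record receives is its weight divided by the sum of all weights, which is why the weights do not need to add up to 100. A small sketch of that arithmetic (the record names are made up):

```python
def traffic_share(weights):
    """Map each record name to the fraction of queries it should receive."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Weights that happen to sum to 100 ...
print(traffic_share({"us-east-1": 70, "eu-west-1": 20, "ap-south-1": 10}))

# ... and weights that do not: 3 and 1 still mean 75% / 25%.
print(traffic_share({"current-app": 3, "new-app": 1}))
```

To test a new app on roughly 1% of traffic, weights such as 99 and 1 would do.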

Route 53 Geolocation Policy

  • Different from latency based

  • Based on user location

  • Traffic from England should go to X

  • Must have a default policy if no other match exists

Multi Value Routing Policy

  • Use when routing traffic to multiple instances

  • Use when you want to associate Route 53 health checks with records; unhealthy records are removed from the returned values

  • Up to 8 healthy records are returned for each MultiValue query (even if you have 50)

  • MultiValue is not a substitute for using ELB

Route 53 Health Checks

  • Route 53 will not send traffic to an endpoint whose health check has failed

  • Deemed unhealthy if checks fail 3 times

  • Deemed healthy if checks pass 3 times

  • Default interval 30 secs (can set fast health check at 10s, higher cost)

  • About 15 health checkers will launch to check endpoint health

    • one request every 2 secs on average

  • Can have HTTP, TCP, and HTTPS check (no SSL certificate verification)

  • Possibility of integrating health checks with CloudWatch

  • Health checks can be linked to Route 53 DNS record set
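
The 3-fail / 3-pass thresholds above behave like a small state machine. The sketch below assumes the checks must be consecutive (that detail is an assumption of this example, as is the function name) and shows how the reported state lags behind individual results.

```python
def track_health(results, threshold=3, healthy=True):
    """Return the reported state after each check result.

    The state flips only after `threshold` consecutive results that
    disagree with the current state (assumed behaviour for this sketch).
    """
    streak = 0
    states = []
    for passed in results:
        if passed == healthy:
            streak = 0  # result agrees with current state, reset the streak
        else:
            streak += 1
            if streak >= threshold:
                healthy = passed
                streak = 0
        states.append(healthy)
    return states


checks = [True, False, False, False, True, True, True]
print(track_health(checks))
```

A single failed check between passes never changes the state, which avoids flapping on transient errors.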

Route 53 as a Registrar

  • Offer both Registrar and DNS service

VPC

CIDR

  • Two components

    • Base IP (xx.xx.xx.xx)

    • Subnet mask (/32) (defines how many bits can change in an IP)

      • Can take two forms

        • /24

        • 255.255.255.0 (less common)

      • /32 = 1 IP = 2^0

      • /31 = 2 IP = 2^1

      • /30 = 4 IP = 2^2

      • /29 = 8 IP = 2^3

      • /24 = 256 IP = 2^8

      • etc

      • /16 = 65536 IP = 2^16

      • /0 = all = 2^32

      • /32 - No IP number can change

      • /24 - Last IP number can change (.x)

      • /16 - Last 2 IP numbers can change (.x.x)

      • /8 - Last 3 IP numbers can change (.x.x.x)

      • /0 - All 4 IP numbers can change (x.x.x.x)
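
The table above follows one formula: a /n block contains 2^(32 - n) addresses. A quick check with Python's standard ipaddress module (the example ranges are arbitrary):

```python
import ipaddress


def address_count(prefix_length):
    """Number of IPv4 addresses in a block with the given prefix length."""
    return 2 ** (32 - prefix_length)


for cidr in ["192.168.0.10/32", "192.168.0.0/30", "192.168.0.0/24",
             "192.168.0.0/16", "0.0.0.0/0"]:
    network = ipaddress.ip_network(cidr)
    # The library agrees with the 2^(32 - n) rule.
    assert network.num_addresses == address_count(network.prefixlen)
    # Prints the CIDR, the equivalent dotted subnet mask, and the size.
    print(cidr, network.netmask, network.num_addresses)
```

The netmask column shows the second (less common) notation, eg /24 is 255.255.255.0.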

Public vs Private

  • IANA via RFC 1918

  • Private IP can have the following values

    • 10.0.0.0 - 10.255.255.255 (10.0.0.0/8)

    • 172.16.0.0 - 172.31.255.255 (172.16.0.0/12) AWS default

    • 192.168.0.0 - 192.168.255.255 (192.168.0.0/16)

VPC in AWS - IPv4

  • Can have multiple VPCs per region (5 soft limit)

  • CIDR size limits per VPC:

    • Min size /28 = 16 IP

    • Max size /16 = 65536 IP

  • Because VPC is private, only RFC1918 addresses

  • VPC CIDR should not overlap with your other networks

Subnets

  • AWS reserves 5 IPs (first 4 and last 1 of range) in each Subnet

  • They are not available for use

  • For CIDR 10.0.0.0/24:

    • 10.0.0.0: Network address

    • 10.0.0.1: Reserved by AWS for the VPC router

    • 10.0.0.2: Reserved by AWS for mapping to Amazon provided DNS

    • 10.0.0.3: Reserved for future use

    • 10.0.0.255: Network broadcast (assume not available for exam)

  • Exam Tip: If you need 29 IP addresses for EC2 you can't choose a /27: it has 32 IPs, but only 32 - 5 = 27 are usable. You need a /26 (64 IPs, 59 usable)
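
The exam tip can be worked out mechanically: subtract the 5 reserved addresses from the block size, then pick the smallest block that still fits. A sketch, assuming subnet sizes between /28 and /16 (function names are made up for this example):

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first 4 addresses and the last one


def usable_ips(cidr):
    """Addresses left for your resources in a subnet with this CIDR."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def smallest_prefix_for(hosts_needed):
    """Longest prefix (smallest subnet) with enough usable addresses."""
    for prefix in range(28, 15, -1):  # try /28, /27, ... down to /16
        if 2 ** (32 - prefix) - AWS_RESERVED_PER_SUBNET >= hosts_needed:
            return prefix
    raise ValueError("does not fit in a single /16 subnet")


print(usable_ips("10.0.0.0/27"))
print(usable_ips("10.0.0.0/26"))
print(smallest_prefix_for(29))
```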

Internet Gateway

  • Helps VPC internet connection

  • Scales horizontally, HA, and redundant

  • Must be created separately from VPC

  • One VPC per IGW, one IGW per VPC

  • IGW is also a NAT for the instances that have a public IPv4

  • Will not have internet access without Route Tables

NAT Instances (outdated)

  • Allow instances in the private subnet to connect to the internet

  • Must be launched in a public subnet

  • Must disable EC2: Source / Destination Check

  • Must have an Elastic IP (because route tables require fixed)

  • Route table must be configured to route traffic from private subnets to NAT instance

  • Pre-configured Amazon Linux AMI are available

  • Not highly available or resilient setup by default

  • Would need to create an ASG in Multi AZ + resilient user-data script

  • Internet traffic bandwidth depends on EC2 instance performance

  • Must manage security groups & rules

    • Inbound

      • Allow HTTP/S from private subnets

      • Allow SSH from home network (through IGW)

    • Outbound

      • Allow HTTP/S traffic to internet

      • Allow ICMP traffic to internet

NAT Gateway (new)

  • Only IPv4

  • AWS managed NAT, higher bandwidth, better availability, no admin

  • Pay by the hour for usage and bandwidth

  • NAT Gateway is created in a specific AZ and uses an EIP (it is placed in a public subnet)

  • Cannot be used by an instance in that subnet (only from other subnets)

  • Requires an IGW (Private subnet -> NAT -> IGW)

  • 5 Gbps of bandwidth with auto-scaling up to 45 Gbps

  • No security groups required

* Differences between the two

DNS Resolution in VPC

  • enableDnsSupport: (=Edit DNS Resolution Setting)

    • Default True

    • Decides if DNS resolution is supported for the VPC

    • If True, queries the AWS DNS server at 169.254.169.253

  • enableDnsHostnames: (=Edit DNS Hostname setting)

    • False by default for newly created VPC, True by default for Default VPC

    • Won't do anything unless enableDnsSupport=True

    • If True, assign a public hostname to EC2 instances if it has a public IP

  • If you must use custom DNS domain names in a private zone in Route 53, you must have both as TRUE

Network ACLs (NACL)

  • NACL are like a firewall controlling traffic to and from subnet

  • Default NACL allows everything inbound and outbound

  • One NACL per Subnet, new Subnets are assigned the Default NACL

  • Define NACL rules:

    • Rules have a number (1 - 32766) and LOWER numbers have precedence (the first rule that matches wins and later rules are ignored)

    • Last rule is an asterisk (*), and denies all in case of no match

    • AWS recommends adding rules by increment of 100

  • Newly created NACL will deny everything

  • NACL are a great way of blocking a specific IP at the subnet level

  • Can be associated to multiple subnets

  • Remember ephemeral ports
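
The rule-number precedence described above (lowest number first, first match wins, implicit deny at the end) can be sketched as follows. The rule tuples and IP addresses are invented for the example and only model source IP and port, not protocol or direction.

```python
import ipaddress


def evaluate_nacl(rules, src_ip, port):
    """Return "ALLOW" or "DENY" for a packet, NACL-style.

    rules: list of (rule_number, cidr, (first_port, last_port), action).
    """
    address = ipaddress.ip_address(src_ip)
    for _number, cidr, (first, last), action in sorted(rules, key=lambda r: r[0]):
        if address in ipaddress.ip_network(cidr) and first <= port <= last:
            return action  # first match wins, higher numbers are ignored
    return "DENY"  # the final "*" rule


# Deliberately listed out of order: evaluation still starts at rule 100.
rules = [
    (200, "0.0.0.0/0", (443, 443), "ALLOW"),       # HTTPS from anywhere
    (100, "203.0.113.7/32", (0, 65535), "DENY"),   # block one specific IP
]

print(evaluate_nacl(rules, "203.0.113.7", 443))   # blocked by rule 100
print(evaluate_nacl(rules, "198.51.100.1", 443))  # allowed by rule 200
print(evaluate_nacl(rules, "198.51.100.1", 22))   # no match, implicit deny
```

Because NACLs are stateless, a real setup would also need an outbound rule covering the ephemeral port range for the responses.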

Inbound

  • SG is Stateful on outbound: the response to an allowed incoming request is let out even if outbound rules say not to (SG evaluates all rules before deciding)

  • NACL is Stateless on outbound, all rules are evaluated

Outbound

  • SG is Stateful on inbound: the response to an allowed outgoing request is let in even if inbound rules say not to

  • NACL is Stateless on inbound, all rules are evaluated

VPC Endpoints

  • Endpoints allow you to connect to AWS services using a private network instead of the public internet

  • They scale horizontally and are redundant

  • They remove the need for IGW, NAT, etc, to access AWS services

  • Interface: provisions an ENI (private IP) as an entry point (select subnets) (must attach security group) - for most AWS services

  • Gateway: provisions a target and must be used in a route table which is associated with subnets - for S3 and DynamoDB

    • Needs region specified on the CLI because CLI default is us-east-1 with unspecified

  • In case of issues:

    • Check DNS setting resolution in your VPC

    • Check Route Tables

VPC Peering

  • Connect two VPC privately using AWS' network

  • Make them behave as if they were in the same network

  • Must not have overlapping CIDR

  • VPC Peering connection is not transitive (must be established for each VPC that needs to communicate with another)

  • Can do between accounts and regions

  • You must update route tables in each VPC's subnets to ensure instances can communicate

Flow Logs

  • Capture information about IP traffic going to your interfaces:

    • VPC Flow Logs

    • Subnet Flow Logs

    • Elastic Network Interface (ENI) Flow Logs

  • For ACCEPT and REJECT traffic

  • Helps to monitor & troubleshoot connectivity issues

  • Flow logs data can go into S3 (Athena) / CloudWatch Logs (Insights)

  • Captures network information from AWS managed interfaces too: ELB, RDS, ElastiCache, Redshift, WorkSpaces

Flow Log Syntax

  • [version, accountid, interfaceid, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, logstatus]

  • 2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK

  • Query VPC flow logs using Athena on S3 or CloudWatch Logs Insights
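
As a rough illustration of the syntax above, the sample record can be split into named fields with a few lines of Python. This sketch assumes the default space-separated format with every field present; records where fields are "-" (no data captured) would need extra handling.

```python
FIELDS = [
    "version", "accountid", "interfaceid", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "logstatus",
]
INT_FIELDS = {"version", "srcport", "dstport", "protocol",
              "packets", "bytes", "start", "end"}


def parse_flow_log(line):
    """Turn one default-format flow log line into a dict."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    for name in INT_FIELDS:
        record[name] = int(record[name])
    return record


sample = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")
record = parse_flow_log(sample)

# SSH (port 22) over TCP (protocol 6) was accepted during a 60 second window.
print(record["dstport"], record["protocol"], record["action"],
      record["end"] - record["start"])
```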

Bastion Hosts

  • Used to SSH into private instances

  • In the public subnet which is then connected to all private subnets

  • Bastion Host security must be tight

  • Exam tip: Make sure the bastion host only has port 22 from your ip, not even SG of your other instances

Site to Site VPN, Virtual Private Gateway, Customer Gateway

  • Virtual Private Gateway

    • VPN concentrator on the AWS side of the VPN connection

    • VGW is created and attached to the VPC from which you want to create the site-to-site VPN

    • Possibility to customize the ASN

  • Customer Gateway

    • Software application or physical device on customer side of the VPN connection

    • IP Address

      • Use the static, internet routable, IP address of your customer gateway device

      • If the CGW is behind a NAT (with NAT-T), use the public address of the NAT

Direct Connect

  • Provides a dedicated private connection from a remote network to your VPC

  • Dedicated connection must be setup between your DC and AWS Direct Connect locations

  • You need to set up a Virtual Private Gateway on your VPC

  • Access public resources (S3) and private (EC2) on the same connection

  • Use cases:

    • Increase bandwidth throughput - working with large data sets - lower cost

    • More consistent network experience - application using real-time data feeds

    • Hybrid Environments

  • Supports both IPv4 and IPv6

Direct Connect Gateway

  • If you want to set up a Direct Connect to one or more VPC in many different regions (no overlapping IPs)

Egress only IGW

  • Egress only IGW is for IPv6 only

  • Similar function as a NAT (GW), but a NAT is for IPv4

  • All IPv6 are public addresses

  • Therefore all instances are publicly accessible

  • Egress Only Internet Gateway gives our IPv6 instances access to the internet, but they are not reachable from the internet

  • After creating an Egress Only IGW edit the Route Tables

VPC Summary

dbfa4967ade731a65a24076a6da23f3c.png
