SysOps Associate - Study Notes

Start stop pending states diagram etc

EC2 For Sysops

EC2 Changing Instance Type
- This only works for EBS backed instances
  - Stop the instance
  - Instance settings -> Change Instance Type
  - Start Instance
- Some instance types can be switched to EBS-optimized during the change; smaller instance types cannot

EC2 Placement Groups

Cluster - clusters instances into a low-latency group in a single AZ
- Not available for T2 and other small instance types, but supported by most larger instance sizes
Spread - spreads instances across underlying hardware (max 7 instances per group per AZ)
Partition - spreads instances across many different partitions (which rely on different sets of racks) within an AZ. Scales to 100s of EC2 instances per group (Hadoop, Cassandra, Kafka)
- Up to 7 partitions per AZ
- Instances in a partition do not share racks with instances in the other partitions
- EC2 instances can read their partition information from instance metadata
- Can specify the partition, or go for auto-spread

EC2 Shutdown Behavior & Termination Protection

Shutdown behaviour (when shutting down from within the OS itself)
- Stopped: default
- Terminated
- Not applicable when done from AWS console or AWS API
- CLI Attribute: InstanceInitiatedShutdownBehavior
Termination Protection
- Enable termination protection: To protect against accidental termination in AWS Console or CLI
- If shutdown behaviour = Terminate and termination protection is enabled, shutting down from inside the OS still terminates the instance (termination protection only covers the Console, CLI and API)
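
The interaction between the two settings above can be sketched as a small decision function. This is only a model of the rules in these notes, not an AWS API; the function name and the returned strings are made up for illustration.

```python
def shutdown_outcome(initiated_from_os: bool,
                     shutdown_behavior: str = "stop",
                     termination_protection: bool = False) -> str:
    """Model what happens to an instance (hypothetical helper, not an AWS call)."""
    if initiated_from_os:
        # An OS-level shutdown follows the instance-initiated shutdown behavior.
        # Termination protection is not consulted on this path.
        return "terminated" if shutdown_behavior == "terminate" else "stopped"
    # Otherwise this models a terminate request from the Console, CLI or API.
    if termination_protection:
        return "running (terminate request rejected)"
    return "terminated"


print(shutdown_outcome(True, "terminate", termination_protection=True))
print(shutdown_outcome(False, "terminate", termination_protection=True))
```

The first call is the exam trap: protection is on, yet the instance is still terminated.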

Troubleshooting EC2 Launch Issues

InstanceLimitExceeded error: reached max number of instances per region (check service limits)
- Resolution: Launch in different region or request region limit increase
aws ec2 run-instances (now max is 32 vCPUs not 20 instances)
InsufficientInstanceCapacity: It means AWS does not have sufficient On-Demand capacity in the AZ
- Wait, or try fewer instances, or a different instance type and upgrade later
If Instance Terminates Immediately (goes from pending to terminated)
- Reached EBS volume limit
- EBS snapshot is corrupt
- The root EBS volume is encrypted and you do not have permissions to access the KMS key for decryption
- The instance store backed AMI that you used to launch the instance is missing a required part (an image.part.xx file)
- To find the exact reason, check the instance's Description tab, next to the State transition reason label (must add the instance attribute column)

Troubleshooting EC2 SSH Issues

chmod 400 on the private key file (SSH refuses keys whose permissions are too open)
Make sure the right username for the OS is used, or get "Host key not found" error
Connection timeout
- SG not properly configured
- CPU load of instance is too high

EC2 Instance Launch Modes

Dedicated Hosts
- Full control of EC2 instance placement
- Visibility into underlying sockets / physical cores of the hardware
- Allocated to your account for a 3 year period reservation

EC2 Instance Types Deep Dive

https://www.ec2instances.info

Cross Account AMI Copy

Why use a custom AMI
- Pre-installed packages were needed
- Faster boot time (no need for ec2 user data)
- Machine comes configured with monitoring / enterprise software
- Security concerns - control over the machines in the network
- Control the maintenance and updates of AMIs over time
- Active Directory Integration out of the box
- Installing your app ahead of time (for faster deploys in auto-scaling)
- Using someone else's AMI that is optimized for a specific app, DB, etc
Using public AMIs
- Can pay for other's AMIs by the hour
  - Could have optimized the software
  - Easy to run and configure
  - Essentially "rent" expertise from the AMI creator
- They can be found and published on the Amazon Marketplace

AMI Storage

They live in S3, and are charged accordingly (but are not visible)
By default AMIs are private, and locked for your account / region
You can make AMIs public and share them with other AWS accounts or sell them on the Marketplace

Cross account AMI Copy

Can share AMIs with another account
Sharing does not affect the ownership of the AMI
If you copy an AMI that has been shared with your account, you are the owner of the target AMI in your account
To copy, the owner of the source AMI must grant you read permissions for the storage that backs the AMI (either the associated EBS snapshot for an Amazon EBS-backed AMI, or an associated S3 bucket for an instance store-backed AMI)
You can't copy an encrypted AMI that was shared with you from another account. If the underlying snapshot and encryption key were shared with you, you can copy while re-encrypting with your own key, then register as a new AMI.
AMIs are based on Amazon Elastic Block Store (Amazon EBS) snapshots. For large file systems without a previous snapshot, creating an AMI can take several hours. To decrease the AMI creation time, first create an Amazon EBS snapshot before creating the AMI.

Elastic IPs

An Elastic IP is a public IPv4 address you own as long as you don't release it
You can attach it to one instance at a time
You can remap it across instances
You don't pay for it if it's attached to a server
You pay for it if it's not attached to a server
Use to mask the failure of an instance or software by rapidly remapping the address to another instance in your account
You can only have 5 Elastic IP (increasable)
Overall try to avoid using
- You could use a random public IP and register a DNS name to it
- Or use a load balancer with a static hostname

CloudWatch

CloudWatch metrics for EC2 (exam)

AWS-provided metrics (AWS pushes)
- Basic monitoring (default): 5 minutes interval
- Detailed monitoring (paid): 1 minute interval
- Include CPU, Network, Disk, Status Check metrics
  - CPU: CPU utilization + T2/3 Credit usage / balance
  - Network: Network In / Out
  - Status Checks
    - Instance status: the EC2 VM (0 or 1 status)
    - System status: the underlying hardware (0 or 1 status)
  - Disk: Read / Write for Operations / Bytes (only for instance store)
  - RAM not included by default
Custom Metrics (you push)
- Basic resolution: 1 minute resolution
- High resolution: Down to 1 second resolution
- Include RAM, application level metrics
- Make sure the IAM permissions on the EC2 instance role are correct
  - Can now attach Roles while instance is running, but still only 1 role per instance
  - cloudwatch:PutMetricData
  - cloudwatch:GetMetricStatistics
  - cloudwatch:ListMetrics
  - ec2:DescribeTags
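
A rough sketch of what a custom metric payload could look like, assuming the request shape of the CloudWatch PutMetricData API (Namespace plus a MetricData list). The namespace and metric name are hypothetical, and nothing is sent to AWS here; with boto3 the dict would presumably be passed as client.put_metric_data(**payload).

```python
import json


def build_metric_payload(namespace, name, value, unit="Percent", high_resolution=False):
    """Build a PutMetricData-style payload (local sketch, no AWS call is made)."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": name,
                "Value": value,
                "Unit": unit,
                # 1 = high resolution (1 second), 60 = standard (1 minute)
                "StorageResolution": 1 if high_resolution else 60,
            }
        ],
    }


# "Custom/System" and "MemoryUtilization" are example names, not AWS-defined ones.
payload = build_metric_payload("Custom/System", "MemoryUtilization", 42.5)
print(json.dumps(payload, indent=2))
```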

Custom Metrics

Sample custom metrics
- RAM (cloudwatch monitoring scripts available)
- Swap usage
- App metrics (request per sec, etc)

CloudWatch Logs for EC2 instances

By default no logs from your EC2 instances will go to CloudWatch
You need to run the CloudWatch agent on the EC2 server to push the log files you want
Make sure the IAM permissions are correct
- Configure Role
The CloudWatch log agent can be set up on-premises too
There is a new unified CloudWatch agent (check it out)
/etc/awslogs/awscli.conf
sudo systemctl start awslogsd (Amazon Linux 2; on Amazon Linux 1: sudo service awslogs start)

EC2 at Scale

Systems Manager (GO PLAY)
- Manage EC2 and on-premises at scale
- Get operational insights about the state of infrastructure
- Easily detect problems
- Patching automation for enhanced compliance
- Works for Windows and Linux
- Integrated with CloudWatch metrics / dashboards
- Integrated with AWS Config
- Free service
- Remember for exam:
  - Parameter Store
  - Run Command
  - Patch Manager
- Must install SSM agent onto systems we wish to control
- Installed by default on Amazon Linux AMIs and some Ubuntu AMIs
- If the instance can't be controlled via SSM it's probably an issue with the agent
- Make sure the EC2 instances have a proper IAM role to allow SSM actions (talk to SSM)

AWS Tags

Free naming, common tags are: Name, Environment, Team, Layer, etc
Used for:
- Resource grouping
- Automation
- Cost allocation

Systems Manager Resource Groups

Create, view, or manage logical groups of resources with tags
Allows creation of logical groups of resources such as
- Applications
- Different layers of an application stack
- Production versus dev environments
Regional Service
Works with EC2, S3, DynamoDB, Lambda, etc

SSM Documents

Documents can be in JSON or YAML
- Command
- Policy
- Automation
You define parameters
You define actions
Many documents already exist in AWS
They can act on State Manager, Patch Mgr, Automation, Run Command, and reference Parameter Store
Can execute them easily through Automation menu

SSM Run Command

Execute a document (script) or just run a command
Run command across multiple instances (using resource groups)
Rate control / Error control
Integrated with IAM & CloudTrail
No need for SSH
Results in the console

Using SSM to PATCH

Inventory -> List software on instance
Inventory + Run Command -> Patch Software
Patch Manager + Maintenance Window -> Patch OS
Patch Manager -> Gives you compliance
State Manager -> Ensures instances are in a consistent state (compliance)

SSM Session Manager

Allows you to start a secure shell on your VM
Does not use SSH access and bastion hosts
Only EC2 for now, but On-prem eventually
Log actions done through secure shells to S3 and CloudWatch Logs
IAM permissions: access SSM + write to S3 + write to CloudWatch
CloudTrail can intercept StartSession events
ssm-user not ec2-user
AWS secure shell vs. SSH
- No need to open port 22 at all
- No need for bastion hosts
- All commands are logged to S3 / CloudWatch (auditing)
- Access done through User IAM not SSH keys

Lost SSH Key

Traditional Method for EBS backed
- Stop, detach root volume, attach to another instance
- Modify ~/.ssh/authorized_keys to append your new key, then reattach to the original instance
New Method for EBS backed
- Run the AWSSupport-ResetAccess automation document in SSM
Instance store backed EC2
- You can't stop the instance, or data is lost. AWS recommends just terminating it and creating a new one
- Pro-tip: Use Session Manager to secure shell to access and edit the ~/.ssh/authorized_keys file directly

Parameter Store

Secure storage for configuration and secrets
Optional seamless encryption using KMS
Serverless, scalable, durable, easy SDK, free
Version tracking of configurations / secrets
Configuration management using path & IAM
Notifications with CloudWatch Events
Integration with CloudFormation
In a tree hierarchy
- Plaintext or encrypted, uses KMS to decrypt
- GetParameters or GetParametersByPath API
- aws ssm get-parameters --names xxxx
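
To illustrate the tree hierarchy, here is a local simulation of how a GetParametersByPath-style lookup behaves over path-like names. It works on a plain dict rather than calling SSM; the parameter names and values are invented for the example.

```python
def get_parameters_by_path(store: dict, path: str, recursive: bool = False) -> dict:
    """Return parameters under `path` (simulation on a dict, not an SSM call)."""
    prefix = path.rstrip("/") + "/"
    result = {}
    for name, value in store.items():
        if not name.startswith(prefix):
            continue
        remainder = name[len(prefix):]
        # Without recursion only direct children of the path are returned.
        if recursive or "/" not in remainder:
            result[name] = value
    return result


# Hypothetical hierarchy: /app/environment/parameter
store = {
    "/my-app/dev/db-url": "dev.example.internal",
    "/my-app/dev/db-password": "(encrypted value)",
    "/my-app/prod/db-url": "prod.example.internal",
}

print(sorted(get_parameters_by_path(store, "/my-app/dev")))
print(len(get_parameters_by_path(store, "/my-app", recursive=True)))
```

Organizing names by path like this is what makes IAM policies on a prefix (for example everything under /my-app/dev/) practical.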

Load Balancing

Any LB (CLB, ALB, NLB) has a static hostname, use it and not underlying IP.
LB can scale, but not instantaneously, contact AWS for a "warm-up"
NLB directly see the client IP
4xx are client induced errors
5xx are application induced errors
- Error 503 means at capacity or no registered target
ALB does not support a static IP
NLB gets 1 static IP per subnet (to get a static IP for an ALB, chain it behind an NLB)
NLB does not need pre-warming
NLB originally didn't do SSL termination, but it now supports TLS listeners
Error Codes
- Unsuccessful at client side: 4xx
  - Error 400: Bad request
  - Error 401: Unauthorized
  - Error 403: Forbidden
  - Error 460: Client Closed connection
  - Error 463: X-Forwarded-For header had more than 30 IPs (similar to malformed request)
- Unsuccessful at server side: 5xx
  - Error 500: Internal server error on ELB
  - Error 502: Bad Gateway
  - Error 503: Service unavailable
  - Error 504: Gateway timeout
  - Error : Unauthorized
Supporting SSL for Old Browsers (such as TLS 1.0)
- Change the policy to allow a weaker cipher
  - ELBSecurityPolicy-TLS-1-0-2015-04, there are others, note this one
Enable Deletion Protection

Load Balancers Monitoring

All LB metrics are directly pushed to CloudWatch metrics
- BackendConnectionErrors
- HealthyHostCount / UnHealthyHostCount
- HTTPCode_Backend_2XX: successful count, 3XX redirect count, 4XX client error codes, 5XX server error codes returned by the backend (errors generated by the LB itself are counted under HTTPCode_ELB_5XX)
- Latency
- RequestCount
- SurgeQueueLength: the total number of requests that are pending routing to a healthy instance, max value 1024.
- SpilloverCount: the total number of requests that were rejected because the surge queue is full
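
The relationship between SurgeQueueLength and SpilloverCount can be sketched with a toy model. This only mirrors the description above (queue capped at 1024, overflow rejected); the function and its arguments are hypothetical and say nothing about how the load balancer is actually implemented.

```python
SURGE_QUEUE_MAX = 1024  # max surge queue length quoted in the notes


def route_requests(pending: int, free_backend_slots: int):
    """Return (served, surge_queue_length, spillover_count) for one burst."""
    served = min(pending, free_backend_slots)
    waiting = pending - served
    # Requests that cannot be routed wait in the surge queue, up to its cap.
    queued = min(waiting, SURGE_QUEUE_MAX)
    # Anything beyond the cap is rejected and counted as spillover.
    spillover = waiting - queued
    return served, queued, spillover


print(route_requests(2000, 500))
```

A non-zero spillover count is a sign that the backend needs to scale out.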

Load Balancers Access Logs

Access logs for LB can be enabled in attributes and stored in S3, they contain:
- Time
- Client IP
- Latencies
- Request paths
- Server response
- Trace Id
Only pay for S3 storage
Helpful for compliance
Helpful for keeping access data even after ELB or EC2 instances are terminated
Access logs are already encrypted

Application LB Request Tracing

Request tracing - Each HTTP request has added a custom header X-Amzn-Trace-Id
Example: X-Amzn-Trace-Id: Root=1-74628i123-asdberwer01234568123123
Useful in logs / distributed tracing platforms to track a single request
Not yet integrated with X-Ray
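
A minimal sketch of pulling the fields out of an X-Amzn-Trace-Id value so it can be logged or correlated, assuming the header is a semicolon-separated list of key=value pairs (such as Root and Self). The function name is made up for illustration.

```python
def parse_trace_header(value: str) -> dict:
    """Split an X-Amzn-Trace-Id value into its key=value fields."""
    fields = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, val = part.partition("=")
        fields[key] = val
    return fields


# Example value taken from the notes above.
print(parse_trace_header("Root=1-74628i123-asdberwer01234568123123"))
```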

Auto Scaling Groups

Exam question: the ASG reports its instances as healthy, but an EC2 instance behind the ALB is failing its health checks. Fix: change the ASG health check type from EC2 to ELB so the ASG picks up the load balancer's health checks
Look into CLI:
- set-instance-health in asg (to run tests)
- terminate-instance-in-auto-scaling-group
Health Checks:
- EC2 Status checks
- ELB Health checks
Will not reboot unhealthy instances

Scaling Processes in ASG

Launch: Add a new EC2 to the group
Terminate: Remove an EC2 from the group
HealthCheck: Checks the health of instances
ReplaceUnhealthy: Terminate the unhealthy instances and recreate
AZRebalance: Balance the number of EC2 instances across AZs
- launch new instances and then terminate old
- If Terminate is suspended will grow up to 10% of its size, but could remain there because it can't terminate old
AlarmNotification: Accept notification from CloudWatch
ScheduledActions: Performs scheduled actions that you create
AddToLoadBalancer: Adds instances to the load balancer or target group
We can suspend these processes so that they cannot be used

Troubleshooting ASG

instances are already running. Launching EC2 instance failed.
- ASG has reached DesiredCapacity parameter limit, update it.
Launching EC2 instances is failing:
- The SG does not exist, may have been deleted.
- The key pair does not exist, may have been deleted.
If the ASG fails to launch an instance for over 24h, it will automatically suspend the processes (administrative suspension)

CloudWatch for ASG

Available for ASG (opt-in)
- GroupMinSize
- GroupMaxSize
- GroupDesiredCapacity
- GroupInServiceInstances
- GroupPendingInstances
- GroupStandbyInstances
- GroupTerminatingInstances
- GroupTotalInstances
You must enable metric collection to see these metrics
Metrics are collected every minute

Monitoring the underlying EC2 via ASG

Basic monitoring: 5 minutes granularity
Detailed Monitoring: 1 minute granularity (paid)

Elastic Beanstalk

BeanStalk is free, only pay for underlying instances
Managed Service
- Instance config / OS is handled by Beanstalk
- Deployment strategy is configurable but performed by Beanstalk
Only responsible for code
Three architecture models
- Single Instance: good for Dev
- LB + ASG: Great for production or pre-prod web applications
- ASG only: great for non-web in production (workers etc)
Has three components
- Application
- Application version: each deployment gets assigned a version
- Environment name: Dev, test, prod, free naming
Deploy application versions to environments and can promote application versions to next environment
Rollback feature to previous version
Full control over lifecycle of environments

Deployment Options for Updates

All at once (deploy all in one go): Fastest, but instances can't serve traffic during the deployment (downtime)
Rolling: Update a few instances at a time (bucket), and then move on to the next bucket once the first bucket is healthy
Rolling with additional batches: Like rolling, but spins up new instances to host the batch being updated (so the environment always runs at full capacity)
Immutable: spins up new instances in a new ASG, deploys version to them, and then swaps all the instances when everything is healthy. Highest cost, quick rollback.

Blue/Green Deployment

Create a new "stage" env, deploy v2 there
New env (green) can be fully validated and roll back if issues
Route 53 can be set up using weighted policies to redirect traffic bit by bit to the new env
Using Beanstalk use "swap URLs" when done with env test

Beanstalk for SysOps

Beanstalk can put application logs directly into CloudWatch Logs
Can use custom domain: Route 53 Alias or CNAME on top of Beanstalk URL
Not responsible for patching the runtimes
On update of app resolving dependencies can take a long time, use Golden AMI (especially in combo with B/G for speed)
- Package OS dependencies
- Package app dependencies
- Package company-wide software

Troubleshooting Beanstalk

If the health of your environment changes to red:
- Review environment events
- Pull logs to view recent log file entries
- Roll back to a previous, working version of the app
When accessing external resources, make sure the security groups are correctly configured
In case of command timeouts you can increase the deployment timeout value

CloudFormation

Update

Add, Modify Actions
Replacement = True (or not)

Mappings

Great when you know in advance all the values that can be taken, and they can be deduced from variables such as Region, AZ, Account, Env (dev vs prod), etc
Allow safer control over the template
Use parameters when the values are really user specific
Use Fn::FindInMap to return a named value from a specific key
!FindInMap [MapName, TopLevelKey, SecondLevelKey]
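
The two-level lookup that FindInMap performs can be pictured as a nested dictionary access. This is only an illustration of the semantics in Python, not CloudFormation itself; the map name, regions and AMI IDs are placeholders.

```python
# Shape of a Mappings section: MapName -> TopLevelKey -> SecondLevelKey -> value
mappings = {
    "RegionMap": {
        "us-east-1": {"AMI": "ami-11111111"},  # placeholder AMI IDs
        "eu-west-1": {"AMI": "ami-22222222"},
    }
}


def find_in_map(mappings, map_name, top_level_key, second_level_key):
    """Mimic !FindInMap [MapName, TopLevelKey, SecondLevelKey]."""
    return mappings[map_name][top_level_key][second_level_key]


print(find_in_map(mappings, "RegionMap", "eu-west-1", "AMI"))
```

In a real template the top-level key would typically come from a pseudo parameter such as the current region.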

Outputs

Best way to perform some cross stack collaboration, let each expert handle their own part of stack
- Fn::ImportValue the exported value (must have certain level of uniqueness)
You can't delete a CloudFormation Stack if its outputs are being referenced by another CloudFormation stack

Conditions (!Equals [ !Ref EnvType, prod])

Fn::And
Fn::Equals
Fn::If
Fn::Not
Fn::Or

Intrinsic Functions

Ref = !Ref
- Parameters -> returns the value of the parameter
- Resources -> returns the physical ID of the underlying resource
Fn::GetAtt = !GetAtt
- Attributes are available on the resources you create (see docs for each resource type)
Fn::FindInMap = !FindInMap
- !FindInMap [ MapName, TopLevelKey, SecondLevelKey ]
Fn::ImportValue = !ImportValue
- Import values that have been exported from other templates
Fn::Join = !Join
- Join values with a delimiter
- !Join [ delimiter, [ comma-delimited list of values ] ]
- a:b:c = !Join [ ":", [ a, b, c ] ]
Fn::Sub = !Sub
- Substitute variables in a text, can combine with References or pseudo variables. Must contain ${VariableName}
- !Sub
  - String
  - { Var1Name: Var1Value, Var2Name: Var2Value }
Condition Functions (Fn::If, Fn::Not, Fn::Equals, Fn::Or, Fn::And)
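
As a rough mental model, Fn::Join and the simplest form of Fn::Sub behave like the two Python helpers below. This is an emulation for study purposes only: real Fn::Sub also resolves resource references and attributes, which this sketch does not attempt, and the helper names are invented.

```python
import re


def fn_join(delimiter, values):
    """Emulate !Join [ delimiter, [ values ] ]."""
    return delimiter.join(values)


def fn_sub(template, variables):
    """Emulate the variable-map form of !Sub: replace each ${Name} from the map.

    Names missing from the map are left untouched in this sketch.
    """
    def replace(match):
        name = match.group(1)
        return str(variables.get(name, match.group(0)))

    return re.sub(r"\$\{(\w+)\}", replace, template)


print(fn_join(":", ["a", "b", "c"]))
print(fn_sub("arn:aws:s3:::${Bucket}/${Prefix}/*", {"Bucket": "my-bucket", "Prefix": "logs"}))
```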

CloudFormation for SysOps

User Data in EC2

We can have user data at EC2 instance launch in CloudFormation
The important thing is to pass the entire script through the function Fn::Base64
Use a pipe (|) before the script in YAML so the whole script is treated as one multi-line string, with line breaks preserved
User data script log is in /var/log/cloud-init-output.log
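
What Fn::Base64 does to the user data script can be reproduced locally with Python's base64 module. The script content below is just an example; the point is that the whole multi-line string, line breaks included, is encoded as one value.

```python
import base64

# Example user data script (contents are illustrative only).
user_data = """#!/bin/bash
yum update -y
yum install -y httpd
"""

# Equivalent of passing the script through Fn::Base64.
encoded = base64.b64encode(user_data.encode("utf-8")).decode("ascii")

# Decoding gives back the exact script, including the line breaks.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded == user_data)
```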

cfn-init

Alternate way to do User Data instance stuff
AWS::CloudFormation::Init must be in Metadata of a resource, defines in metadata what and how to install
With the cfn-init script, it helps make complex EC2 configurations readable
The EC2 instance will query the CloudFormation service to get the init data
Logs go to /var/log/cfn-init.log

cfn-signal & wait conditions

With cfn-init alone, CloudFormation still can't tell whether the EC2 instance was properly configured
For this we use the cfn-signal script
- We run cfn-signal right after cfn-init
- It tells the CloudFormation service to keep going or fail
We need to define a WaitCondition resource (AWS::CloudFormation::WaitCondition) that receives the signal from cfn-signal
- Blocks the template until it receives a signal from cfn-signal
- We attach a CreationPolicy (also works on EC2 and ASG)

Wait condition didn't receive the required number of signals from EC2 instance

Ensure the AMI has CloudFormation helper scripts (can DL)
Verify that cfn-init & cfn-signal command ran successfully, view logs /var/log/cloud-init.log or /var/log/cfn-init.log
Can retrieve logs by logging onto the instance, but must disable rollback on failure or else the instance is deleted
Verify the instance has a connection to the internet (public subnet with IGW, or NAT), otherwise it can't connect to CloudFormation
- Can test with curl -I http://aws.amazon.com

Rollback on failures

Stack Creation fails: (CreateStack API) - Stack Creation Options
- Default: everything rolls back (gets deleted)
  - OnFailure=ROLLBACK
- Troubleshoot: Option to disable rollback to manually troubleshoot
  - OnFailure=DO_NOTHING
- Delete: Get rid of stack entirely, don't keep anything
  - OnFailure=DELETE
Stack Update Fails: (UpdateStack API)
- The stack automatically rolls back to the previous known working state
- Ability to see in the logs what happened

Nested Stacks

Nested stacks are stacks as part of other stacks
They allow you to isolate repeated patterns / common components in separate stacks and call them from other stacks
Considered best practice
To update a nested stack always update the parent (root stack)
Resource -> Type -> AWS::CloudFormation::Stack, TemplateURL

ChangeSets

When you update a stack, you want to know what will change before it happens, for greater confidence
ChangeSets won't tell you whether the update will succeed, though
Create change set -> View change set -> (optional) Create additional change sets -> Execute change set
See change sets in the Stacks menu on the left, or create one via Actions on the stack -> Create change set

Retaining Data on Deletes

You can put a DeletionPolicy on any resource (in the resource definition) to control what happens when the CloudFormation stack is deleted
DeletionPolicy=Retain
- Specify on resources to preserve/backup in case of CloudFormation deletes
- To keep a resource, specify Retain (works for any resource/nested stack)
DeletionPolicy=Snapshot
- Will take a snapshot before deleting resource
- EBS Volume, ElastiCache CacheCluster, ElastiCache ReplicationGroup
- RDS DBInstance, RDS DBCluster, Redshift Cluster
DeletionPolicy=Delete (default)
- Note: for AWS::RDS::DBCluster resources the default is snapshot
- Note: to delete an S3 bucket you need to first empty the bucket
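
A small sketch of where DeletionPolicy sits in a template, written as a Python dict and printed as JSON. The logical names (DataVolume, LogsBucket) and property values are made up; the key point is that DeletionPolicy is a sibling of Type and Properties, not a property itself.

```python
import json

template = {
    "Resources": {
        # Hypothetical EBS volume that should be snapshotted before deletion.
        "DataVolume": {
            "Type": "AWS::EC2::Volume",
            "DeletionPolicy": "Snapshot",
            "Properties": {"Size": 10, "AvailabilityZone": "us-east-1a"},
        },
        # Hypothetical bucket that should survive stack deletion.
        "LogsBucket": {
            "Type": "AWS::S3::Bucket",
            "DeletionPolicy": "Retain",
        },
    }
}

print(json.dumps(template, indent=2))
```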

EFS & EBS

EBS Volume

EC2 loses root volume when manually terminated
Unexpected terminations might happen (AWS alerts via email)
EBS volume is a network drive you can attach to your instances while they run, to persist data
Can be latency due to network, can be detached and attached quickly
Provisioned capacity in GBs and IOPS
- Billed for all provisioned capacity
- Can increase drive over time, start small
Characterized in Size | Throughput | IOPS
Only GP2 and IO1 can be boot volumes
lsblk (lists the block devices attached to the instance)

EBS Volumes Types

GP2 (SSD): General purpose SSD (balance price/perf)
- Boot volumes, virtual desktops, low-latency interactive apps, development and test
- 1GB-16TB
- Small GP2 can burst IOPS to 3000 (anything under 3k can burst to 3k)
- Max IOPS is 16000
- 3 IOPS per GB, which means the max of 16000 IOPS is reached at 5334 GB
IO1 (SSD): Highest-perf, low latency or high-throughput
- Critical business apps that require sustained IOPS, or more than 16000
- Mongo, Cassandra, MSSQL, MySQL, Oracle
- 4GB-16TB
- IOPS is provisioned 100-64000 (64k for Nitro only) else 100-32000
- Maximum ratio of provisioned IOPS to volume GB size = 50:1
ST1 (HDD): Low cost for frequently accessed, throughput-intensive workloads (big data)
- Streaming workloads requiring consistent, fast throughput at low price
- Big Data, DW, log processing, Kafka
- Cannot be boot volume
- 500GB - 16TB
- Max IOPS is 500
- Max throughput of 500 MB/s, can burst
SC1 (HDD): Lowest cost for less frequently accessed workloads
- Throughput oriented for large volumes of data infrequently accessed
- Where lowest cost is important
- Cannot be a boot volume
- 500GB - 16TB
- Max IOPS is 250
- Max throughput of 250 MB/s, can burst
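
The IO1 limits listed above (50:1 ratio, 64000 IOPS on Nitro, otherwise 32000) combine into a simple ceiling. The helper below is a sketch of that arithmetic using only the numbers from these notes; the function name is hypothetical.

```python
def io1_max_iops(size_gb: int, nitro: bool = True) -> int:
    """Highest IOPS that can be provisioned on an IO1 volume of this size."""
    ceiling = 64000 if nitro else 32000
    # 50 IOPS per GB is the maximum provisioned ratio.
    return min(ceiling, 50 * size_gb)


print(io1_max_iops(100))                # limited by the 50:1 ratio
print(io1_max_iops(2000))               # limited by the Nitro ceiling
print(io1_max_iops(2000, nitro=False))  # limited by the non-Nitro ceiling
```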

GP2 volumes I/O Burst

If your gp2 volume is smaller than 1000GB (baseline IOPS below 3000) it can burst to 3000 IOPS (no burst at 1000GB and above)
Accumulate burst credit over time
Bigger your volume, faster you fill up your "burst credit balance"
What happens if I/O credit is empty?
- The max I/O becomes the baseline you paid for
- If you see balance at 0 all the time you should increase your volume size or switch to IO1
- Use CloudWatch to monitor the I/O credit balance
Burst also applies to ST1 or SC1 (for increase in throughput)
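
The gp2 baseline and burst rules can be written as two small helpers. The 3 IOPS per GB rate, the 3000 burst level and the 16000 cap come from the notes; the 100 IOPS floor for very small volumes is an addition from the AWS documentation as I recall it, so treat it as an assumption.

```python
def gp2_baseline_iops(size_gb: int) -> int:
    """Baseline IOPS: 3 per GB, floored at 100 (assumed) and capped at 16000."""
    return min(16000, max(100, 3 * size_gb))


def gp2_can_burst(size_gb: int) -> bool:
    """Volumes whose baseline is below 3000 IOPS can burst up to 3000."""
    return gp2_baseline_iops(size_gb) < 3000


for size in (10, 500, 1000, 5334):
    print(size, gp2_baseline_iops(size), gp2_can_burst(size))
```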

Computing MB/s based on IOPS

gp2
- Throughput in MB/s = (Volume size in GB) * (IOPS per GB) * (I/O size in KB) / 1024
- 100GB * 3 IOPS per GB * 256KB per I/O operation = 76800KB/s = 75MB/s
- Limit to a max of 250MB/s (means volume >= 334GB won't increase throughput)
IO1
- Throughput in MB/s = (Provisioned IOPS) * (I/O size in KB) / 1024
- 1000 IOPS * 256KB = 250MB/s
- Throughput limit of IO1 is 256KB/s for each IOPS provisioned
- Limit to a max of 500MB/s (at 32k IOPS) and 1000MB/s (at 64k IOPS)
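
The throughput arithmetic above, as a sketch. It assumes 1 MB = 1024 KB (which is what makes the 75MB/s and 250MB/s figures come out exactly), and the IO1 cap is passed in as a parameter because it depends on the provisioned IOPS tier; the function names are made up.

```python
def gp2_throughput_mbs(size_gb: int, io_size_kb: int = 256) -> float:
    """gp2 throughput in MB/s: 3 IOPS per GB times I/O size, capped at 250."""
    iops = 3 * size_gb
    return min(250.0, iops * io_size_kb / 1024)


def io1_throughput_mbs(provisioned_iops: int, io_size_kb: int = 256,
                       cap_mbs: float = 500.0) -> float:
    """IO1 throughput in MB/s: provisioned IOPS times I/O size, capped at cap_mbs."""
    return min(cap_mbs, provisioned_iops * io_size_kb / 1024)


print(gp2_throughput_mbs(100))
print(gp2_throughput_mbs(334))
print(io1_throughput_mbs(1000))
```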

EBS Volume Resizing

Can do on the fly (no stop of instance)
Can only increase volume size (any volume type)
Can change volume type
Can increase IO1 IOPS
After resizing need to repartition your drive
After increasing the size the volume will be in "optimisation" phase for a while, but less perf (in-use - modifying/optimizing)

EBS Snapshots

Incremental - only changed blocks
EBS backups use IO, should not run them during peak times
Snapshots are stored in S3 (but you won't see them)
Don't have to detach volume but recommended
Max 100000 snapshots
Can copy across AZ or Region
Can make AMI from Snapshot
EBS volumes restored by snapshots need to be pre-warmed (using fio or dd to read entire volume)
Can be automated using Amazon Data Lifecycle Manager

EBS Migration

Volumes locked to AZ
To migrate: snapshot the volume, (optional) copy the snapshot to a different region
Create a volume from the snapshot in the AZ of your choice

EBS Encryption

When you encrypt an EBS volume you get:
- Data at rest is encrypted inside the volume
- Data in flight between instance and the volume is encrypted
- Snapshots are encrypted
- As are volumes created from the snapshot
Encryption and decryption are transparent
Minimal impact on latency
EBS Encryption leverages keys from KMS (AES-256)
Copying an unencrypted snapshot allows encryption
Snapshots of encrypted volumes are encrypted
Encrypting an unencrypted EBS volume
- Create an EBS snapshot of the volume
- Encrypt the snapshot using copy
- Create a new volume from the snapshot
- Attach encrypted volume to original instance

EBS RAID

EBS is already redundant (replicated within an AZ)
But for increase of IOPS past max
Must do in OS not AWS
Or mirror EBS volumes
- RAID 0 (Perf, get combined disk space, IO, throughput, not fault tolerant)
- RAID 1 (mirror, send data to two* volumes at same time, 2x network traffic)
- RAID 5, 6 (Not recommended for EBS)

EBS for SysOps

If you plan to use the root volume of an instance after it's terminated
- Set the Delete on Termination flag to "no" (when creating the EC2 instance)
If you use EBS for high performance, use EBS-optimized instance types
If an EBS volume is unused you still pay for it
For cost savings over a longer period, snapshot the volume and restore it later when needed (3x savings)

EBS Troubleshooting

High wait time or slow response for SSD -> increase IOPS (or go with Provisioned IOPS on IO1)
EC2 won't start with EBS volume as root: make sure volume names are properly mapped (/dev/xvdb instead of /dev/xvda for example)
After increasing a volume size, you still need to extend the partition and file system to use the additional storage (xfs_growfs for example)

CloudWatch and EBS

Important EBS Volume metrics
- VolumeIdleTime: number of seconds when no read / write is submitted
- VolumeQueueLength: number of operations waiting to be executed. High number means an IOPS or application issue
- BurstBalance: if it becomes 0 we need a volume with more IOPS
GP2 volume types: 5 minute interval
IO1 volume types: 1 minute interval
EBS volumes have a status check:
- Ok - volume is performing well
- Warning - performance is below expected
- Impaired - Stalled, performance severely degraded
- Insufficient-data - metric data collection in progress

EFS

Managed NFS
EFS works with EC2 instances multi-AZ
Highly available, scalable, expensive (3xGP2), pay per use
For: content management, web serving, data sharing, WordPress
NFS v4.1
Use security groups to control access (on network drive)
Compatible with Linux based AMI (not windows)
Performance mode: General purpose (default), Max IO (used when 1000's of EC2 are using the EFS)
Has bursting or provisioned modes for IO
"EFS file sync" to sync from on-prem fs to EFS
Backup EFS-to-EFS (incremental, can choose frequency)
Encryption at rest using KMS
EFS now has lifecycle mgmt. to tier to EFS IA
Can use TLS for EFS

Instance store

Some instances do not come with a root EBS volume
Ephemeral
Physically attached to your instance
Pros
- Better I/O perf
- Good for buffer / cache / scratch data / temporary content
- Data survives reboot
Cons
- On stop or termination instance store is lost
- Can't resize the instance store
- Backups must be operated by the user

S3

Bucket names must be globally unique
- Global at top menu, (but regional service)
Minimum of 3 and maximum of 63 characters - no uppercase or underscores
Must start with a lowercase letter or number and can’t be formatted as an IP address (1.1.1.1)
Default of 100 buckets per account, and hard 1000 bucket limit via support request
Unlimited objects in buckets
Unlimited total capacity for a bucket
An object’s key is its name (FULL PATH including slashes and filename, but not bucket name)
An object’s value is its data (content)
An object’s size is from 0kb to 5TB (more than 5GB must use multi-part upload)
- To upload a file larger than 160 GB, use the AWS CLI, AWS SDK, or Amazon S3 REST API.
Metadata (list of key/value pairs, system or user metadata)
Tags (Unicode key/value pair -max 10-), useful for security / lifecycle
Version ID (if versioning is enabled)
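
The bucket naming rules at the top of this section can be checked with a short validator. This is a simplified sketch: it covers length, allowed characters, the first and last character, and the IP address rule. The requirement that the name also ends with a letter or number is taken from the AWS naming rules as I understand them, and other rules (such as no adjacent dots) are deliberately left out.

```python
import re

# 3 to 63 characters: lowercase letters, digits, dots and hyphens,
# starting and ending with a letter or digit.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IP_PATTERN = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Simplified check of the S3 bucket naming rules listed in these notes."""
    if not _NAME_PATTERN.fullmatch(name):
        return False
    # Names formatted like an IP address (1.1.1.1) are rejected.
    return _IP_PATTERN.fullmatch(name) is None


for candidate in ("my-bucket-01", "My_Bucket", "ab", "1.1.1.1"):
    print(candidate, is_valid_bucket_name(candidate))
```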

Versioning

Bucket level setting
If you overwrite a key/file you increment its version
Best practice to version your buckets
- Protect against unintended deletes
- Easy roll back to previous version
Any file that is not versioned prior to enabling versioning will have a version NULL
Deleting a file only adds a delete marker

S3 Websites

URL can be
- [bucket-name].s3-website-[region].amazonaws.com
- [bucket-name].s3-website.[region].amazonaws.com

S3 CORS

If you request data from another S3 bucket you need to enable CORS
Cross Origin Resource Sharing allows you to limit the number of websites that can request files in your S3 (help limit costs)
Access-Control-Allow-Origin:

S3 Consistency Model

Amazon S3 provides read-after-write consistency for PUTs of new objects in your S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD or GET request to the key name (to find if the object exists) before creating the object, Amazon S3 provides eventual consistency for read-after-write. (Note: since December 2020 S3 provides strong read-after-write consistency for all operations, so this caveat only applies to older material.)

S3 Security

User based
- IAM Policies - which API calls should be allowed for a specific user from IAM
Resource Based
- Bucket Policies - bucket wide rules from the S3 console - allows cross account
- Object ACLs - finer grain, not super popular
- Bucket ACLs - less common

S3 Bucket Policies

Grant public access to the bucket
Force objects to be encrypted at upload
Grant access to another account (Cross account)
JSON based (4 components)
- Resources: buckets and objects
- Actions: Set of APIs to Allow or Deny
- Effect: Allow or Deny
- Principal: The account or user to apply the policy to
Networking: Supports VPC endpoints (for instances in VPC with no internet)
Logging and Auditing: S3 access logs can be stored in another bucket, API calls can be logged in CloudTrail
User Security: MFA can be required in versioned buckets to delete objects, Signed URLs = valid for a limited time (ex: premium video service for time)
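
Putting the four components together, a bucket policy that forces objects to be encrypted at upload might look like the sketch below, built as a Python dict and printed as JSON. The bucket name is a placeholder, and this follows the commonly documented pattern of denying PutObject when the SSE header is not AES256; adapt the condition if SSE-KMS is wanted instead.

```python
import json


def deny_unencrypted_uploads(bucket: str) -> dict:
    """Build a bucket policy that rejects uploads without the SSE-S3 header."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",                          # Effect
                "Principal": "*",                          # Principal
                "Action": "s3:PutObject",                  # Action
                "Resource": f"arn:aws:s3:::{bucket}/*",    # Resource
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": "AES256"
                    }
                },
            }
        ],
    }


# "my-example-bucket" is a hypothetical bucket name.
policy = deny_unencrypted_uploads("my-example-bucket")
print(json.dumps(policy, indent=2))
```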

S3 Encryption for Objects

Can also set default encryption for bucket

SSE-S3

Keys handled and managed by AWS S3
Object is encrypted server side, sent via HTTP/S
AES-256
Must set header: "x-amz-server-side-encryption":"AES256"
S3 Managed Data Key + Object > Encrypted

SSE-KMS

Keys handled and managed by KMS
Object is encrypted server side, sent via HTTP/S
KMS advantages: user control (rotation etc.) + audit trail
Must set header: "x-amz-server-side-encryption":"aws:kms"
KMS Customer Master Key (CMK) + Object > Encrypted

SSE-C

Server Side encryption using keys fully managed by customer outside AWS
S3 does not store the key
HTTPS must be used
Encryption key is provided (sent) in HTTP header, in every request
Client provided data key + Object > Encrypted, S3 throws away key
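
The three server-side modes differ mainly in the request headers. The sketch below only builds the header dictionaries (it does not send anything); the helper names are made up for illustration, and the random key stands in for a key you would manage yourself outside AWS.

```python
import base64
import hashlib
import os


def sse_s3_headers() -> dict:
    """Headers for SSE-S3: S3 manages the key."""
    return {"x-amz-server-side-encryption": "AES256"}


def sse_kms_headers() -> dict:
    """Headers for SSE-KMS: KMS manages the key."""
    return {"x-amz-server-side-encryption": "aws:kms"}


def sse_c_headers(key: bytes) -> dict:
    """Headers for SSE-C: the customer key travels with every request (HTTPS only)."""
    if len(key) != 32:
        raise ValueError("SSE-C expects a 256-bit (32-byte) key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode("ascii"),
        # S3 uses the MD5 digest of the key as an integrity check.
        "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
            hashlib.md5(key).digest()
        ).decode("ascii"),
    }


customer_key = os.urandom(32)  # stand-in for a key managed outside AWS
headers = sse_c_headers(customer_key)
print(sorted(headers))
```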

Client Side Encryption

Client library such as Amazon S3 Encryption Client
Clients must encrypt data themselves before sending to S3
Client must decrypt data themselves when retrieving from S3
Customer fully manages the keys and encryption cycle

Encryption in Transit

AWS S3 exposes both HTTP and HTTPS endpoints, HTTPS recommended

Default Encryption vs Bucket Policies

Old way was to use a bucket policy that refuses any upload request sent without the proper encryption headers
New way is to click "default encryption" option in S3
Bucket Policies are evaluated before default encryption
Either SSE-S3 (AES-256) or SSE-KMS

S3 MFA Delete

To use MFA-Delete must enable Versioning on the S3 bucket
You need MFA to
- permanently delete an object version (adding a delete marker does not need MFA)
- suspend versioning on the bucket
You won't need it for
- enabling versioning
- listing deleted versions
Only bucket owner (root account) can enable/disable MFA-delete
Can only be enabled using the CLI

S3 Access Logs

Any request made to S3 from any account, authorized or denied, will be logged to another S3 bucket
Can analyze using data analysis tools (Hive, Athena, etc.)
Log format in docs

S3 Cross Region Replication

Must enable versioning (source and destination)
Must be in different regions (duh)
Can be different accounts
Copying is asynchronous
Must give proper IAM permissions to S3, needs Role
For:
- Compliance, lower latency access, cross account replication
Can do based on whole bucket, prefix, tags
Can replicate encrypted if other account has access to KMS key
Can change storage class or ownership

S3 Pre-signed URLs

Can create a pre-signed URL via CLI or SDK
- For downloads: CLI
- For uploads: SDK
Valid by default for 3600 seconds, change with --expires-in [TIME_BY_SECONDS]
Users who receive pre-signed URL inherit permissions of the generator for GET/PUT
aws configure set default.s3.signature_version s3v4 (make URL KMS compatible)
aws s3 presign s3://bucketname/file.jpg --expires-in 300 --region ca-central-1 (S3 is global but bucket is regional)
Avoids direct access to the bucket from users

S3 Inventory

S3 Inventory helps manage your storage
Audit and report on the replication and encryption status of your objects
Use: Business, compliance, regulatory needs
Query with Athena, Redshift, Presto, Hive, etc
Can set up multiple inventories
Data goes from a source bucket to a target bucket (need to set up policy to place data, done automatically)

S3 Storage Tiers

S3 Standard - General Purpose
- 99.999999999% Durability (10 mil objects 10k years, lose 1)
- 99.99% availability
- Can sustain 2 concurrent AZ loss
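
The "10 mil objects, 10k years, lose 1" shorthand can be checked with a little arithmetic. A rough sketch, assuming the durability figure is read as an annual per-object survival probability:

```python
from decimal import Decimal


def years_per_lost_object(num_objects: int, durability: str) -> Decimal:
    """Expected number of years between single-object losses."""
    annual_loss_rate = Decimal(1) - Decimal(durability)       # per object, per year
    expected_losses_per_year = annual_loss_rate * num_objects
    return Decimal(1) / expected_losses_per_year


years = years_per_lost_object(10_000_000, "0.99999999999")
print(int(years))  # 10000
```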
S3 Reduced Redundancy Storage (RRS)
- Deprecated
- 99.99% durability and availability
- Can sustain loss of single AZ
- Use for non-critical reproducible data
S3 Standard Infrequent Access (IA)
- Suitable for data less frequently access but requires rapid retrieval
- Retrieval fee
- 99.999999999% Durability (10 mil objects 10k years, lose 1)
- 99.9% availability
- Can sustain 2 concurrent AZ loss
- For backups, DR, etc.
S3 One Zone Infrequent Access
- Same as IA, but data is stored in a single AZ
- Retrieval fee
- 99.999999999% Durability; data is lost when AZ is destroyed
- 99.5% availability
- Lower cost by 20% than IA
- For secondary backup data, or recreatable
S3 Intelligent Tiering
- Small monthly auto-tiering fee
- Move between S3 and IA based on access patterns
- 99.999999999% Durability, 99.9% availability
- Can sustain single AZ loss
S3 Glacier
- Alternative to Tape (10's of years)
- 99.999999999% Durability
- Storage cost per month ($0.004 / GB) + Retrieval fee (-10x than S3)
- Each item is called an "Archive", up to 40TB size
- Archives are stored in "Vaults", similar to bucket
- Retrieval options:
  - Expedited (1-5 mins) - $0.03 / GB and $0.01 per request (need to buy capacity units to use)
  - Standard (3-5 hours) - $0.01 per GB and $0.05 per 1000 requests
  - Bulk (5-12 hours) - $0.0025 per GB and $0.025 per 1000 requests
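
To compare the retrieval options, the per-GB and per-request prices can be combined into one figure. This is a sketch using the prices as written in these notes (they may be out of date); the function name and the example workload are assumptions.

```python
from decimal import Decimal

# Prices copied from the notes above; treat them as illustrative only.
RETRIEVAL_PRICING = {
    "expedited": {"per_gb": Decimal("0.03"), "per_request": Decimal("0.01")},
    "standard": {"per_gb": Decimal("0.01"), "per_request": Decimal("0.05") / 1000},
    "bulk": {"per_gb": Decimal("0.0025"), "per_request": Decimal("0.025") / 1000},
}


def retrieval_cost(tier: str, gigabytes: int, requests: int) -> Decimal:
    """Retrieval cost in USD = data charge + request charge."""
    price = RETRIEVAL_PRICING[tier]
    return price["per_gb"] * gigabytes + price["per_request"] * requests


# Hypothetical workload: 500 GB spread over 2000 retrieval requests.
for tier in RETRIEVAL_PRICING:
    print(tier, retrieval_cost(tier, 500, 2000))
```

Expedited is dominated by the per-request fee once there are many small archives, which is one reason Bulk suits large, non-urgent restores.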

Glacier Operations

Upload - Single operation or by parts (MultiPart upload) for larger archives
Download - First initiate a retrieval job for the particular archive, Glacier then prepares it for download. User then has a limited time to download the data from staging server
Delete - Use Glacier REST API or AWS SDK by specifying archive ID
Restore links have an expiry date

Glacier Vault Policies & Vault Lock

Vault is a collection of archives
Each Vault has:
- ONE Vault access policy
- One Vault lock policy
Vault Policies are written in JSON
Vault Access Policy is similar to bucket policy (restrict user / account permissions)
Vault Lock Policy is a policy you lock, for regulatory and compliance requirements
- The policy is immutable, it can never be changed
- ex: forbid deleting an archive if less than 1 year old
- ex: Implement WORM policy (write once, read many)
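
The "forbid deleting an archive if less than 1 year old" example could look roughly like the JSON below. The account ID, region and vault name in the ARN are placeholders.

```python
import json

# Hypothetical account ID, region and vault name.
VAULT_ARN = "arn:aws:glacier:us-east-1:123456789012:vaults/example-vault"

vault_lock_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "deny-delete-under-one-year",
            "Principal": "*",
            "Effect": "Deny",
            "Action": "glacier:DeleteArchive",
            "Resource": VAULT_ARN,
            # Deny applies only while the archive is younger than 365 days.
            "Condition": {
                "NumericLessThan": {"glacier:ArchiveAgeInDays": "365"}
            },
        }
    ],
}

print(json.dumps(vault_lock_policy, indent=2))
```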

S3 Lifecycle Rules

Transition Actions: Defines when objects are transitioned to another storage class
Expiration Actions: Objects expire and are deleted
Can be used to delete incomplete multi-part uploads
Limit to prefix or tag
Can do current or previous versions
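
A single lifecycle rule can combine all of the above. The sketch below builds a configuration in the shape the S3 lifecycle API expects; the rule ID, the `logs/` prefix and the day counts are arbitrary example values.

```python
import json

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",        # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},           # limit the rule to a prefix
            # Transition actions for current versions.
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            # Expiration action for current versions.
            "Expiration": {"Days": 365},
            # Expiration for previous (noncurrent) versions.
            "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            # Clean up incomplete multi-part uploads.
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))
```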

S3 Analytics - Storage Class Analysis (in s3 mgmt)

You can set up analytics to help determine when to transition objects from Standard to Standard_IA
Does not work for ONEZONE_IA or GLACIER
Report is updated on daily basis
Takes about 24h to 48h to first start
Help you put together efficient Lifecycle Rules

Glacier

Snowball

Physically transport data in or out of AWS
How much usable
TB or PB
Alternative to network fees
Secure, tamper resistant, uses KMS 256-bit encryption
Tracking using SNS and text messages, E-Ink shipping label
For: large data migrations, DC decommission, disaster recovery
If it takes more than a week via network use Snowball instead
Has client for copying files
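
The "more than a week via network" rule of thumb is easy to estimate. A rough sketch, assuming decimal terabytes and that only a fraction of the link (80% here, an arbitrary guess) is usable for the transfer:

```python
SECONDS_PER_DAY = 86_400


def network_transfer_days(data_terabytes: float, link_mbps: float,
                          utilization: float = 0.8) -> float:
    """Days needed to push the data over the network link."""
    bits = data_terabytes * 1e12 * 8
    effective_bits_per_second = link_mbps * 1e6 * utilization
    return bits / effective_bits_per_second / SECONDS_PER_DAY


def should_use_snowball(data_terabytes: float, link_mbps: float,
                        utilization: float = 0.8) -> bool:
    """Apply the one-week rule of thumb from the notes."""
    return network_transfer_days(data_terabytes, link_mbps, utilization) > 7


print(round(network_transfer_days(100, 1000), 1), should_use_snowball(100, 1000))
print(round(network_transfer_days(10, 1000), 1), should_use_snowball(10, 1000))
```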

Snowball Edge

Adds computational capability
100TB capacity, either:
- Storage Optimized - 24 vCPU
- Compute Optimized - 52 vCPU & optional GPU
- Supports a custom EC2 AMI so you can process while transferring
- Supports custom Lambda functions

AWS Snowmobile

Transfer exabytes (1EB = 1000PB = 1000000TB)
Each has 100PB of capacity, can use multiple in parallel
Use if transferring more than 10PB

Storage Gateway

Expose S3 on-premises
File Gateway
- S3 buckets via NFS and SMB (all S3 modes)
- Bucket access using IAM roles for each File Gateway
- Recently used data is cached
- Can be mounted on many servers
Volume Gateway
- Block storage using iSCSI backed by S3
- Volume backups are stored as EBS snapshots
- Cached volumes: low latency access to most recent data
- Stored volumes: entire dataset is on-premises, scheduled backups to S3
Tape Gateway
- VTL Virtual Tape Library backed by S3 and Glacier
- Back up data using existing tape based processes (and iSCSI interface)
- Works with most backup software

S3 For SysOps

S3 Versioning

S3 Versioning creates a new version each time you change a file
That includes when you encrypt a file (good against crypto-ransom)
Deleting a file in the S3 bucket just adds a delete marker on the versioning (delete marker has 0 size)
To delete a bucket you need to remove all the file versions within it

CloudFront

CloudFormation

We didn’t specify a name in the JSON file for this bucket, so AWS names it with the [STACKNAME]-[LOGICAL_ID]-[RANDOM_STRING] format.
The logical ID is the name given to the resource in the Resources section of the CFN template.
Stacks have logical resources in them that create physical resources

CloudFront

Cached at edge locations
Popular with S3 but works with EC2 and LB as well
Helps with network attacks
Provides SSL (HTTPS) via ACM
Can use SSL (HTTPS) to talk internally to applications
Supports RTMP
Origin Access Identity
- Limit S3 to be only accessed via this identity

CloudFront Access Logs

Logs every request made to CloudFront into a logging S3 Bucket

Can generate reports on:
- Cache Stats
- Popular Objects
- Top Referrers
- Usage Reports
- Viewers Reports
These reports are based on data from the Access Logs but you don't need to enable logs to get the reports

CloudFront Troubleshooting

CloudFront caches HTTP 4xx and 5xx status codes returned by our S3 (or the origin server)
5xx error indicates Gateway issues

// May not be on exam

CloudFront Signed URL / Signed Cookies

To distribute paid shared content which lives in S3
If S3 can only be accessed via CloudFront we can't use pre-signed S3 URLs
Can attach a policy with:
- URL expiration
- IP ranges for access
- Trusted signers (which AWS Account can create signed URLs)
CloudFront signed URLs can only be created using the AWS SDK
Validity length?
- Share content, movies etc, short = few minutes
- Private content (to user) longer = years
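
The policy attached to a signed URL is a small JSON document. The sketch below builds a custom policy with an expiration and an IP range; the distribution URL and CIDR are placeholders, and the RSA signing step that turns this into an actual signed URL is deliberately left out.

```python
import json
from typing import Optional


def custom_policy(resource_url: str, expires_epoch: int,
                  source_ip_cidr: Optional[str] = None) -> str:
    """Build a CloudFront custom policy statement as compact JSON."""
    condition = {"DateLessThan": {"AWS:EpochTime": expires_epoch}}  # URL expiration
    if source_ip_cidr:
        condition["IpAddress"] = {"AWS:SourceIp": source_ip_cidr}   # IP range
    policy = {"Statement": [{"Resource": resource_url, "Condition": condition}]}
    return json.dumps(policy, separators=(",", ":"))


# Hypothetical distribution domain, expiry timestamp and client range.
policy = custom_policy(
    "https://d111111abcdef8.cloudfront.net/movies/*",
    expires_epoch=1_700_000_000,
    source_ip_cidr="203.0.113.0/24",
)
print(policy)
```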

CloudFront vs S3 Cross Region Replication

CloudFront
- Global Edge network
- Files are cached for a TTL (maybe a day)
- Great for static content that must be available everywhere
S3 Cross Region Replication
- Must be set up for each region
- Files are updated near real-time
- Read only
- Great for dynamic content that needs low-latency in a few regions

CloudFront Geo Restriction

Restrict who can access your distribution
- Whitelist by country
- Blacklist by country
Country is determined by using 3rd party Geo-IP database
Copyright law etc.

done //

Athena

Serverless service to perform analytics directly against S3 files
Uses SQL to query
Has a JDBC / ODBC driver
Charged per query and amount of data scanned
Supports CSV, JSON, ORC, Avro, and Parquet
For: BI, analytics, reporting, analyze VPC flow logs, ELB logs, CloudTrail trails, etc.

Databases

RDS

Postgres
Oracle
MySQL
MariaDB
MS SQL
Aurora (proprietary)

DB Identifier (name) must be unique across region
Your responsibility
- Check IP / Port / SG inbound rules
- In-database user creation and permissions
- Creating database with or without public access
- Ensure parameter groups or DB is configured to only allow SSL
AWS Responsibility
- No SSH access
- No manual DB patching
- No Manual OS patching
- No way to audit underlying instance

// Not on Exam

For SAs

Read replicas can only do SELECT
RDS supports Transparent Data Encryption for Oracle or SQL Server
- Is on top of KMS, may affect performance
IAM Authentication vs un/pw for MySQL and PostgreSQL
- Lifespan of an IAM authentication token is 15 mins (short-lived), better security
- Tokens are generated by IAM credentials
- SSL must be used (or connection refused)
- Easy to use EC2 Instance Roles to connect to RDS DB (so don't need DB credentials in actual instance for non IAM)

Done //

Managed Service =
- OS patching
- Point in Time Restore backups
- Monitoring dashboards
- Read replicas for read perf
- Multi AZ set for DR
- Maintenance windows for upgrades
- Scaling (vert and horiz)
- BUT no SSH
- No audit of underlying instance

RDS Read Replicas for scalability

Up to 5 Read Replicas
Within AZ, Cross AZ, or Cross Region
Replication is ASYNC (eventually consistent)
Replicas can be promoted to their own DB
Applications must update connection string to leverage read replicas
- One string for master, 1 for each replica

Can combo Read Replicas and DR Multi AZ

RDS Multi AZ (Disaster Recovery)

SYNC replication
One DNS name for auto failover to standby
Increases availability (duh)
For AZ loss (not cross region)
No manual intervention
Not for scaling

RDS Multi AZ vs Read Replicas

Multi AZ
- Multi AZ is not used to support reads
- The failover happens only in the following conditions
  - The primary DB instance fails
  - An AZ outage
  - The DB instance server type is changed
  - The OS of the DB instance is undergoing software patching
  - A manual failover of the DB instance was initiated using Reboot with failover
- No failover for DB operations: long-running queries, deadlocks, or DB corruption errors
- Endpoint is the same after failover (no URL change in app)
- Lower maintenance impact. AWS does maintenance on the standby, which is then promoted to Master
- Backups are created from the standby (less impact, normally done on master)
- Only within a single region, region outage impacts availability
Read Replicas
- Help scaling read traffic
- A Read Replica can be promoted as a standalone database (manually)
- Read Replicas can be within AZ, Cross AZ, or Cross Region
- Each Read Replica has its own DNS endpoint
- You can have Read Replicas of Read Replicas
- Read Replicas can be Multi-AZ
- Read replicas help with DR by using Cross Region RR
- Read Replicas are not supported for Oracle
- Read Replicas can be used to run BI/Analytics reports etc

DB Parameter Groups

You can configure the DB engine using Parameter Groups
Dynamic Parameters are applied immediately
Static parameters are applied after instance reboot
You can modify the parameter group associated with a DB (replace with your own custom) (must reboot)
Must know
- PostgreSQL / SQL Server: rds.force_ssl=1 -> force SSL connections
- MySQL / MariaDB: GRANT SELECT ON mydatabase.* TO 'myuser'@'%' IDENTIFIED BY '...' REQUIRE SSL;

RDS Backups

Automatically enabled
Automated Backups
- Daily full snapshot of DB
- Captures transaction logs in real time
  - Ability to restore to any point in time
- 7 days retention (can increase to 35) (can lower as well)
DB Snapshots (can be manually triggered)
- Retention for as long as you want (keep specific state, or long term)

Backup vs Snapshots

Backups

Backups are "continuous" and allow point in time recovery
Backups happen during maintenance windows
When you delete a DB instance, you can retain automated backups
Backups have a retention period you set between 0 and 35 days (so they're all time limited)

Snapshots

Snapshots use IO operations and stop the DB from seconds to minutes
Snapshots taken on a Multi AZ DB don't impact master, just the standby
Snapshots are incremental after the first snapshot (which is full)
You can copy & share snapshots
Manual snapshots don't expire
You can take a "final snapshot" when you delete your DB

RDS Encryption

Encryption at rest with AWS KMS - AES-256 encryption
- Only at creation
- or: snapshot, copy as encrypted, create DB from snapshot (same as EBS)
SSL certificates to encrypt data in flight
To enforce SSL:
- PostgreSQL: rds.force_ssl=1 in the AWS RDS console (parameter groups)
- MySQL: Within the DB: GRANT USAGE ON *.* TO 'mysqluser'@'%' REQUIRE SSL;
To connect using SSL:
- Provide SSL Trust certificate (can be downloaded from AWS)
- Provide SSL options when connecting to DB

RDS Security

Encryption is done at DB creation, or afterwards via snapshot -> encrypted copy -> create DB from the copy
RDS DB are usually deployed in private subnet
Security works by leveraging security groups for who can communicate with it
IAM policies help control who can manage RDS
Traditional username and password to log into DB itself
IAM authentication now works with Aurora/MySQL

RDS API for SysOps

DescribeDBInstances API
- Helps to get a list of all DB instances, including Read Replicas
- Helps to get DB version
CreateDBSnapshot API - Make a snapshot
DescribeEvents API - Helps to return information about events related to your DB instance
RebootDBInstance API - Helps to initiate a "forced" failover by rebooting DB instance

RDS with CloudWatch

Cloudwatch Metrics associated with RDS (gathered from hypervisor)
- DatabaseConnections
- SwapUsage
- ReadIOPS / WriteIOPS
- ReadLatency / WriteLatency
- ReadThroughput / WriteThroughput
- DiskQueueDepth
- FreeStorageSpace
Enhanced Monitoring (gathered from agent on DB instance)
- Useful when you need to see how many different processes or threads use the CPU
- Access to over 50 new CPU, memory, file system, and disk I/O metrics
- 1-60 secs granularity

RDS Performance Insights

Visualize your DB performance and analyze any issues that affect it
With Perf Insights dashboard you can visualize the DB load and filter load by:
- By Waits -> find the resource that is the bottleneck (CPU, IO, lock, etc)
- By SQL statements -> find the SQL statement that is the problem
- By Hosts -> find the server that is using the DB the most
- By Users -> find the user that is using the DB the most
DBLoad - the number of active sessions for the DB engine
SQL queries that are putting load on your DB (its own category in dashboard)
Not supported on T2 instances

RDS vs. Aurora

Proprietary
Postgres and MySQL drivers supported
Cloud optimized - 5x perf for MySQL, 3x perf for Postgres
Automatically grows in increments of 10GB up to 64TB
Aurora can have 15 replicas, MySQL only 5, and replication is faster (sub 10ms lag)
Failover in Aurora is instantaneous, HA native.
Aurora costs 20% more than RDS, but is more efficient.

Aurora

Automatic failover
Backup and recovery
Isolation and security
Industry compliance
Push-button scaling
Automated patching with zero downtime
Advanced monitoring
Routine maintenance
Backtrack: restore data at any point in time without backups
HA and Read Scaling
- 6 Copies of data across 3 AZ
  - 4 copies out of 6 needed for writes
  - 3 copies out of 6 needed for reads
  - Self healing with peer-to-peer replication (for corrupted data)
  - Storage is striped across 100's of volumes
- One Aurora instance takes writes, Master
- Automated failover for master in less than 30 secs
- Master + up to 15 Read Replicas serve reads (any replica can become master)
- Support for Cross Region Replication
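
The 4-of-6 / 3-of-6 quorum numbers explain what Aurora storage can survive. A small sketch, assuming the 6 copies are spread evenly at 2 per AZ (the function names are made up):

```python
TOTAL_COPIES = 6
COPIES_PER_AZ = 2   # assumed even spread of 6 copies over 3 AZs
WRITE_QUORUM = 4
READ_QUORUM = 3


def can_write(healthy_copies: int) -> bool:
    """Writes need 4 of the 6 copies."""
    return healthy_copies >= WRITE_QUORUM


def can_read(healthy_copies: int) -> bool:
    """Reads need 3 of the 6 copies."""
    return healthy_copies >= READ_QUORUM


after_az_loss = TOTAL_COPIES - COPIES_PER_AZ   # one whole AZ gone: 4 copies left
after_az_plus_one = after_az_loss - 1          # plus one more copy lost: 3 left

print(can_write(after_az_loss), can_read(after_az_loss))          # True True
print(can_write(after_az_plus_one), can_read(after_az_plus_one))  # False True
```

So losing a full AZ still allows writes, and losing an AZ plus one more copy still allows reads.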
Shared logical storage volume across AZs for Replication + Self-Healing + Auto Expanding
Master is only writer
- Writer Endpoint (DNS name) always points to current master, for failover
- Read Replicas can do auto-scaling
  - Reader Endpoint Connection load balancing for reads, across all scaled instances. Happens at connection level not statement level.
  - ![Screen Shot 2019-11-18 at 14.10.27.png](../../../../_resources/Screen Shot 2019-11-18 at 14.10.27.png)

Aurora Security

Encryption at rest using KMS
Automated backups, snapshots and replicas are also encrypted
Encryption in flight using SSL (same process as MySQL or Postgres)
Authentication using IAM
You are responsible for protecting via SG
No SSH

Aurora Serverless

No need to choose an instance size
Only supports MySQL 5.6 & Postgres in beta
Helpful when you can't predict workload
DB cluster starts, shuts down, and scales automatically based on CPU / connections
Can migrate from Aurora Cluster to Serverless and vice versa
Serverless usage is measured in ACU (Aurora Capacity Units)
Billed in 5 minute increments of ACU
Some features aren't supported in serverless, so check docs

Aurora for SAs

Can use IAM for Aurora
Aurora Global Databases span multiple regions and enable DR
- One primary region
- One DR Region
- The DR region can be used for lower latency reads
- < 1 sec replication lag on average
If not using Global Databases you can create cross region Read Replicas
- FAQ recommends Global Databases instead

ElastiCache

Managed in-memory DB, high perf, low latency.
Redis or Memcached
Reduce load on DB
Make app stateless (keep state in cache)
Write scaling using Sharding
Read scaling using Read Replicas
Multi AZ with Failover
AWS takes care of all normal stuff
App queries ElastiCache, either gets cache hit or cache miss, in case of miss it gets cached for hit next time (by application)
Cache must come with invalidation strategy for only most current data (app based)
User session store (keep it stateless)
- Application writes session data into ElastiCache
- User hits a different application instance
- Instance retrieves the data from cache to keep session going
Redis
- In-memory key-value store
- Super low latency (sub ms)
- Cache survives reboot by default (persistence)
- Multi AZ with automatic failover for DR (if you want to keep cache data)
- Support for Read Replicas and Cluster
- Good for: User sessions, Leaderboard (has a sort), Distributed states, Relieve pressure on DB, Pub / Sub capability for messaging
Memcached
- In-memory object store
- Cache does not survive reboots
- Good for: Quick object retrieval, cache often accessed objects

ElastiCache for SAs

Security
- Redis supports RedisAUTH (un/pw)
- SSL in-flight must be enabled and used
- Memcached supports SASL
- None support IAM
- IAM policies are used only for AWS API level security
Patterns for ElastiCache
- Lazy Loading: all read data is cached, can become stale
- Write Through: Adds or updates data in the cache when written to DB (no stale data)
- Session Store: stores temp session data (using TTL features maybe)
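
The difference between Lazy Loading and Write Through shows up when the database changes. In this sketch plain dictionaries stand in for the database and for ElastiCache; the key names are invented.

```python
database = {"user:1": "Alice"}  # stand-in for the real DB
cache = {}                      # stand-in for ElastiCache


def get_lazy(key):
    """Lazy Loading: read the cache first, fill it from the DB on a miss."""
    if key in cache:
        return cache[key]          # cache hit
    value = database.get(key)      # cache miss: go to the DB
    if value is not None:
        cache[key] = value         # populate for next time
    return value


def put_write_through(key, value):
    """Write Through: update the DB and the cache together."""
    database[key] = value
    cache[key] = value


first = get_lazy("user:1")          # miss, then cached
database["user:1"] = "Bob"          # DB updated behind the cache's back
stale = get_lazy("user:1")          # still the old value: stale data
put_write_through("user:1", "Carol")
fresh = get_lazy("user:1")          # cache was updated with the write
print(first, stale, fresh)          # Alice Alice Carol
```

A TTL or an explicit invalidation on write is what bounds the staleness in the lazy-loading case.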

Monitoring, Audit, and Performance

CloudWatch

CloudWatch provides metrics for every service in AWS
Metric is a variable to monitor (CPUUtilization, NetworkIn, etc)
Metrics belong to namespaces
Dimension is an attribute of a metric (instance id, environment, etc)
Up to 10 dimensions per metric
Metrics have timestamps
Can create a CloudWatch dashboard of metrics

Detailed Monitoring

Basic EC2 instance metrics are reported every 5 minutes
With detailed monitoring (for a cost) you get data every 1 minute
Use detailed monitoring for more effective ASG scaling
Free Tier allows up to 10 detailed monitoring metrics
EC2 Memory usage is not pushed by default, must be pushed from inside the instance

CloudWatch Custom Metrics

Possibility to define and send your own custom metrics to CloudWatch
Ability to use dimensions (attributes) to segment metrics
- Instance.id
- Environment.name
Metric resolution:
- Standard: 1 minute
- High resolution: Down to 1 second (StorageResolution API parameter) - Higher Cost
- Use API call PutMetricData
- Use exponential back off in case of throttle errors
Available metrics
- ASGAverageCPUUtilization—Average CPU utilization of the Auto Scaling group.
- ASGAverageNetworkIn—Average number of bytes received on all network interfaces by the Auto Scaling group.
- ASGAverageNetworkOut—Average number of bytes sent out on all network interfaces by the Auto Scaling group.
- ALBRequestCountPerTarget—Number of requests completed per target in an Application Load Balancer target group.
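
The "exponential back off in case of throttle errors" advice for PutMetricData can be sketched as a generic retry wrapper. `ThrottledError` and `flaky_put_metric` are hypothetical stand-ins for the real API error and call; the delays and attempt count are arbitrary.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error returned by the API."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep):
    """Retry `operation` on throttling, doubling the delay cap each attempt."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            delay_cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay_cap))  # random jitter below the cap


attempts = {"count": 0}


def flaky_put_metric():
    """Pretend metric push that is throttled on the first two calls."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ThrottledError("rate exceeded")
    return "ok"


# The sleep function is swapped out so the demo does not actually wait.
result = call_with_backoff(flaky_put_metric, sleep=lambda seconds: None)
print(result, attempts["count"])  # ok 3
```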

CloudWatch DashBoards (exam)

Great way to set up dashboards for quick access to key metrics
Dashboards are global, go to each region to set up, but see anywhere
Dashboards can include graphs from different regions
You can change the time zone & time range of the dashboards
You can set up automatic refresh (10s, 1m, 2m, 5m, 15m)
Pricing:
- 3 Dashboards (up to 50 metrics) for free
- $3/dashboard/month afterwards

CloudWatch Logs

Applications can send logs to CloudWatch via the SDK
CloudWatch can collect logs from:
- Elastic Beanstalk: Collects from application
- ECS: Collects from containers
- Lambda: Collects from functions
- VPC Flow Logs
- API Gateway
- CloudTrail based on filter
- CloudWatch Logs Agents: For example on EC2 machines
- Route53: Logs DNS queries
CloudWatch logs can go to:
- Batch exporter to S3 for archival
- Stream to ElasticSearch cluster for further analytics

Log storage architecture:

Log Groups: Arbitrary name, usually representing an application
Log Stream: instances within application / log files / containers (A log stream is a sequence of log events that share the same source)
Can define log expiration policies (never expire, 30 days, etc)
Using the CLI we can tail CloudWatch logs
To send logs to CloudWatch, make sure IAM permissions are correct!
Security: Encryption of logs using KMS at the Group level

CloudWatch Logs Metric Filter & Insights

CloudWatch Logs can use filter expressions
- For example, find a specific IP inside a log
- Metric filters can be used to trigger alarms (found specific IP, then alarm)
  - Create your own metrics based on these filters, and then alarms
CloudWatch Logs Insights can be used to query logs, and add queries to CloudWatch Dashboards (comes with some default)
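
Conceptually, a metric filter turns matching log lines into a number that an alarm can watch. The sketch below imitates that idea locally in plain Python; it is not the CloudWatch filter syntax, and the log lines, IP address and threshold are invented.

```python
def count_matches(log_lines, term):
    """Count lines containing `term` as a whole whitespace-separated token."""
    return sum(1 for line in log_lines if term in line.split())


def alarm_state(metric_value, threshold):
    """ALARM when the metric reaches the threshold, otherwise OK."""
    return "ALARM" if metric_value >= threshold else "OK"


# Invented sample log lines.
log_lines = [
    "203.0.113.7 GET /index.html 200",
    "198.51.100.23 GET /login 200",
    "203.0.113.7 POST /login 401",
    "203.0.113.70 GET /about 200",
]

suspicious_hits = count_matches(log_lines, "203.0.113.7")
print(suspicious_hits, alarm_state(suspicious_hits, threshold=2))  # 2 ALARM
```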

CloudWatch Alarms

Alarms are used to trigger notifications for any metric
Alarms can go to Auto Scaling, EC2 Actions, SNS Notifications
Various options (sampling, %, max, min, etc)
Alarm States:
- OK
- INSUFFICIENT_DATA
- ALARM
Period:
- Length of time in seconds to evaluate the metric
- High resolution custom metrics: can only choose 10 sec or 30 sec
Alarm Targets (exam)
- Stop, Terminate, Reboot, or Recover an EC2 instance
- Trigger autoscaling action
- Send notification to SNS (from which you can do almost anything)
Good to know
- Alarms can be created based on CloudWatch Logs Metrics Filters
- CloudWatch doesn't test or validate the actions that are assigned
- To test alarms and notifications, set the alarm state to Alarm using CLI
  - aws cloudwatch set-alarm-state --alarm-name "myalarm" --state-value ALARM --state-reason "testing purposes"

CloudWatch Events

Source + Rule -> Target
Schedule: Like a cron job (same format)
Event Pattern: Event rules to react to a service doing something (Ex: CodePipeline state changes)
Triggers to Lambda functions, SQS/SNS/Kinesis Messages
CloudWatch Event creates a small JSON document to give info on the change
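
An event pattern is itself JSON that is compared against the event document. The matcher below is a simplified illustration (exact values only, lists meaning "any of"), not the full matching logic of the service; the pipeline name in the sample event is invented.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Simplified matching: every pattern key must match the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # A list in the pattern means "any of these values".
            return False
    return True


pattern = {
    "source": ["aws.codepipeline"],
    "detail-type": ["CodePipeline Pipeline Execution State Change"],
    "detail": {"state": ["FAILED"]},
}

event = {
    "source": "aws.codepipeline",
    "detail-type": "CodePipeline Pipeline Execution State Change",
    "detail": {"pipeline": "example-pipeline", "state": "FAILED"},
}

print(matches(pattern, event))  # True
```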

CloudTrail

Provides governance, compliance, and audit for your account
Enabled by default
Get a history of events / API calls made within your account by:
- Console
- SDK
- CLI
- AWS Services
Can put logs from CloudTrail into CloudWatch logs
If a resource is deleted, look into CloudTrail first
Shows past 90 days of activity (store elsewhere after, CloudWatch in Trail config etc)
The default UI only shows Create, Modify, or Delete events
CloudTrail Trail
- Get a detailed list of all the events you choose
- Ability to store these events in S3 for further analysis
- Can be region specific or global
CloudTrail Logs have SSE-S3 encryption by default when placed in S3
Control access to S3 using IAM, Bucket Policy, etc

AWS Config

Helps with auditing and compliance of your AWS resources
Helps record configurations and changes over time
Helps record compliance over time
Possibility of storing AWS Config data into S3 (to be queried by Athena)
Questions that can be solved by AWS Config
- Is there unrestricted SSH access to my security groups
- Do my buckets have any public access
- How has my ALB configuration changed over time
You can receive alerts (SNS notifications) for any changes
AWS Config is a per-region service
Can be aggregated across regions and accounts

Config Rules

Can use AWS managed config rules (over 75)
Can make custom config rules (must be defined in AWS Lambda)
- Evaluate if each EBS disk is of type gp2
- Evaluate if each EC2 instance is t2.micro
Rules can be evaluated by triggers
- For each config change
- And / or at regular intervals
Pricing - No Free Tier, $2USD per active rule per month (decreases after 10 rules)
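
The "each EC2 instance is t2.micro" custom rule boils down to a small evaluation function inside Lambda. This sketch shows only the decision logic; a real rule would also report the result back to AWS Config (PutEvaluations), which is omitted here. The sample event is trimmed to the fields the sketch reads and the instance type in it is made up.

```python
import json


def evaluate_instance_type(configuration_item: dict,
                           allowed_type: str = "t2.micro") -> str:
    """Return the compliance verdict for one configuration item."""
    if configuration_item.get("resourceType") != "AWS::EC2::Instance":
        return "NOT_APPLICABLE"
    instance_type = configuration_item.get("configuration", {}).get("instanceType")
    return "COMPLIANT" if instance_type == allowed_type else "NON_COMPLIANT"


def lambda_handler(event, context):
    """Entry point: the configuration item arrives as a JSON string."""
    invoking_event = json.loads(event["invokingEvent"])
    return evaluate_instance_type(invoking_event["configurationItem"])


# Trimmed-down sample event for a local dry run.
sample_event = {
    "invokingEvent": json.dumps({
        "configurationItem": {
            "resourceType": "AWS::EC2::Instance",
            "configuration": {"instanceType": "m5.large"},
        }
    })
}

print(lambda_handler(sample_event, None))  # NON_COMPLIANT
```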

AWS Config Resource
- View compliance of a resource over time
- View configuration of a resource over time
- View CloudTrail API calls if enabled

CloudWatch vs CloudTrail vs Config

CloudWatch
- Performance Monitoring (metrics, CPU, network, etc) & dashboards
- Events & Alerting
- Log Aggregation & Analysis
CloudTrail
- Record API calls made within your Account by everyone
- Can define trails for specific resources
- Global Service
Config
- Record configuration changes
- Evaluate against compliance rules
- Get timeline of changes and compliance

AWS Account Management

AWS Status - Service Health Dashboard

Shows all regions, all services health
Shows historical information for each day
Has an RSS feed you can subscribe to
https://status.aws.amazon.com

AWS Personal Health Dashboard

Global Service
Show how AWS outages directly impact you
Shows impact on your resources
List issues and actions you can do to remediate them
https://phd.aws.amazon.com

AWS Organizations

Global Service
Allows to manage multiple AWS accounts
The main account is the master account - can't change it
Other accounts are member accounts
Member accounts can only be part of one organization
Consolidated Billing across all accounts - single payment method
Pricing benefits from aggregated usage (volume discount)
API is available to automate AWS account creation

OU & Service Control Policies (SCPs)

Organize accounts in Organizational Units (OU)
- Can be anything: dev/test/prod or Finance/HR/IT
- Can nest OU within OU
Apply SCP to OU
- Permit / Deny access to AWS services
- SCP has a similar syntax to IAM
- It's a filter to IAM
- Policies are inherited down the OU hierarchy
Helpful to create sandbox accounts
Helpful to separate dev and prod resources
Helpful to only allow approved services
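
"It's a filter to IAM" can be pictured as an intersection: an action is usable only if both the IAM policy and the SCP allow it. The sketch below is a deliberate simplification (allow lists with wildcards only, no explicit denies or conditions), with made-up policies.

```python
from fnmatch import fnmatch


def allowed_by(patterns, action: str) -> bool:
    """True if any allow pattern (wildcards permitted) covers the action."""
    return any(fnmatch(action, pattern) for pattern in patterns)


def effective_allow(action: str, iam_allow, scp_allow) -> bool:
    """The SCP filters IAM: both must allow the action."""
    return allowed_by(iam_allow, action) and allowed_by(scp_allow, action)


iam_allow = ["*"]               # an administrator-style IAM policy
scp_allow = ["ec2:*", "s3:*"]   # the OU only approves EC2 and S3

print(effective_allow("s3:GetObject", iam_allow, scp_allow))          # True
print(effective_allow("rds:CreateDBInstance", iam_allow, scp_allow))  # False
```

Even a full-admin user in the member account cannot use a service the SCP leaves out, which is what makes SCPs useful for sandbox accounts.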

AWS Service Catalog

For users that are new to AWS and have too many options, may create stacks that are not compliant / in line with the rest of the organization
Some users just want a quick self-service portal to launch a set of authorized products pre-defined by admins
Such as: virtual machines, databases, storage options, etc...
Admins create CloudFormation templates -> products, collection of Products is a Portfolio, user gets product list

AWS Cost Explorer

A graphical tool to view and analyze your costs and usage, trends
Review charges and cost associated with your AWS account or org
Forecast spending for next 3 months
Get recommendations/insight for which EC2 Reserved Instances to purchase
- View Reservation Summary, and net savings from them (EC2, RDS, etc)
Access to default reports
API to build custom cost management applications

AWS Budgets

Create Budget and send alarms when costs exceed the budget
3 types of budgets: Usage, Cost, Reservation
For Reserved Instances (RI)
- Track utilization
- Supports EC2, ElastiCache, RDS, Redshift
Up to 5 SNS notifications per budget
Can filter by: Service, Linked Account, Tag, Purchase Option, Instance Type, Region, AZ, API Operation, etc
Same options as AWS Cost Explorer
2 Budgets are free, then $0.02/day per budget

AWS Billing Alarms

Different than Budget Alerts, almost same as Cost budget
Billing data metrics are stored in CloudWatch us-east-1
Billing data are for overall worldwide AWS costs
It's for actual costs, not projected costs

AWS Cost Allocation Tags

With Tags we can track resources that relate to each other
With Cost Allocation Tags we can enable detailed costing reports
Just like Tags, but they show up as columns in reports
AWS Generated Cost Allocation Tags
- Automatically applied to the resource you create
- Starts with Prefix aws: (eg aws:createdBy)
- They're not applied to resources created before the activation
User tags
- Defined by the user
- Starts with Prefix user:
Cost Allocation Tags automatically appear in the Billing Console
Takes up to 24h for the tags to show up in report

Security and Compliance

DDoS Protection on AWS

AWS Shield Standard: protects against DDoS attacks for your website and applications, no additional cost
AWS Shield Advanced: 24/7 premium DDoS protection
AWS WAF: Filter specific requests based on rules
Cloudfront and Route 53
- Availability protection using global edge network
- Combined with AWS Shield, provide attack mitigation at edge
Be ready to Scale - leverage AutoScaling
Separate static resources (S3 / CloudFront) from dynamic ones (EC2/ALB)

AWS Shield

AWS Shield Standard
- Free Service protects against attacks such as SYN/UDP floods, Reflection attacks, and other layer 3/4
AWS Shield Advanced
- Optional DDoS mitigation service ($3000 per month)
- Protects against more sophisticated attacks on CloudFront, Route 53, Classic, Application & Network Load Balancers, EIP, EC2
- 24/7 access to AWS DDoS response team (DRT)
- If you do get higher fees due to scaling, fees are covered

WAF

Protects application from common web exploits
Define customizable web security rules:
- Control which traffic to allow or block to your web applications
- Rules can include: IP addresses, HTTP headers, HTTP body, or URI strings
- Protects against common attacks - SQL injection, Cross site scripting
- Protects against bots, bad user agents, etc
- Size constraints
- Geo match
Deploy on CloudFront, Application Load Balancer, or API GW
Leverage existing marketplace of rules

Penetration Testing on AWS

Permission is required (not any more though)
Request permissions with AWS root credentials
No 3rd party testing
For EC2, ELB, RDS, Aurora, CloudFront, API GW, Lambda, Lightsail
Cannot test against nano / micro / small instances
Takes 2 business days to be approved

AWS Inspector

Only for EC2 instances
Analyze against known vulnerabilities
- Common Vulnerabilities and Exposures (CVE)
- Center for Internet Security (CIS) Benchmarks
- Security Best Practices
- Runtime behaviour Analysis
Analyze against unintended network accessibility
- Network reachability
AWS Inspector Agent must be installed on OS in EC2 instances
Define template (rules package, duration, attributes, SNS topics)
No custom rules possible, only AWS managed
Afterwards you get a report with a list of vulnerabilities
Use SSM instead of manual install
Does have CPU impact

Logging in AWS

CloudTrail Trails - Trace all API calls
Config Rules - For config & compliance over time
CloudWatch Logs - For full data retention
VPC Flow Logs - IP traffic within your VPC
ELB Access Logs - Metadata of requests made to your load balancers
CloudFront Access Logs - Web Distribution access logs
WAF Logs - Full logging of all requests analyzed by WAF
Logs can be analyzed using Athena if they're stored in S3
Should encrypt logs in S3, control access using IAM & Bucket Policies, MFA

GuardDuty

Intelligent threat discovery to protect AWS Account
Uses Machine Learning algorithms, anomaly detection, 3rd party data
One click to enable (30 day trial), no need to install software
Input data includes
- CloudTrail Logs: Unusual API calls, unauthorized deployments
- VPC Flow Logs: Unusual internal traffic, unusual IP addresses
- DNS Logs: Compromised EC2 instances sending encoded data within DNS queries
Notifies you in case of findings
Integration with AWS Lambda

Trusted Advisor

No need to install anything - high level AWS Account assessment
Analyzes your AWS accounts and provides recommendations:
- Cost optimization
- Performance
- Security
- Fault Tolerance
- Service Limits (ie getting close to etc)
Core Checks and Recommendations - all customers
Can enable weekly email notifications from the console
Full Trusted Advisor - Available for Business & Enterprise support plans
- Ability to set CloudWatch alarms when reaching limits

KMS Overview + Encryption In Place

Any time you need to share sensitive information use KMS
- DB passwords
- Credentials to external service
- Private Key of SSL certificates
The value in KMS is that the CMK used to encrypt data can never be retrieved by the user, and the CMK can be rotated for extra security
Never store secrets in plaintext, especially in code!
Encrypted secrets can be stored in the code / environment variables
KMS can only help in encrypting up to 4KB of data per call
If data > 4KB, use envelope encryption
To give access to KMS to someone:
- Make sure the Key Policy allows the user
- Make sure the IAM Policy allows the API call
Able to fully manage the keys and policies:
- Create, Disable, Enable, Rotation policies
Able to audit key usage (using CloudTrail)
Three types of Customer Master Keys:
- AWS Managed Service Default CMK: free
- User Keys created in KMS: $1 / Month
- User Keys imported (must be symmetric 256-bit key): $1 / Month
- pay for API calls to KMS ($0.03 / 10000 calls)
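
Envelope encryption (for data > 4KB) can be sketched as follows. This is only an illustration of the structure: the cipher below is a toy XOR keystream built from SHA-256 and is NOT secure, the master key is held locally here whereas a real CMK never leaves KMS, and all function names are made up for the example.

```python
import hashlib
import secrets

KMS_DIRECT_LIMIT = 4096  # bytes; the 4KB per-call limit from the notes


def _toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher (SHA-256 keystream). Illustration only, NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def needs_envelope(plaintext: bytes) -> bool:
    """Data above the direct limit has to go through envelope encryption."""
    return len(plaintext) > KMS_DIRECT_LIMIT


def envelope_encrypt(master_key: bytes, plaintext: bytes) -> dict:
    """Encrypt data with a fresh data key, then encrypt that data key with the master key."""
    data_key = secrets.token_bytes(32)
    return {
        "ciphertext": _toy_cipher(data_key, plaintext),
        "encrypted_data_key": _toy_cipher(master_key, data_key),
    }


def envelope_decrypt(master_key: bytes, envelope: dict) -> bytes:
    """Recover the data key with the master key, then decrypt the data."""
    data_key = _toy_cipher(master_key, envelope["encrypted_data_key"])
    return _toy_cipher(data_key, envelope["ciphertext"])
```

The point of the design is that only the small data key (32 bytes here) is ever encrypted under the master key, so the 4KB limit never applies to the payload itself. The encrypted data key is stored next to the ciphertext.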

Encryption in AWS Services

Requires migration (through Snapshot/backup)
- EBS Volumes
- RDS databases
- ElastiCache
- EFS network file system
In-place encryption
- S3

Cloud HSM Overview

KMS -> AWS manages the software for encryption
CloudHSM -> AWS provisions encryption hardware
Dedicated Hardware (HSM = Hardware Security Module)
You entirely manage your own encryption keys (not AWS)
The CloudHSM hardware device is tamper resistant
FIPS 140-2 Level 3 Compliance
CloudHSM clusters are spread across multi AZ
Supports both symmetric and asymmetric encryption (ie SSL/TLS keys), KMS does only symmetric
No free tier
Has Cryptographic Acceleration (SSL/TLS, Oracle TDE)
Must use the CloudHSM Client Software, no API

MFA + IAM Credentials Report

AWS MFA accepts both virtual and hardware MFA devices
MFA for root user can be configured from the IAM dashboard
MFA can also be configured from the CLI
Can set up MFA for individual users
Credentials Report
- A CSV report file on all the IAM users and credentials
- This shows who all have enabled MFA

IAM PassRole Action (exam)

In order to assign a role to an EC2 instance you need IAM:PassRole
Can be used for any service where we assign roles, not just EC2
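
A minimal sketch of an IAM policy document that grants iam:PassRole for a single role, built as a Python dict. The account ID and role name are hypothetical; the optional iam:PassedToService condition restricts which service the role may be passed to.

```python
import json


def pass_role_policy(role_arn: str, service: str = "ec2.amazonaws.com") -> dict:
    """Build a policy allowing the caller to pass one specific role to one service."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": role_arn,
                "Condition": {"StringEquals": {"iam:PassedToService": service}},
            }
        ],
    }


# Hypothetical role ARN, for illustration only.
policy = pass_role_policy("arn:aws:iam::123456789012:role/ExampleInstanceRole")
print(json.dumps(policy, indent=2))
```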

Security Token Service (STS) & Cross Account Access

Allows you to grant limited and temporary access to AWS resources
Token is valid for up to one hour (must be refreshed)
Cross Account Access
- Allows users from one AWS account to access resources in another
  - Define an IAM Role for another account to access
  - Define which accounts can access this IAM Role
  - Use AWS STS to retrieve credentials and impersonate the IAM Role you have access to (AssumeRole API)
  - Temporary credentials can be valid between 15 minutes to 1 hour
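
The "define which accounts can access this IAM Role" step is done with a trust policy on the role. A minimal sketch that builds one and checks a requested session length against the 15 minute to 1 hour window quoted above; the account ID is hypothetical and the helper names are made up for the example.

```python
import json

MIN_SESSION_SECONDS = 15 * 60  # lower bound from the notes
MAX_SESSION_SECONDS = 60 * 60  # upper bound from the notes


def trust_policy(trusted_account_id: str) -> dict:
    """Trust policy letting principals of another account call sts:AssumeRole on this role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def validate_duration(seconds: int) -> int:
    """Return the duration unchanged if it is inside the allowed window, else raise."""
    if not MIN_SESSION_SECONDS <= seconds <= MAX_SESSION_SECONDS:
        raise ValueError(f"duration must be {MIN_SESSION_SECONDS}-{MAX_SESSION_SECONDS}s")
    return seconds


# Hypothetical account ID, for illustration only.
policy = trust_policy("111111111111")
print(json.dumps(policy, indent=2))
```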

Federation (AD)
- Provides a non-AWS user with temporary AWS access by linking the user's Active Directory credentials
- Uses SAML
- Allows Single Sign On (SSO) which enables users to log in to AWS console without assigning IAM credentials
Federation with third party providers / Cognito
- Used mainly in web and mobile applications
- Makes use of FB/G/Amazon etc to federate them

Identity Federation with SAML & Cognito

Federation lets users outside of AWS assume a temporary role for accessing AWS resources
These users assume an identity provided access role
Federation assumes a form of 3rd party authentication
- LDAP, MS AD (~SAML), SSO, OpenID, Cognito
Using federation you don't need to create IAM users, user mgmt is outside AWS

SAML Federation for Enterprises

To integrate AD / ADFS with AWS (or any SAML 2.0)
Provides access to AWS Console or CLI (through temp creds)
No need to create an IAM user for each of your employees

Custom Identity Broker Application for Enterprises

Use only if identity provider is not compatible with SAML 2.0
Must write own broker
The identity broker must determine the appropriate IAM Policy

AWS Cognito - Federated Identity Pools For Public Applications

Goal: Provide direct access to AWS resources from the client side
How:
- Log in to federated identity provider - or remain anonymous
- Get temporary AWS credentials back from the Federated Identity Pool
- The credentials come with a pre-defined IAM policy stating their permissions

AWS Artifact

Portal that provides customers with on-demand access to AWS compliance documentation and AWS agreements
Can be used to support internal audit or compliance

Security and Compliance Section Summary

AWS Shield: Automatic DDoS Protection + 24/7 support for advanced
AWS WAF: Firewall to filter incoming requests based on rules
AWS Inspector: For EC2 only, install agents and find vulnerabilities
AWS GuardDuty: Find malicious behaviour with VPC, DNS, and CloudTrail Logs
AWS Trusted Advisor: Analyze AWS account and get recommendations
AWS KMS: Encryption keys managed by AWS
AWS CloudHSM: Hardware encryption, we manage keys, supports asymmetrical
AWS STS: Generate security token
Identity Federation: SAML 2.0 or Custom for Enterprise, Cognito for Apps
AWS Artifact: Get access to compliance reports such as PCI, ISO, etc
AWS Config: Track config changes and compliance against rules (over time)
AWS CloudTrail: Track API calls made by users within an account

Route 53

Most common records
- A: hostname to IPv4
- AAAA: hostname to IPv6
- CNAME: hostname to hostname (non root domain)
- Alias: hostname to AWS resource (root and non-root), free of charge, supports native health checks
Can use
- Public domain names
- Private domain names that can only be resolved by your VPC instances
$0.50 per hosted zone

Has
- Load Balancing (through DNS, client LB)
- Health checks (limited)
- Routing policy: simple, failover, geolocation, latency, weighted, multi value
Simple Routing Policy
- Maps a domain to one URL
- Use when directing to a single resource
- Cannot attach health checks
- If multiple values are returned, a random one is chosen by client

Weighted Routing Policy
- Control % of requests that go to specific endpoint (ex: 70, 20, 10. Sum does not have to be 100)
- Create multiple record sets with weighted option
- Helpful to test 1% of traffic on new app
- Split traffic between regions
- Can be associated with health checks
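
Because the weights do not have to sum to 100, the share of traffic per record is its weight divided by the total of all weights. A small sketch of that arithmetic; the record names are hypothetical.

```python
def traffic_shares(weights: dict) -> dict:
    """Fraction of DNS responses each weighted record receives: weight / sum of weights."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return {name: weight / total for name, weight in weights.items()}


# The 70 / 20 / 10 example from the notes (record names are made up).
shares = traffic_shares({"v1": 70, "v2": 20, "v3": 10})

# Sending roughly 1% of traffic to a new app version: weights 1 and 99.
canary = traffic_shares({"new-app": 1, "current-app": 99})

# Weights that do not sum to 100 work the same way.
uneven = traffic_shares({"us-east-1": 3, "eu-west-1": 1})
```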

Latency Routing Policy
- Redirect to server that has the least latency, close to request
- Evaluated in terms of user to designated AWS region
- Must specify region in latency record
- Germany could be directed to US if lower latency

Route 53 Geolocation Policy

Different from latency based
Based on user location
Traffic from England should go to X
Must have a default policy if no other match exists

Multi Value Routing Policy

Use when routing traffic to multiple instances
Use when you want to associate Route 53 health checks with records; unhealthy records are removed from the returned values
Up to 8 healthy records are returned for each MultiValue query (even if you have 50)
MultiValue is not a substitute for using ELB
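
A rough simulation of the behaviour described above: unhealthy records are dropped and at most 8 of the healthy ones are returned. The random selection and the record data are assumptions for the sketch, not a statement of how Route 53 picks records internally.

```python
import random

MAX_RETURNED = 8  # up to 8 healthy records per MultiValue query


def multivalue_answer(records: dict, rng=random) -> list:
    """records maps IP -> health check status; return up to 8 healthy IPs."""
    healthy = [ip for ip, is_healthy in records.items() if is_healthy]
    return rng.sample(healthy, min(MAX_RETURNED, len(healthy)))


# Hypothetical fleet of 50 records, every 5th one unhealthy.
fleet = {f"10.0.0.{i}": (i % 5 != 0) for i in range(1, 51)}
answer = multivalue_answer(fleet)
```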

Route 53 Health Checks

Route 53 will not send traffic to an endpoint whose health check has failed
Deemed unhealthy if checks fail 3 times
Deemed healthy if checks pass 3 times
Default interval 30 secs (can set fast health check at 10s, higher cost)
About 15 health checkers will launch to check endpoint health
- one request every 2 secs on average
Can have HTTP, TCP, and HTTPS check (no SSL certificate verification)
Possibility of integrating health checks with CloudWatch
Health checks can be linked to Route 53 DNS record set
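
The numbers above fit together as follows: 15 checkers each probing every 30 seconds gives one request every 2 seconds on average, and the status only flips after 3 checks in a row disagree with the current status. A minimal sketch, assuming "3 times" means 3 consecutive results; the class and function names are made up.

```python
def average_request_gap(interval_seconds: float, checker_count: int) -> float:
    """Average time between requests hitting the endpoint across all checkers."""
    return interval_seconds / checker_count


class HealthState:
    """Flip status only after `threshold` consecutive results that contradict it."""

    def __init__(self, threshold: int = 3, healthy: bool = True):
        self.threshold = threshold
        self.healthy = healthy
        self.streak = 0  # consecutive results disagreeing with the current status

    def record(self, passed: bool) -> bool:
        if passed == self.healthy:
            self.streak = 0
        else:
            self.streak += 1
            if self.streak >= self.threshold:
                self.healthy = passed
                self.streak = 0
        return self.healthy


gap = average_request_gap(30, 15)  # default interval, about 15 checkers
```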

Route 53 as a Registrar

Offer both Registrar and DNS service

VPC

CIDR

Two components
- Base IP (xx.xx.xx.xx)
- Subnet mask (/32) (defines how many bits can change in an IP)
  - Can take two forms
    /24
    255.255.255.0 (less common)
  - /32 = 1 IP = 2^0
  - /31 = 2 IP = 2^1
  - /30 = 4 IP = 2^2
  - /29 = 8 IP = 2^3
  - /24 = 256 IP = 2^8
  - etc
  - /16 = 65536 = 2^16
  - /0 = all = 2^32
  - /32 - No IP number can change
  - /24 - Last number (x) can change
  - /16 - Last two numbers (x.x) can change
  - /8 - Last three numbers (x.x.x) can change
  - /0 - All four numbers (x.x.x.x) can change
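
The table above is just 2^(32 - prefix). A short sketch that checks the arithmetic against Python's standard ipaddress module; the 10.0.0.0 base IP is arbitrary.

```python
import ipaddress


def address_count(prefix_length: int) -> int:
    """Number of IPv4 addresses in a block: 2 ** (bits that are free to change)."""
    return 2 ** (32 - prefix_length)


for prefix in (32, 31, 30, 29, 24, 16, 0):
    # strict=False lets the same base IP be reused for every prefix length
    network = ipaddress.ip_network(f"10.0.0.0/{prefix}", strict=False)
    print(f"/{prefix}: {address_count(prefix)} IPs, netmask {network.netmask}")
```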

Public vs Private

IANA via RFC 1918
Private IP can have the following values
- 10.0.0.0 - 10.255.255.255 (10.0.0.0/8)
- 172.16.0.0 - 172.31.255.255 (172.16.0.0/12) AWS default
- 192.168.0.0 - 192.168.255.255 (192.168.0.0/16)
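
A quick way to check whether an address falls inside one of the three private ranges above, using the standard ipaddress module. The helper name is made up for the example.

```python
import ipaddress

# The three RFC 1918 private ranges listed above.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """True if the IPv4 address belongs to one of the RFC 1918 private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)
```

The explicit list is used instead of the module's is_private attribute because is_private also covers other reserved ranges (loopback, link-local, etc).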

VPC in AWS - IPv4

Can have multiple VPCs per region (5 soft limit)
Each VPC CIDR must be within the following sizes:
- Min size /28 = 16 IP
- Max size /16 = 65536 IP
Because VPC is private, only RFC1918 addresses
VPC CIDR should not overlap with your other networks

Subnets

AWS reserves 5 IPs (first 4 and last 1 of range) in each Subnet
They are not available for use
For CIDR 10.0.0.0/24:
- 10.0.0.0: Network address
- 10.0.0.1: Reserved by AWS for the VPC router
- 10.0.0.2: Reserved by AWS for mapping to Amazon provided DNS
- 10.0.0.3: Reserved for future use
- 10.0.0.255: Network broadcast (assume not available for exam)
Exam Tip: If you need 29 IP addresses for EC2 you can't choose a /27 because it's only 32 IPs, need a /26 (64IP)
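
The exam tip worked out in code: usable IPs are the block size minus the 5 reserved addresses. A minimal sketch, assuming subnets are limited to the same /28 to /16 range given for VPC CIDRs; the function names are made up.

```python
AWS_RESERVED_PER_SUBNET = 5  # first 4 and last 1 address of each subnet


def usable_ips(prefix_length: int) -> int:
    """Addresses left for your resources after AWS takes its 5."""
    return max(2 ** (32 - prefix_length) - AWS_RESERVED_PER_SUBNET, 0)


def smallest_subnet_prefix(needed_ips: int) -> int:
    """Smallest subnet (largest prefix number) with at least `needed_ips` usable addresses."""
    for prefix in range(28, 15, -1):  # try /28 first, then grow up to /16
        if usable_ips(prefix) >= needed_ips:
            return prefix
    raise ValueError("does not fit in a single /16 subnet")


# 29 addresses: a /27 only leaves 32 - 5 = 27, so a /26 (64 - 5 = 59) is needed.
print(usable_ips(27), usable_ips(26), smallest_subnet_prefix(29))
```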

Internet Gateway

Helps VPC internet connection
Scales horizontally, HA, and redundant
Must be created separately from VPC
One VPC per IGW, one IGW per VPC
IGW is also a NAT for the instances that have a public IPv4
Will not have internet access without Route Tables

NAT Instances (outdated)

Allow instances in the private subnet to connect to the internet
Must be launched in a public subnet
Must disable EC2: Source / Destination Check
Must have an Elastic IP (because route tables require fixed)
Route table must be configured to route traffic from private subnets to NAT instance

Pre-configured Amazon Linux AMI are available
Not highly available or resilient setup by default
Would need to create an ASG in Multi AZ + resilient user-data script
Internet traffic bandwidth depends on EC2 instance performance
Must manage security groups & rules
- Inbound
  - Allow HTTP/S from private subnets
  - Allow SSH from home network (through IGW)
- Outbound
  - Allow HTTP/S traffic to internet
  - Allow ICMP traffic to internet

NAT Gateway (new)

Only IPv4
AWS managed NAT, higher bandwidth, better availability, no admin
Pay by the hour for usage and bandwidth
NAT is created in a specific AZ and uses an EIP (it is placed in a Public Subnet)
Cannot be used by an instance in that subnet (only from other subnets)
Requires an IGW (Private subnet -> NAT -> IGW)
5 Gbps of bandwidth with auto-scaling up to 45 Gbps
No security groups required

* Differences between the two

DNS Resolution in VPC

enableDnsSupport: (=Edit DNS Resolution Setting)
- Default True
- Decides if DNS resolution is supported for the VPC
- If True, queries the AWS DNS server at 169.254.169.253
enableDnsHostname: (=Edit DNS Hostname setting)
- False by default for newly created VPC, True by default for Default VPC
- Won't do anything unless enableDnsSupport=True
- If True, assigns a public hostname to an EC2 instance if it has a public IP
If you must use custom DNS domain names in a private zone in Route 53, you must have both as TRUE

NACL & Security Groups

NACL are like a firewall controlling traffic to and from subnet
Default NACL allows everything inbound and outbound
One NACL per Subnet, new Subnets are assigned the Default NACL
Define NACL rules:
- Rules have a number (1 - 32766) and LOWER numbers have precedence (the first rule that matches wins and later rules are ignored)
- Last rule is an asterisk (*), and denies all in case of no match
- AWS recommends adding rules by increment of 100
Newly created NACL will deny everything
NACL are a great way of blocking a specific IP at the subnet level
Can be associated to multiple subnets
Remember ephemeral ports
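
The rule-ordering behaviour can be sketched as a first-match loop over rules sorted by number, with the final asterisk rule as a default deny. This is a simplified model (single port per rule, no protocol or direction), and the addresses are documentation-range examples.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class NaclRule:
    number: int   # lower number = evaluated first
    cidr: str     # source range the rule applies to
    port: int     # single port, to keep the sketch small
    allow: bool


def evaluate(rules: list, source_ip: str, port: int) -> str:
    """Return the decision of the lowest-numbered matching rule, else the * rule (DENY)."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if port == rule.port and ip in ipaddress.ip_network(rule.cidr):
            return "ALLOW" if rule.allow else "DENY"
    return "DENY"  # the final * rule


# Block one specific IP (rule 100) ahead of a broad HTTPS allow (rule 200).
rules = [
    NaclRule(200, "0.0.0.0/0", 443, True),
    NaclRule(100, "203.0.113.7/32", 443, False),
]
```

Swapping the two rule numbers would let the blocked IP through, because the broad allow would match first.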

Inbound

SG is stateful: when an inbound request is allowed in, its response is allowed out even if the outbound rules say not to (SG evaluates all rules before deciding)
NACL is stateless: the response to an inbound request is allowed out only if the outbound rules allow it

Outbound

SG is stateful: when an outbound request is allowed out, its response is allowed back in even if the inbound rules say not to
NACL is stateless: the response to an outbound request is allowed back in only if the inbound rules allow it

VPC Endpoints

Endpoints allow you to connect to AWS services using a private network instead of the public internet
They scale horizontally and are redundant
They remove the need for IGW, NAT, etc, to access AWS services
Interface: provisions an ENI (private IP) as an entry point (select subnets) (must attach security group) - for most AWS services
Gateway: provisions a target and must be used in a route table which is associated with subnets - for S3 and DynamoDB
- Needs the region specified on the CLI, because the CLI defaults to us-east-1 when none is given
In case of issues:
- Check DNS setting resolution in your VPC
- Check Route Tables

VPC Peering

Connect two VPC privately using AWS' network
Make them behave as if they were in the same network
Must not have overlapping CIDR
VPC Peering connection is not transitive (must be established for each VPC that needs to communicate with another)
Can do between accounts and regions
You must update route tables in each VPC's subnets to ensure instances can communicate
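
Two of the rules above can be checked in a few lines: CIDR overlap (using the standard ipaddress module) and the fact that peering is not transitive. The VPC names and CIDRs are hypothetical.

```python
import ipaddress


def cidrs_overlap(cidr_a: str, cidr_b: str) -> bool:
    """True if the two CIDR blocks share any address (peering would not be possible)."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


def can_communicate(peerings: set, vpc_a: str, vpc_b: str) -> bool:
    """Only a direct peering counts; there is no routing through an intermediate VPC."""
    return frozenset((vpc_a, vpc_b)) in peerings


# A <-> B and B <-> C are peered; A <-> C is not.
peerings = {frozenset(("vpc-a", "vpc-b")), frozenset(("vpc-b", "vpc-c"))}
```

For vpc-a to reach vpc-c, a third peering connection between those two has to be created.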

Flow Logs

Capture information about IP traffic going to your interfaces:
- VPC Flow Logs
- Subnet Flow Logs
- Elastic Network Interface (ENI) Flow Logs
For ACCEPT and REJECT traffic
Helps to monitor & troubleshoot connectivity issues
Flow logs data can go into S3 (Athena) / CloudWatch Logs (Insights)
Captures network information from AWS managed interfaces too: ELB, RDS, ElastiCache, Redshift, WorkSpaces

Flow Log Syntax

[version, accountid, interfaceid, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, logstatus]
2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK
Query VPC flow logs using Athena on S3 or CloudWatch Logs Insights
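
A minimal sketch of parsing one flow log record in the format above into a dict, using the field names from the syntax line. Numeric fields are converted to int unless the record carries "-" in that position.

```python
FIELDS = [
    "version", "accountid", "interfaceid", "srcaddr", "dstaddr", "srcport",
    "dstport", "protocol", "packets", "bytes", "start", "end", "action", "logstatus",
]
INT_FIELDS = {"version", "srcport", "dstport", "protocol", "packets", "bytes", "start", "end"}


def parse_flow_log(line: str) -> dict:
    """Split a space-separated flow log record and name each field."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    for name in INT_FIELDS:
        if record[name] != "-":  # "-" marks a field with no data
            record[name] = int(record[name])
    return record


sample = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")
record = parse_flow_log(sample)
print(record["srcaddr"], "->", record["dstaddr"], "port", record["dstport"], record["action"])
```

In the sample record, protocol 6 is TCP and destination port 22 is SSH, so this is an accepted SSH connection.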

Bastion Hosts

Used to SSH into private instances
The bastion is in the public subnet, which is then connected to all private subnets
Bastion Host security must be tight
Exam tip: Make sure the bastion host SG only allows port 22 from your IP, not even from the SG of your other instances

Site to Site VPN, Virtual Private Gateway, Customer Gateway

Virtual Private Gateway
- VPN concentrator on the AWS side of the VPN connection
- VGW is created and attached to the VPC from which you want to create the site-to-site VPN
- Possibility to customize the ASN
Customer Gateway
- Software application or physical device on customer side of the VPN connection
- IP Address
  - Use the static, internet routable IP address of your customer gateway device
  - If the CGW is behind a NAT (with NAT-T), use the public address of the NAT

Direct Connect

Provides a dedicated private connection from a remote network to your VPC
Dedicated connection must be setup between your DC and AWS Direct Connect locations
You need to set up a Virtual Private Gateway on your VPC
Access public resources (S3) and private (EC2) on the same connection
Use cases:
- Increase bandwidth throughput - working with large data sets - lower cost
- More consistent network experience - application using real-time data feeds
- Hybrid Environments
Supports both IPv4 and IPv6

Direct Connect Gateway

Use when you want to set up a Direct Connect to one or more VPC in many different regions (no overlapping IPs)

Egress only IGW

Egress only IGW is for IPv6 only
Similar function as a NAT (GW), but a NAT is for IPv4
All IPv6 are public addresses
Therefore all instances are publicly accessible
Egress Only Internet Gateway gives our IPv6 instances access to the internet, but they are not reachable publicly
After creating an Egress Only IGW edit the Route Tables

VPC Summary


Last updated 3 years ago
