Developer Associate - Study Notes

Notes taken assuming the SA and SysOps material is already completed


Developer Associate

S3 Performance

Historically

  • When you had > 100 TPS, S3 performance could degrade

  • Behind the scenes each object goes to an S3 partition and for best perf you want high partition distribution

  • In the exam, and in life historically, it was recommended to have random characters in front of your key name to optimise perf (partition distribution)

    • <my_bucket>/5r4d_my_folder/my_file1.txt

    • <my_bucket>/a91e_my_folder/my_File2.txt

  • It was recommended never to use dates to prefix keys

Current State

  • As of July 17 2018 it scales up to 3500 TPS for PUT and 5500 TPS for GET for EACH PREFIX

  • Negates previous guidance to randomize object prefixes to achieve faster perf

Performance

  • For faster upload of large objects (>=100MB), use multipart upload (see the boto3 sketch at the end of this list):

    • parallelizes PUTs for greater throughput

    • maximize your network bandwidth and efficiency

    • decreases time to retry in case a part fails

    • must use multi-part upload if object size is greater than 5GB

  • Use CloudFront to cache S3 objects around the world (improves reads)

  • S3 Transfer Acceleration (use edge locations, improves writes) - just need to change the endpoint you write to, not the code

  • If using SSE-KMS encryption you may be limited by your AWS KMS usage limits (~100s to 1000s of downloads/uploads per second; request a limit increase if needed)
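
A minimal boto3 sketch of a multipart upload, assuming a hypothetical bucket and file name; the high-level transfer API switches to parallel multipart PUTs above the configured threshold:

    import boto3
    from boto3.s3.transfer import TransferConfig

    s3 = boto3.client("s3")
    # ~100MB parts are uploaded in parallel once the object crosses the threshold
    config = TransferConfig(multipart_threshold=100 * 1024 * 1024,
                            multipart_chunksize=100 * 1024 * 1024,
                            max_concurrency=8)
    s3.upload_file("big_file.bin", "my-bucket", "big_file.bin", Config=config)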

S3 Select & Glacier Select

  • When you retrieve data from S3 or Glacier you may only want a subset of it

  • If you retrieve all the data the network costs may be high

  • With S3 Select / Glacier Select you can use SQL SELECT queries to tell S3 or Glacier exactly which attributes / filters you want (columns / rows); a boto3 sketch follows this section

    • select * from s3object s where s."Country (Name)" like '%United States%'

  • Save up to 80% and increase perf by up to 400%

  • the "SELECT" happens within S3 or Glacier

  • Works with files in CSV, JSON, or Parquet

  • Files can be compressed with GZIP or BZIP2

  • No subqueries or Joins are supported
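
A hedged boto3 sketch of the query above, assuming a hypothetical bucket and a gzipped CSV with headers; the filtering runs inside S3 and only matching rows cross the network:

    import boto3

    s3 = boto3.client("s3")
    resp = s3.select_object_content(
        Bucket="my-bucket",      # hypothetical bucket / key
        Key="data.csv.gz",
        ExpressionType="SQL",
        Expression="select * from s3object s where s.\"Country (Name)\" like '%United States%'",
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "GZIP"},
        OutputSerialization={"CSV": {}},
    )
    for event in resp["Payload"]:        # results stream back as events
        if "Records" in event:
            print(event["Records"]["Payload"].decode())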

CLI

  • Dry Runs: Some AWS commands (not all) contain a --dry-run option to simulate API calls

  • Multiple Profiles

    • aws configure --profile xxx

    • aws s3 ls --profile xxx


Developing on AWS

AWS CLI STS Decode Errors

  • When you run API calls and they fail, you can decode the encoded authorization error message using the STS command line:

  • aws sts decode-authorization-message --encoded-message <value>

EC2 Instance Metadata

  • Allows EC2 Instances to learn about themselves without using an IAM Role for that purpose

  • http://169.254.169.254/latest/meta-data/

  • Can retrieve the IAM Role name from the metadata but CANNOT retrieve the IAM policy

AWS SDK

  • Perform actions on AWS directly from your application's code using an SDK (the AWS CLI is itself a wrapper around boto3/botocore)

    • Java, .NET, Node.js, PHP, Python (boto3/botocore), Go, Ruby, C++

  • Recommended to use the default credential provider chain

    • Which works seamlessly with:

      • AWS credentials at ~/.aws/credentials (only on our own machines / on premises)

      • Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY), not really recommended

  • Best practice is for credentials to be inherited from the mechanisms above, and to use IAM roles 100% of the time when working from within AWS services (see the sketch below)
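
A minimal sketch of relying on the default credential provider chain - no keys in code; boto3 finds credentials from the environment, ~/.aws/credentials, or the attached IAM role:

    import boto3

    # No credentials passed explicitly: the default provider chain resolves them
    s3 = boto3.client("s3", region_name="us-east-1")
    for bucket in s3.list_buckets()["Buckets"]:
        print(bucket["Name"])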

Exponential Backoff

  • Any API that fails because of too many calls needs to be retried with Exponential Backoff

  • This applies to rate-limited APIs

  • Retry mechanism included in SDK API calls

  • 2ms, 4ms, 8ms, 16ms, 32ms, 64ms, etc (the wait doubles each retry; see the sketch below)
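
A sketch of the idea with a hypothetical helper name (the AWS SDKs already retry with exponential backoff for you by default):

    import random
    import time

    def call_with_backoff(fn, max_retries=5):
        # Double the wait on each attempt, plus jitter to avoid synchronized retries
        for attempt in range(max_retries):
            try:
                return fn()
            except Exception:  # in practice, catch only the SDK's throttling error
                time.sleep((2 ** attempt) * 0.1 + random.uniform(0, 0.1))
        return fn()  # final attempt; let any error propagate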


AWS Elastic Beanstalk

  • Uses CloudFormation under the hood

  • Managed service

    • Instance configuration / OS is handled by beanstalk

    • Deployment strategy is configurable but performed by Elastic Beanstalk

  • Just the application code is the responsibility of the developer

  • Three architecture models:

    • Single Instance deployment: good for dev

    • LB + ASG: great for production or pre-production web applications

    • ASG only: great for non-web applications in production (workers, etc)

  • Elastic Beanstalk has three components

    • Application

    • Application Version: Each deployment gets assigned a new version

    • Environment name (dev, test, prod): free naming

  • You deploy application versions to environments and can promote application versions to the next environment

  • Rollback feature to previous application versions

  • Full control over lifecycle of environments

Deployment Options for Updates

  • All at once (deploy all in one go): fastest, but instances aren't available to serve traffic during the downtime

  • Rolling: Update a few instances at a time (bucket), and then move on to the next bucket once the first bucket is healthy

  • Rolling with additional batches: like rolling, but spins up new instances for the batch (so the app always runs at max capacity)

  • Immutable: spins up new instances in a new ASG, deploys the version to them, then moves them into the old ASG when everything is healthy and terminates the old instances. Highest cost; quick rollback (just terminate the new ASG); good for prod.

Blue/Green Deployment

  • Create a new "stage" env, deploy v2 there

  • New env (green) can be fully validated and roll back if issues

  • Route 53 can be set up using weighted policies to redirect traffic bit by bit to the new env

  • With Beanstalk, use "swap URLs" when done testing the new environment


Elastic Beanstalk Extensions

  • A zip file containing our code must be deployed to Elastic Beanstalk

  • All the parameters set in the UI can be configured with code using files

  • Requirements:

    • in the .ebextensions/ directory in the root of the source code

    • YAML / JSON format

    • .config extension (ex: logging.config)

    • Able to modify some default settings using: option_settings (see the sketch after this list)

    • Ability to add resources such as RDS, ElastiCache, DynamoDB, etc

  • Resources managed by .ebextensions get deleted if the environment goes away
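
A hypothetical .ebextensions file as a sketch - the option_settings namespaces below are real Beanstalk namespaces, the values are made up:

    # .ebextensions/options.config  (YAML, .config extension)
    option_settings:
      aws:elasticbeanstalk:application:environment:
        API_URL: https://api.example.com     # an environment variable for the app
      aws:autoscaling:asg:
        MinSize: 2
        MaxSize: 4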

Elastic Beanstalk CLI

  • We can install an additional CLI called the "EB CLI" which makes working with Beanstalk from the CLI easier

  • Basic Commands are:

    • eb create, eb status, eb health, eb events, eb logs, eb open, eb deploy, eb config, eb terminate

  • Helpful for your automated deployment pipelines

Elastic Beanstalk Deployment Mechanism

  • Describe dependencies

    • (requirements.txt for Python, package.json for Node.js)

  • Package code as zip

  • Zip file is uploaded to each EC2 machine

  • Each EC2 machine resolves dependencies (SLOW)

  • Optimization in case of long deployments: package dependencies with the source code to improve deployment performance and speed

Exam Tips

  • Beanstalk with HTTPS

    • Load the SSL certificate onto the LB (see the config sketch after this list)

    • Can be done from the console (EB console - LB config)

    • Can be done via code: .ebextensions/securelistener-alb.config

      • (for a Classic Load Balancer, the docs sample is securelistener-clb.config)

    • SSL Certificate can be provisioned using ACM or CLI

    • Must configure a security group rule to allow incoming port 443

  • Beanstalk redirect HTTP to HTTPS

    • Configure your instance to redirect HTTP to HTTPS

    • OR configure the ALB (only) with a rule

    • Make sure health checks are not redirected (so they keep returning 200 OK)
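
A sketch of the HTTPS listener config, based on the AWS docs sample for an ALB; the certificate ARN is a placeholder:

    # .ebextensions/securelistener-alb.config
    option_settings:
      aws:elbv2:listener:443:
        ListenerEnabled: 'true'
        Protocol: HTTPS
        SSLCertificateArns: arn:aws:acm:us-east-1:123456789012:certificate/example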

Beanstalk Lifecycle Policy

  • Elastic Beanstalk can store at most 1000 application versions

  • If you don't remove old versions you won't be able to deploy any more

  • To phase out old application versions use a lifecycle policy

    • Based on time (old versions are removed)

    • Based on space (when you have too many versions)

  • Versions that are currently in use won't be deleted

  • Option not to delete the source bundle in S3 to prevent data loss

Web Server vs Worker Environment

  • If your application performs long-running tasks, offload them to a dedicated worker environment

  • Decoupling your application into two tiers is common

  • Example: processing a video, generating a zip file, etc.

  • You can define periodic tasks in a file - cron.yaml (see the sketch below)
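
A hypothetical cron.yaml sketch - the worker tier POSTs to the given URL on the schedule; the name, url, and schedule are made up:

    # cron.yaml
    version: 1
    cron:
      - name: "nightly-cleanup"
        url: "/tasks/cleanup"
        schedule: "0 3 * * *"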


RDS with Elastic Beanstalk

  • RDS can be provisioned with Beanstalk, which is great for dev/test

    • Not so great for Prod, as the database lifecycle is tied to the Beanstalk environment lifecycle

    • The best for Prod is to separately create an RDS database and provide our EB application with the connection string

  • Steps to migrate from RDS coupled in EB to standalone RDS:

    • Take an RDS DB snapshot (backup)

    • Enable deletion protection in RDS

    • Create a new Beanstalk env without an RDS, point to existing old RDS

    • Perform a blue/green deployment and swap old and new environments

    • Terminate the old environment (RDS will not be deleted due to protection)

    • Delete the CloudFormation stack (will be in DELETE_FAILED due to not deleting RDS, that's ok)


CI/CD


CodeCommit

  • Version control is the ability to understand the various changes that happen to code over time (and possibly roll back)

  • All of this is enabled by using a version control system such as Git (can be local, but is usually centralized online)

  • Benefits are:

    • Collaborate with other devs

    • Make sure code is backed up somewhere

    • Make sure it's fully viewable and auditable

  • AWS CodeCommit:

    • Private Git repositories

    • No size limit on repositories (and scales seamlessly)

    • Fully managed, highly available

    • Code remains in AWS Cloud account -> increased security and compliance

    • Secure (encrypted, access control, etc)

    • Integrates with Jenkins, CodeBuild, other CI tools

CodeCommit Security

  • Interactions are done using Git (standard)

  • Authentication in Git

    • SSH Keys: AWS Users can configure SSH keys in their IAM Console

    • HTTPS: Done through the AWS CLI Authentication helper or Generating HTTPS credentials

    • MFA can be enabled

  • Authorization in Git

    • IAM Policies manage user / role rights to repositories

  • Encryption

    • Repositories are automatically encrypted at rest using KMS

    • Encrypted in transit (can only use HTTPS or SSH)

  • Cross account access

    • Do not share SSH keys

    • Do not share AWS credentials

    • Use IAM Role in your AWS Account and AWS STS in recipient account (with AssumeRole API)

CodeCommit Notifications

  • You can trigger notifications in CodeCommit using AWS SNS or AWS Lambda or AWS CloudWatch Event Rules

  • Use cases for SNS / AWS Lambda notifications (reacting to code modifications)

    • Deletion of branches

    • Trigger for pushes that happen in master branch

    • Notify external Build System

    • Trigger AWS Lambda function to perform codebase analysis (maybe credentials got committed in code, etc)

  • Use cases for CloudWatch Event Rules (more around pull requests)

    • Trigger for pull request updates (created, updated, deleted, commented)

    • Commit comment events

    • CloudWatch Event Rules goes into an SNS topic

  • Triggers vs Notifications


CodePipeline

  • Continuous Delivery

  • Visual Workflow

  • Source: GitHub / CodeCommit / S3

  • Build: CodeBuild / Jenkins / etc

  • Load Testing: 3rd party tools

  • Deploy: AWS CodeDeploy / BeanStalk / CloudFormation / ECS

  • Made of stages

    • Each stage can have sequential actions and/or parallel actions

    • Stage examples: Build / Test / Deploy / LoadTest / etc

    • Manual approval can be defined at any stage

CodePipeline Artifacts

  • Each pipeline stage can create 'Artifacts'

  • Artifacts are passed and stored in S3 and on to the next stage


CodePipeline Troubleshooting

  • CodePipeline state changes happen in AWS CloudWatch Events, which can in turn create SNS notifications

    • ex: you can create events for failed pipelines

    • ex: you can create events for cancelled stages

  • If CodePipeline fails a stage, your pipeline stops and you can get information in the console

  • AWS CloudTrail can be used to audit AWS API calls

  • If Pipeline can't perform an action, make sure the "IAM Service Role" attached has enough permissions (IAM Policy)

  • Pipeline stages can have multiple action groups

  • Can have sequential and parallel action groups

CodeBuild

  • Fully managed build service, an alternative to tools like Jenkins

  • Continuous scaling (no servers to manage, no build queue)

  • Pay for usage: the time it takes to complete the builds

  • Leverages Docker under the hood for reproducible builds

  • Possibility to extend capabilities by leveraging our own base Docker images

  • Secure: integration with KMS for encryption of build artifacts, IAM for build permissions, VPC for network security, and CloudTrail for API call logging

CodeBuild Overview

  • Source Code from GitHub / CodeCommit / CodePipeline / S3 etc

  • Build instructions can be defined in code (buildspec.yml file)

  • Output logs to Amazon S3 & AWS CloudWatch Logs (go look there to find build errors)

  • Metrics to monitor CodeBuild Statistics

  • Use CloudWatch Alarms to detect failed builds and trigger notifications

  • CloudWatch Events / AWS Lambda as glue

  • SNS notifications

  • Ability to reproduce CodeBuild locally to troubleshoot in case of errors

    • In case of trouble shooting beyond available logs

    • Install Docker on desktop

    • Leverage CodeBuild Agent

  • Builds can be defined within CodePipeline or CodeBuild itself

  • Java, Ruby, Python, Go, Node.js, Android, .NET Core, PHP; use Docker to extend to any environment you like (fully extensible)


Buildspec.yml

  • Must be at the root of the code (see the sketch after this list)

  • Define environment variables

    • Plaintext variables

    • Secure secrets: use SSM Parameter Store

  • Phases (specify commands to run):

    • Install: install dependencies you may need for your build

    • Pre build: final commands to execute before build

    • Build: actual build commands

    • Post build: finishing touches (zip output for example)

  • Artifacts: what to upload to S3 (encrypted with KMS)

  • Cache: Files to cache (usually dependencies) to S3 for future build speedup
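
A minimal buildspec.yml sketch for a hypothetical Python project - the parameter-store path, commands, and cache path are placeholders:

    # buildspec.yml (at the root of the code)
    version: 0.2
    env:
      parameter-store:
        DB_PASSWORD: /myapp/db/password    # secret pulled from SSM Parameter Store
    phases:
      install:
        commands:
          - pip install -r requirements.txt
      build:
        commands:
          - pytest
    artifacts:
      files:
        - '**/*'                           # uploaded to S3 (encrypted with KMS)
    cache:
      paths:
        - '/root/.cache/pip/**/*'          # cached to S3 to speed up future builds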

CodeDeploy

  • Each EC2 or On-Prem server must be running the CodeDeploy Agent

  • The agent is continually polling AWS CodeDeploy for work to do

  • CodeDeploy sends the appspec.yml file

  • Application is pulled from GitHub or S3

  • EC2 will run the deployment instructions

  • CodeDeploy Agent will report on the success or failure of deployment on the instance

  • EC2 instances are grouped by deployment group (dev/test/prod)

  • Lots of flexibility to define any type of deployment

  • CodeDeploy can be chained into CodePipeline and use artifacts from there

  • CodeDeploy can re-use existing setup tools, works with any application, auto-scaling integration

  • Note: Blue / Green only works with EC2, not on prem

  • Support for AWS Lambda deployments

  • CodeDeploy does not provision resources

  • Only In-Place and Blue/Green deployment types

Primary Components (don't need to memorize)

  • Application: unique name

  • Compute Platform: EC2/On-prem or Lambda

  • Deployment configuration: Deployment rules for success/failure

    • EC2/On-Prem: you can specify the minimum number of healthy instances for the deployment

    • AWS Lambda: specify how traffic is routed to your updated Lambda function versions

  • Deployment Group: Group of tagged instances (allows to deploy gradually)

  • Deployment Type: In-place deployment or blue/green deployment

  • IAM instance profile: need to give EC2 the permissions to pull from S3 / GitHub

  • Application Revision: application code + appspec.yml

  • Service role: role for CodeDeploy to perform what it needs

  • Target revision: Target deployment application version

CodeDeploy appspec.yml (must know for exam; lives in the root of the app source code)

  • Files section: how to source and copy files from S3 / GitHub to the filesystem

  • Hooks: sets of instructions to run to deploy the new version (hooks can have timeouts). They run in these steps - remember the order (a sketch follows this list):

    • ApplicationStop

    • DownloadBundle

    • BeforeInstall

    • AfterInstall

    • ApplicationStart

    • ValidateService (quite important)
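
A sketch of an appspec.yml for an EC2/On-Prem deployment; the destination path and script names are hypothetical:

    # appspec.yml (at the root of the app source code)
    version: 0.0
    os: linux
    files:
      - source: /
        destination: /var/www/myapp
    hooks:
      BeforeInstall:
        - location: scripts/stop_server.sh
          timeout: 300
      ApplicationStart:
        - location: scripts/start_server.sh
          timeout: 300
      ValidateService:
        - location: scripts/health_check.sh   # the important final check
          timeout: 300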

Deployment Config

  • Configs

    • One at a time: one instance at a time; if one instance fails, the deployment stops

    • Half at a time: 50%

    • All at once: quick but no healthy host = downtime. Good for Dev.

    • Custom: ex.: min healthy host = 75%

  • Failures:

    • Instances stay in "failed state"

    • New Deployments will first be deployed to "failed state" instances

    • To rollback: redeploy old deployment or enable automated rollback for failures

  • Deployment Targets:

    • Set of EC2 instances with designated tags

    • Directly to an ASG

    • Mix of ASG / Tags so you can build deployment segments

    • Customization in scripts with DEPLOYMENT_GROUP_NAME environment variable

    • CodeDeploy only deploys to EC2 instances

    • CodeDeploy doesn't require SecurityGroups

AWS CodeStar

  • CodeStar is an integrated solution that regroups: GitHub, CodeCommit, CodeBuild, CodeDeploy, CloudFormation, CodePipeline, CloudWatch

  • Helps quickly create "CI/CD-ready" projects for EC2, Lambda, Beanstalk

  • Ability to integrate with Cloud9

  • One dashboard to view all components

  • Free; you only pay for the underlying resources

  • Limited customization


CloudFormation

  • Update stack button (edit in the designer or upload a new template) lets you preview changes

Resources

  • Core of Template, mandatory

  • Represent components that will be created and configured

  • They are declared and can reference each other

  • Over 224 types, in the form of AWS::aws-product-name::data-type-name

  • Cannot create a dynamic amount of resources (CDK?)

Parameters

  • A way to provide inputs to your templates

  • Important if:

    • You want to reuse templates

    • Some inputs cannot be determined ahead of time

  • Extremely powerful: they give you control and can prevent errors thanks to types

  • No need to re-upload the template each time you want to change something; just change the parameter values

  • Fn::Ref is leveraged to reference parameters; shorthand !Ref (see the sketch at the end of this section)

  • PseudoParameters enabled by default:

    • AWS::AccountId

    • AWS::NotificationARNs

    • AWS::NoValue

    • AWS::Region

    • AWS::StackId

    • AWS::StackName
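
A minimal template sketch showing a typed parameter referenced with !Ref; the parameter name and AMI ID are placeholders:

    Parameters:
      InstanceTypeParam:
        Type: String
        Default: t2.micro
        AllowedValues: [t2.micro, t2.small]   # type + allowed values prevent bad input
    Resources:
      MyInstance:
        Type: AWS::EC2::Instance
        Properties:
          InstanceType: !Ref InstanceTypeParam
          ImageId: ami-12345678               # placeholder AMI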

Mappings

  • Mappings are fixed, hardcoded variables within your template

  • They're handy to differentiate between different environments (dev vs prod), regions, AMI types, etc

  • Example:

    • Mappings:

      • Mapping01:

        • Key01:

          • Name: Value01

        • Key02:

          • Name: Value02

      • RegionMap:

        • us-east-1:

          • "32" : "ami-641120d"

          • "64" : "ami-1241212"

        • us-east-2:

          • "32" : "ami-1231231"

  • Good when you know in advance all the values that could be entered, and they can be deduced from variables such as region, AZ, Account, etc

  • Safer control over the template

  • Only use Parameters when the values are user specific and have to be hand entered

  • Use Fn::FindInMap to return a named value from a specific key

    • !FindInMap [ MapName, TopLevelKey, SecondLevelKey ]

    • !FindInMap [RegionMap, !Ref "AWS::Region", "32"]

Outputs

  • Declares optional output values that we can import into other stacks (if you export them first)

  • Useful, for example, if you define a network CloudFormation and output the variables such as VPC ID and your Subnets IDs

  • Best way to perform cross-stack collaboration, as you let each expert handle their own part of the stack (see the sketch below)

  • Can't delete a stack if its outputs are being referenced by another stack
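
A sketch of the export/import handoff between two hypothetical stacks (all names are made up):

    # Stack A exports a value...
    Outputs:
      MyVpcId:
        Value: !Ref MyVPC
        Export:
          Name: MyVpcId

    # ...and Stack B imports it (Stack A can't be deleted while B references the export)
    Resources:
      MySubnet:
        Type: AWS::EC2::Subnet
        Properties:
          VpcId: !ImportValue MyVpcId
          CidrBlock: 10.0.0.0/24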

Conditions

  • Used to control the creation of resources or outputs based on a condition

  • Each condition can reference another condition, parameter value, or mapping

  • Ex:

    • Conditions:

      • CreateProdResources: !Equals [ !Ref EnvType, prod]

  • Conditions can be applied to resources, outputs, etc

    • Ex:

      • Resources:

        • MountPoint:

          • Type: "AWS::EC2::VolumeAttachment"

          • Condition: CreateProdResources

Intrinsic Functions

  • Fn::Ref = !Ref

    • Parameters -> returns the value of the parameter

    • Resources -> returns the physical ID of the underlying resource

  • Fn::GetAtt = !GetAtt

    • Attributes are attached to any resource you create (check the docs for each resource's available attributes)

  • Fn::FindInMap = !FindInMap

    • !FindInMap [ MapName, TopLevelKey, SecondLevelKey]

  • Fn::ImportValue = !ImportValue

    • Import values that have been exported from other templates

  • Fn::Join

    • Join values with a delimiter

    • !Join [ delimiter, [ comma-delimited list of values ] ]

    • "a:b:c" = !Join [ ":", [ a, b, c ] ]

  • Fn::Sub = !Sub

    • Substitute variables in a text; can combine with references or pseudo variables. The string must contain ${VariableName}

    • !Sub

      - String

      - { Var1Name: Var1Value, Var2Name: Var2Value }

  • Condition Functions (Fn::If, Fn::Not, Fn::Equals, Fn::Or, Fn::And)

Rollback on failures

  • Stack Creation fails: (CreateStack API) - Stack Creation Options

    • Default: everything rolls back (gets deleted)

      • OnFailure=ROLLBACK

    • Troubleshoot: Option to disable rollback to manually troubleshoot

      • OnFailure=DO_NOTHING

    • Delete: Get rid of stack entirely, don't keep anything

      • OnFailure=DELETE

  • Stack Update Fails: (UpdateStack API)

    • The stack automatically rolls back to the previous known working state

    • Ability to see in the logs what happened
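
For example, the OnFailure behaviour can be set from the CLI at creation time (stack and template names are placeholders):

    aws cloudformation create-stack --stack-name my-stack \
        --template-body file://template.yaml --on-failure DO_NOTHING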


AWS Monitoring, Troubleshooting, and Auditing

CloudWatch Metrics

  • CloudWatch provides metrics for every service in AWS

  • Metric is a variable to monitor (CPUUtilization, NetworkIn)

  • Metrics belong to namespaces

  • Dimension is an attribute of a metric (instance id, environment, etc)

  • Up to 10 dimensions per metric

  • Metrics have timestamps

  • Can create a CloudWatch dashboard of metrics

EC2 Detailed Monitoring

  • EC2 instances have metrics every 5 minutes by default

  • With detailed monitoring you get data every 1 minute, for a cost

  • The Free Tier allows us to have 10 detailed monitoring metrics

Custom Metrics

  • Possibility to define and send your own custom metrics to CloudWatch

  • Ability to use dimensions (attributes) to segment metrics

    • Instance.id

    • Environment.name

  • Metric resolution:

    • Standard: 1 minute

    • High resolution: up to 1 second (StorageResolution API parameter), for higher cost

  • Use the PutMetricData API call (see the sketch after this list)

  • Use exponential backoff in case of throttle errors

  • In an ASG, group metrics collection is not enabled by default
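
A minimal boto3 sketch of PutMetricData with one dimension - the namespace, metric name, and values are hypothetical:

    import boto3

    cw = boto3.client("cloudwatch")
    cw.put_metric_data(
        Namespace="MyApp",
        MetricData=[{
            "MetricName": "ActiveSessions",
            "Dimensions": [{"Name": "Environment", "Value": "prod"}],
            "Value": 42.0,
            "StorageResolution": 1,    # high resolution (1 second), higher cost
        }],
    )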

CloudWatch Alarms

  • Alarms are used to trigger notifications for any metric

  • Alarms can go to Auto Scaling, EC2 Actions, SNS Notifications

  • Various options (sampling, %, max, min, etc)

  • Alarm States:

    • OK

    • INSUFFICIENT_DATA

    • ALARM

  • Period:

    • Length of time in seconds to evaluate the metric

    • High resolution custom metrics: can only choose 10 sec or 30 sec

  • Alarm Targets (exam)

    • Stop, Terminate, Reboot, or Recover an EC2 instance

    • Trigger autoscaling action

    • Send notification to SNS (from which you can do almost anything)

  • Good to know

    • Alarms can be created based on CloudWatch Logs Metrics Filters

    • CloudWatch doesn't test or validate the actions that are assigned

    • To test alarms and notifications, set the alarm state to Alarm using CLI

      • aws cloudwatch set-alarm-state --alarm-name "myalarm" --state-value ALARM --state-reason "testing purposes"

CloudWatch Events

  • Source + Rule -> Target

  • Schedule: Like a cron job (same format)

  • Event Pattern: Event rules to react to a service doing something (Ex: CodePipeline state changes)

  • Triggers to Lambda functions, SQS/SNS/Kinesis Messages

  • CloudWatch Event creates a small JSON document to give info on the change

CloudWatch Logs

  • Applications can send logs to CloudWatch via the SDK

  • CloudWatch can collect logs from:

    • Elastic Beanstalk: Collects from application

    • ECS: Collects from containers

    • Lambda: Collects from functions

    • VPC Flow Logs

    • API Gateway

    • CloudTrail based on filter

    • CloudWatch Logs Agents: For example on EC2 machines

    • Route53: Logs DNS queries

  • CloudWatch logs can go to:

    • Batch exporter to S3 for archival

    • Stream to ElasticSearch cluster for further analytics

  • Never expire by default

  • CloudWatch Logs can use filter expressions

  • Log storage architecture:

    • Log groups: arbitrary name, usually representing an application

    • Log streams: instances within the application / log files / containers

  • Can define log expiration policies

  • Using the CLI we can tail CloudWatch Logs

  • To send logs to CloudWatch, make sure IAM permissions are correct!

  • Security: encryption of logs using KMS at the Group level

X-Ray (exam)

  • Debugging in Production, the good old way:

    • Test Locally

    • Add log statements everywhere

    • Re-deploy in production

  • Log formats differ across applications, so analyzing them with CloudWatch is hard

  • Debugging: monolith "easy", distributed services "hard"

  • No common view of your entire architecture

  • X-Ray gives a visual analysis of our applications

  • Troubleshooting performance (bottlenecks)

  • Understand dependencies in a microservice architecture

  • Pinpoint service issues

  • Review request behaviour

  • Find errors and exceptions

  • Are we meeting time SLA?

  • Where am I throttled?

  • Identify users that are impacted

  • Compatibility

    • Lambda

    • Beanstalk

    • ECS

    • ELB

    • API GW

    • EC2 instances or any application server (even on-prem)

  • X-Ray Leverages Tracing

    • Tracing is an end-to-end way to follow a "request"

    • Each component dealing with the request adds its own "trace"

    • Tracing is made of segments (+ sub-segments)

    • Annotations can be added to traces to provide extra information

    • Ability to trace

      • Every request

      • Sample of requests (as a % for example or a rate per minute)

    • Security

      • IAM for authorization

      • KMS for encryption at rest

How to enable?

  1. Your code (Java, Python, Go, Node.js, .NET) must import the X-Ray SDK

  • Very little code modification needed

  • The application SDK will then capture:

    • Calls to AWS services

    • HTTP/HTTPS requests

    • Database Calls (MySQL, PostgreSQL, DynamoDB)

    • Queue calls (SQS)

  2. Install the X-Ray daemon or enable X-Ray AWS Integration

  • X-Ray Daemon works as a low level UDP packet interceptor (Win, Lin, Mac)

  • AWS Lambda / other services already run the daemon for you (done via .ebextensions/xray-daemon.config for Beanstalk)

  • Each application must have the IAM rights to write data to X-Ray
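
A minimal Python sketch using the aws-xray-sdk package; the service and segment names are placeholders (on Lambda/Beanstalk the segment is usually managed for you):

    import boto3
    from aws_xray_sdk.core import patch_all, xray_recorder

    patch_all()  # instruments supported libraries (boto3, requests, DB clients, ...)
    xray_recorder.configure(service="my-service")

    # On EC2/on-prem you open segments yourself; the daemon forwards them to X-Ray
    with xray_recorder.in_segment("my-segment"):
        boto3.client("s3").list_buckets()  # recorded as a subsegment of the trace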


  • X-Ray service collects data from all the different services

  • Visual Service map is computed from all the segments and traces

  • X-Ray is graphical, so even non-technical people can help troubleshoot

Troubleshooting

  • If X-Ray is not working on EC2

    • Ensure the EC2 IAM Role has the proper permissions

    • Ensure the EC2 instance is running the X-Ray Daemon

  • To enable on AWS Lambda

    • Ensure it has an IAM execution role with the proper policy (AWSXRayWriteOnlyAccess)

    • Ensure that X-Ray is imported in the code

X-Ray Additional Exam Tips

  • The X-Ray daemon/agent (which must be configured) has a config to send traces cross account:

    • Make sure the IAM permissions are correct - the agent will assume the role

    • This allows you to have a central account for all your application tracing

  • Segments: each application / service sends its own segment

  • Trace: segments collected together to form an end-to-end trace

  • Sampling: decrease the amount of requests sent to X-Ray, to reduce cost or avoid flooding

  • Annotations: Key Value pairs used to index traces and use with filters to be able to search through them (filter based on key/value pair)

  • Metadata: Key Value pairs not indexed, not used for searching

  • Code must be instrumented to use the AWS X-Ray SDK (interceptors, handlers, http clients)

  • IAM role must be correct to send traces to X-Ray

  • X-Ray on EC2 / On-Prem:

    • Linux system must run the X-Ray daemon

    • IAM instance role if EC2, other AWS credentials for on-prem instance

  • X-Ray on Lambda:

    • Make sure X-Ray integration is ticked on Lambda (Lambda runs the daemon)

    • IAM role is Lambda role

  • X-Ray on Beanstalk:

    • Set configuration on EB console

    • Or use a Beanstalk extension (.ebextensions/xray-daemon.config)

  • X-Ray on ECS / EKS / Fargate (Docker):

    • Create a Docker image that runs the Daemon / or use the official X-Ray Docker Image

    • Ensure port mapping network settings are correct and IAM task roles are defined

CloudTrail

  • Provides governance, compliance, and audit for your AWS account

  • Get a history of events / API calls made by Console, SDK, CLI, Services

  • Enabled by default

  • Can put logs from CloudTrail into CloudWatch logs

CloudTrail vs CloudWatch vs X-Ray

  • CloudTrail

    • Audit API calls made by users / services / AWS Console

    • Useful to detect unauthorized calls or root cause of changes

  • CloudWatch

    • CloudWatch Metrics over time for monitoring

    • CloudWatch Logs for storing application logs

    • CloudWatch Alarm to send notifications in case of unexpected metrics

  • X-Ray

    • Automated Trace Analysis and Central Service Map Visualization

    • Latency, Errors, and Fault analysis

    • Request tracking across distributed systems

AWS Integration and Messaging

  • Synchronous communications

  • Asynchronous communications

AWS SQS

  • Producers (send) -> Queue -> Consumers (poll)

  • Scales from 1 message per second to 10,000s per second (nearly unlimited)

  • Default retention of messages: 4 days, maximum of 14 days

  • No limit to how many messages in queue

  • Low latency (<10ms on publish and receive)

  • Horizontal scaling in terms of numbers of consumers

  • Can have duplicate messages occasionally (at-least-once delivery)

  • Can have out of order messages (best effort)

  • Limitation of 256KB per message

SQS Delay Queue

  • Delay a message (consumers don't see it immediately) up to 15 minutes

  • Default is 0 seconds

  • Can set a default at queue level

  • Can override the default using the DelaySeconds parameter

SQS Producing Messages

  • Define Body (up to 256KB string)

  • Add message attributes (metadata, optional)

  • Provide Delay Delivery (optional)

  • Get Back

    • Message identifier

    • MD5 hash of body

SQS Consuming Messages

  • Consumers Poll SQS for messages (receive up to 10 messages at a time)

  • Process the message within the visibility timeout

  • Delete the message using the message ID and receipt handle (see the sketch below)
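
A boto3 sketch of the produce/consume/delete cycle; the queue URL is a placeholder:

    import boto3

    sqs = boto3.client("sqs")
    queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"

    # Produce: body + optional attributes + optional per-message delay
    sqs.send_message(
        QueueUrl=queue_url,
        MessageBody="hello world",
        DelaySeconds=0,
        MessageAttributes={"env": {"DataType": "String", "StringValue": "prod"}},
    )

    # Consume: long-poll up to 10 messages, process, then delete each one
    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10,
                               WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        # ... process msg["Body"] within the visibility timeout ...
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])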

SQS Visibility Timeout

  • When a consumer polls a message from a queue, the message is "invisible" to other consumers for the Visibility Timeout period

    • Set between 0 seconds and 12 hours (default 30 seconds)

    • If too high (15 minutes) and consumer fails to process the message you must wait a long time before processing the message again

    • If too low (30 seconds) and the consumer needs more time to process the message (2 minutes), another consumer will receive the message and it will be processed more than once

  • ChangeMessageVisibility API to change the visibility while processing the message (called by the consumer)

  • DeleteMessage API to tell SQS the message was successfully processed

SQS Dead Letter Queue

  • If a consumer fails to process a message within the Visibility Timeout the message goes back to the queue

  • We can set a threshold for how many times a message can go back: the redrive policy

  • After the threshold is exceeded, the message goes into a Dead Letter Queue (DLQ)

  • We have to create a DLQ first, and then designate it as the dead letter queue

  • Make sure to process messages in the DLQ before they expire

SQS Long Polling

  • When a consumer requests a message from the queue it can optionally wait for messages to arrive if there are none in the queue

  • This is called Long Polling

  • Long polling decreases the number of API calls made to SQS while increasing the efficiency and decreasing the latency of your application

  • The wait time can be between 1 and 20 seconds (20 sec preferable)

  • Long polling is preferable to Short Polling

  • Long Polling can be enabled at the queue level or at the API level using WaitTimeSeconds

  • In the console this setting is called "Receive Message Wait Time"

SQS FIFO Queue

  • First in First out

  • Name of queue must end in .fifo

  • Lower throughput, 300 msg/s or 3000 msg/s in batch

  • Messages are processed in order by the consumer

  • Messages are sent exactly once

  • No per-message delay (only per-queue delay, otherwise FIFO ordering would break)

FIFO Features

  • Deduplication (to not send the same msg twice); a send_message sketch follows this section:

    • Provide a MessageDeduplicationId with your message

    • De-duplication interval is 5 minutes

    • Content-based deduplication: the MessageDeduplicationId is generated as the SHA-256 hash of the message body (not the attributes)

  • Sequencing:

    • To ensure strict ordering between messages you must specify a MessageGroupId

    • Messages with different GroupId may be received out of order

    • Eg. to order messages for a user you could use the "user_id" as a group id

    • Messages with the same Group ID are delivered to one consumer at a time
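
A boto3 sketch of sending to a FIFO queue; the queue URL, group ID, and deduplication ID are placeholders:

    import boto3

    sqs = boto3.client("sqs")
    fifo_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue.fifo"

    # Same MessageGroupId => strict ordering within that group;
    # the deduplication ID suppresses duplicates within the 5-minute interval
    sqs.send_message(
        QueueUrl=fifo_url,
        MessageBody="order-created",
        MessageGroupId="user-1234",
        MessageDeduplicationId="order-5678",
    )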
