Building Highly Available AWS Infrastructure: ALB, Auto Scaling, and EC2 - Part 1

Master the fundamentals of Application Load Balancers, Auto Scaling Groups, and EC2 instances. Learn how to configure health checks, achieve high availability, and avoid those dreaded 500 errors in your production environment.


You've deployed your web application to AWS. Everything works great... until it doesn't. A sudden traffic spike hits, your single EC2 instance can't handle the load, and your users are greeted with timeout errors and 500s. Sound familiar?

Welcome to the world of high availability architecture, where we need more than just a single server hoping for the best. In this three-part series, we'll explore how to build truly resilient infrastructure on AWS, starting with the foundational trio: Application Load Balancers (ALBs), Auto Scaling Groups, and EC2 instances.

This is Part 1, where we'll cover the fundamentals and get you from zero to a production-ready, highly available setup.

🎯 The Big Picture: What Are We Building?

Before diving into the details, let's understand what each component does and why you need all three:

  • Application Load Balancer (ALB): Your traffic cop, distributing incoming requests across multiple servers
  • Auto Scaling Group (ASG): Your capacity manager, automatically adding or removing servers based on demand
  • EC2 Instances: Your actual compute resources running your application

Together, these three components create a self-healing, self-scaling infrastructure that can handle traffic spikes, server failures, and everything in between.

🚦 How Does an Application Load Balancer Work?

An Application Load Balancer sits between your users and your application servers. When a request comes in, the ALB:

  1. Receives the request on port 80 (HTTP) or 443 (HTTPS)
  2. Checks which targets are healthy in its target group
  3. Routes the request to one of the healthy targets using its routing algorithm (typically round-robin)
  4. Returns the response back to the user
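The routing step above can be sketched as a toy model (this is an illustration of round-robin over healthy targets, not the ALB's actual implementation):

```python
class RoundRobinBalancer:
    """Toy model of an ALB routing requests across healthy targets."""

    def __init__(self, targets):
        self.targets = list(targets)   # all registered targets
        self.healthy = set(targets)    # targets currently passing health checks
        self._next = 0

    def mark_unhealthy(self, target):
        self.healthy.discard(target)

    def mark_healthy(self, target):
        self.healthy.add(target)

    def route(self):
        """Return the next healthy target, or None if every target is down."""
        for _ in range(len(self.targets)):
            target = self.targets[self._next % len(self.targets)]
            self._next += 1
            if target in self.healthy:
                return target
        return None
```

A real ALB also supports least-outstanding-requests routing and weighted target groups; the point here is simply that unhealthy targets are skipped, not deregistered.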

Key ALB Concepts

Listeners: These define what ports and protocols your ALB accepts traffic on. Common setup:

  • Listener on port 80 β†’ Redirects to HTTPS
  • Listener on port 443 β†’ Forwards to your target group

SSL Termination: One of the ALB's most powerful features is SSL/TLS termination. This means the ALB handles the HTTPS encryption/decryption, not your application servers.

Here's what happens:

  1. Client makes HTTPS request (encrypted) to ALB on port 443
  2. ALB decrypts the request using the SSL certificate
  3. ALB forwards the request to your targets over plain HTTP (unencrypted) on the target group's port (e.g., 8080)
  4. Your application responds with plain HTTP
  5. ALB encrypts the response and sends it back to the client as HTTPS

Benefits of SSL termination at the ALB:

  • Simplified certificate management: Install and renew certificates in one place (the ALB), not on every instance
  • Reduced instance CPU load: SSL encryption/decryption is computationally expensive; offload it to the ALB
  • Easier instance management: Your application code doesn't need to handle SSL certificates
  • Automatic certificate renewal: Use AWS Certificate Manager (ACM) for free SSL certificates with auto-renewal

End-to-End Encryption: If you need encryption all the way to your instances (for compliance or security requirements), you can configure:

  • HTTPS listener on the ALB (port 443)
  • HTTPS target group (your instances need SSL certificates too)
  • ALB re-encrypts traffic before forwarding to instances

This is less common but necessary for highly regulated industries where data must be encrypted in transit at all times.

ALB HTTP Headers: The ALB automatically adds useful headers to requests forwarded to your targets:

  • X-Forwarded-For: The original client IP address (since requests come from the ALB's IP)
  • X-Forwarded-Proto: The original protocol used by the client (usually https)
  • X-Forwarded-Port: The original port used by the client (typically 443)

These headers let your application know details about the original request, which is especially useful for logging, security checks, and generating correct redirect URLs.
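For example, a small helper for pulling the client address out of X-Forwarded-For (the function name and behavior here are illustrative, not part of any AWS SDK):

```python
def client_ip_from_xff(xff_header):
    """Return the original client IP from an X-Forwarded-For header value.

    The header is a comma-separated chain built up by each proxy hop;
    the leftmost entry is the client as seen by the first proxy.
    """
    if not xff_header:
        return None
    return xff_header.split(",")[0].strip()
```

One caveat: clients can send their own X-Forwarded-For header, so the leftmost entry is spoofable. If you use the IP for security decisions, trust only the entry your own ALB appended (the rightmost one when the ALB is the sole proxy in front of your app).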

Target Groups: A logical grouping of targets (EC2 instances, containers, Lambda functions) that receive traffic from the ALB.

Health Checks: The ALB regularly pings your targets to ensure they're responding correctly. If a target fails health checks, the ALB stops sending it traffic.

# Example ALB Configuration Concept
ALB:
  Listeners:
    - Port: 80
      Protocol: HTTP
      DefaultAction: Redirect to HTTPS
    - Port: 443
      Protocol: HTTPS
      SSLCertificate: arn:aws:acm:us-east-1:123456789:certificate/abc-123
      DefaultAction: Forward to TargetGroup
  
  TargetGroup:
    Protocol: HTTP  # Note: HTTP, not HTTPS! (SSL terminated at ALB)
    Port: 8080      # Must match the port your application listens on
    HealthCheck:
      Path: /health
      Interval: 30 seconds
      Timeout: 5 seconds
      HealthyThreshold: 2
      UnhealthyThreshold: 3

πŸ”„ How Does an Auto Scaling Group Work?

An Auto Scaling Group (ASG) is like having an operations team that never sleeps. It continuously monitors your application's health and performance, automatically:

  1. Launching new instances when demand increases
  2. Terminating instances when demand decreases
  3. Replacing unhealthy instances with healthy ones
  4. Maintaining your desired capacity even during infrastructure failures
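The four behaviors above boil down to a reconciliation loop. Here's a heavily simplified sketch (the real ASG control plane is far more involved; `launch` and `terminate` are callbacks standing in for EC2 API calls):

```python
def reconcile(desired, instances, launch, terminate):
    """One pass of the control loop an ASG effectively runs continuously.

    `instances` maps instance id -> 'healthy' | 'unhealthy'.
    """
    # 1. Terminate anything unhealthy; replacements come from step 2.
    for iid, state in list(instances.items()):
        if state == "unhealthy":
            terminate(iid)
            del instances[iid]
    # 2. Launch until we reach desired capacity.
    while len(instances) < desired:
        instances[launch()] = "healthy"
    # 3. Scale in if we are over desired capacity.
    while len(instances) > desired:
        iid = next(iter(instances))
        terminate(iid)
        del instances[iid]
    return instances
```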

ASG Core Components

Launch Template: Defines what kind of EC2 instances to create (AMI, instance type, security groups, user data script, etc.)

Scaling Policies: Rules that determine when to scale:

  • Target Tracking: "Keep CPU at 70%"
  • Step Scaling: "Add 2 instances when CPU > 80%, add 4 when CPU > 90%"
  • Scheduled Scaling: "Scale up at 9 AM, scale down at 6 PM"

Capacity Settings:

  • Minimum: Never go below this (e.g., 2 for high availability)
  • Desired: Current target capacity
  • Maximum: Never exceed this (cost protection)

# Conceptual Auto Scaling Configuration
AutoScalingGroup:
  MinSize: 2
  DesiredCapacity: 2
  MaxSize: 10
  
  LaunchTemplate:
    ImageId: ami-12345678
    InstanceType: t3.medium
    SecurityGroups: [web-server-sg]
    
  ScalingPolicies:
    - Type: TargetTracking
      Metric: CPUUtilization
      TargetValue: 70
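The math behind target tracking is roughly this: if average CPU scales inversely with instance count (doubling instances halves average utilization), the capacity needed to bring the metric back to target is `current * metric / target`. A sketch under that assumption:

```python
import math

def target_tracking(current_capacity, metric_value, target_value,
                    min_size=2, max_size=10):
    """Estimate the capacity a target-tracking policy converges toward.

    Assumes the metric (e.g., average CPU) scales inversely with capacity.
    """
    if metric_value <= 0:
        return current_capacity
    ideal = current_capacity * metric_value / target_value
    # Round up (scale out eagerly), then clamp to the ASG's bounds.
    return max(min_size, min(max_size, math.ceil(ideal)))
```

The real policy also applies cooldowns and scales in more conservatively than it scales out, but the proportional idea is the same.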

πŸ’» How Does an EC2 Instance Fit In?

EC2 instances are your workhorsesβ€”the actual virtual machines running your application code. In our high availability setup, each EC2 instance:

  1. Runs your application (web server, API, etc.)
  2. Responds to health checks from both the ALB and the ASG
  3. Reports metrics to CloudWatch (CPU, memory, network, custom metrics)
  4. Can be automatically replaced if it becomes unhealthy

The Instance Lifecycle

Launch β†’ Running β†’ Healthy β†’ Serving Traffic β†’ (Eventually) Terminated
          ↓
        Unhealthy β†’ Removed from ALB β†’ Terminated by ASG β†’ New instance launched

πŸ”— Connecting It All Together

Here's where the magic happens. Let's walk through how to connect an ALB to an Auto Scaling Group with EC2 instances.

Step 1: Create Your Target Group

The target group is where your ASG will register instances:

# AWS CLI example
aws elbv2 create-target-group \
  --name my-web-app-targets \
  --protocol HTTP \
  --port 8080 \
  --vpc-id vpc-12345678 \
  --health-check-enabled \
  --health-check-path /health \
  --health-check-interval-seconds 30 \
  --health-check-timeout-seconds 5 \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 3

Step 2: Create Your ALB

aws elbv2 create-load-balancer \
  --name my-web-app-alb \
  --subnets subnet-12345 subnet-67890 \
  --security-groups sg-12345678 \
  --scheme internet-facing \
  --type application

Step 3: Create a Listener

aws elbv2 create-listener \
  --load-balancer-arn arn:aws:elasticloadbalancing:... \
  --protocol HTTPS \
  --port 443 \
  --certificates CertificateArn=arn:aws:acm:... \
  --default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:...

Step 4: Create Your Auto Scaling Group

The critical pieceβ€”attach the target group to your ASG:

aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name my-web-app-asg \
  --launch-template LaunchTemplateName=my-web-app-template \
  --min-size 2 \
  --max-size 10 \
  --desired-capacity 2 \
  --target-group-arns arn:aws:elasticloadbalancing:.../targetgroup/my-web-app-targets/... \
  --health-check-type ELB \
  --health-check-grace-period 300 \
  --vpc-zone-identifier "subnet-12345,subnet-67890"

Key point: --target-group-arns connects your ASG to the ALB target group. Any instance launched by the ASG will automatically register with this target group.

⚠️ Instance Protection from Scale-In

When using ECS with EC2 (covered in Part 2), you'll need instance protection:

# CRITICAL: Required for ECS Capacity Providers with managed termination
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-asg \
  --new-instances-protected-from-scale-in

Why? ECS Capacity Providers need to control instance termination to ensure:

  • Tasks are gracefully stopped before instance termination
  • No tasks are abruptly killed during scale-in
  • Proper draining of connections happens first

Without this, the ASG might terminate instances with running containers, causing service disruption!

πŸ₯ The Health Check Trinity

This is where things get interesting (and where many people get confused). You actually have THREE different health check systems working together:

1. ALB Target Health Checks

Purpose: Determine if a target should receive traffic from the ALB

Configuration:

  • Path: /health (your application's health endpoint)
  • Interval: 30 seconds
  • Timeout: 5 seconds
  • Healthy threshold: 2 consecutive successes
  • Unhealthy threshold: 3 consecutive failures

Behavior: If an instance fails ALB health checks, it's marked "unhealthy" in the target group and stops receiving traffic, but it's NOT terminated.
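The consecutive-threshold logic can be modeled as a tiny state machine (a simplified sketch, not the actual target group implementation):

```python
class TargetHealthTracker:
    """Tracks consecutive check results the way a target group does."""

    def __init__(self, healthy_threshold=2, unhealthy_threshold=3):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.state = "healthy"
        self._streak = 0  # consecutive results contradicting current state

    def record(self, passed):
        """Record one health check result and return the resulting state."""
        if self.state == "healthy":
            self._streak = self._streak + 1 if not passed else 0
            if self._streak >= self.unhealthy_threshold:
                self.state, self._streak = "unhealthy", 0
        else:
            self._streak = self._streak + 1 if passed else 0
            if self._streak >= self.healthy_threshold:
                self.state, self._streak = "healthy", 0
        return self.state
```

Note that a single failed check does nothing, which is exactly what you want: brief hiccups shouldn't pull a target out of rotation.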

2. Auto Scaling Group Health Checks

Purpose: Determine if an instance should be terminated and replaced

Types:

  • EC2 Health Check: Is the instance running? (basic AWS infrastructure check)
  • ELB Health Check: Is the instance passing ALB health checks?

Configuration:

--health-check-type ELB  # Use ALB health status
--health-check-grace-period 300  # Wait 5 minutes after launch before checking

Behavior: If an instance fails ASG health checks, it's TERMINATED and a new one is launched.

3. EC2 Status Checks

Purpose: Monitor the underlying infrastructure and instance health

Types:

  • System Status Check: AWS infrastructure (host, network)
  • Instance Status Check: Your instance's operating system and configuration

Behavior: If system checks fail, AWS may automatically recover your instance. If instance checks fail, you may need to intervene or let ASG replace it.

🎭 The Complete Health Check Flow

Let's trace what happens when an instance becomes unhealthy:

1. Application starts failing β†’ Returns 500 errors
                              ↓
2. ALB health check fails β†’ Instance marked unhealthy in target group
                          β†’ ALB stops sending new requests to this instance
                              ↓
3. ASG detects unhealthy target (if using ELB health check type)
                              ↓
4. ASG waits for grace period to expire
                              ↓
5. ASG terminates the unhealthy instance
                              ↓
6. ASG launches a new instance to maintain desired capacity
                              ↓
7. New instance starts, passes health checks, begins receiving traffic

🌐 High Availability Setup: Best Practices

Want to build a truly resilient system? Follow these principles:

1. Multi-AZ Deployment

Why: AWS Availability Zones are isolated data centers. If one goes down, your app stays up.

--vpc-zone-identifier "subnet-in-us-east-1a,subnet-in-us-east-1b,subnet-in-us-east-1c"

  • Minimum: 2 AZs
  • Recommended: 3 AZs

2. Minimum Instance Count

Never use a minimum of 1. With minimum capacity of 2:

  • You can handle an entire AZ failure
  • You can perform rolling deployments without downtime
  • You have redundancy during instance replacement

3. Load Balancer in Multiple Subnets

Your ALB should also span multiple AZs:

--subnets subnet-in-us-east-1a subnet-in-us-east-1b subnet-in-us-east-1c

4. Proper Health Check Configuration

Health check endpoint must:

  • Check critical dependencies (database, cache, external services)
  • Return quickly (< 1 second)
  • Return 200 OK when healthy, 5xx when unhealthy

# Example health check endpoint in your app
@app.route('/health')
def health_check():
    try:
        # Check database connection
        db.execute('SELECT 1')
        
        # Check cache
        cache.ping()
        
        # Check critical external service
        if not can_reach_payment_api():
            return {'status': 'unhealthy'}, 503
            
        return {'status': 'healthy'}, 200
    except Exception as e:
        return {'status': 'unhealthy', 'error': str(e)}, 503

⚠️ IMPORTANT - Health Check Best Practice:

Be cautious with database checks in health endpoints! ALB checks every 10-30 seconds. If every health check queries the database, you could overwhelm it:

# Better approach: Cached health checks
import time

last_db_check = {'time': 0, 'status': 'unknown'}
DB_CHECK_INTERVAL = 60  # Only check DB every 60 seconds

@app.route('/health')
def health_check():
    current_time = time.time()
    
    # Return cached status if we checked recently
    if current_time - last_db_check['time'] < DB_CHECK_INTERVAL:
        if last_db_check['status'] == 'healthy':
            return {'status': 'healthy', 'cached': True}, 200
        return {'status': 'unhealthy', 'cached': True}, 503
    
    # Actually check database
    try:
        db.execute('SELECT 1')
        last_db_check['time'] = current_time
        last_db_check['status'] = 'healthy'
        return {'status': 'healthy'}, 200
    except Exception as e:
        last_db_check['time'] = current_time
        last_db_check['status'] = 'unhealthy'
        return {'status': 'unhealthy', 'error': str(e)}, 503

Why this matters: With 10 instances and 10-second health check intervals, you'd hit the database 60 times per minute just for health checks! Caching reduces this to once per minute while still detecting failures quickly.

5. Connection Draining / Deregistration Delay

When an instance is deregistered, don't immediately kill active connections:

aws elbv2 modify-target-group-attributes \
  --target-group-arn <target-group-arn> \
  --attributes Key=deregistration_delay.timeout_seconds,Value=30

This gives in-flight requests 30 seconds to complete before the instance is fully removed.

πŸ“Š Monitoring: Know Before Your Users Do

You can't fix what you can't see. Set up comprehensive monitoring:

Key Metrics to Watch

ALB Metrics (CloudWatch):

  • TargetResponseTime: How fast are your instances responding?
  • HTTPCode_Target_4XX_Count: Client errors
  • HTTPCode_Target_5XX_Count: Server errors (🚨 alert on this!)
  • HealthyHostCount: How many targets are healthy?
  • UnHealthyHostCount: How many targets are failing?

ASG Metrics:

  • GroupDesiredCapacity: What the ASG is trying to maintain
  • GroupInServiceInstances: How many instances are actually running
  • GroupTotalInstances: Total instances (including pending/terminating)

EC2 Metrics:

  • CPUUtilization: Are you scaling appropriately?
  • NetworkIn/Out: Traffic patterns
  • StatusCheckFailed: Infrastructure problems

Setting Up Alerts

# CloudWatch Alarm: Alert on 5XX errors
aws cloudwatch put-metric-alarm \
  --alarm-name high-5xx-errors \
  --alarm-description "Alert when 5XX errors exceed threshold" \
  --metric-name HTTPCode_Target_5XX_Count \
  --namespace AWS/ApplicationELB \
  --statistic Sum \
  --period 60 \
  --evaluation-periods 2 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=LoadBalancer,Value=app/my-web-app-alb/...

πŸ›‘οΈ Avoiding Unexpected Outages

Common Failure Scenarios and Solutions

Scenario 1: All instances fail health checks simultaneously

  • Cause: Bad deployment, database failure, misconfigured health check
  • Solution:
    • Set up staged deployments
    • Use ASG instance refresh with min_healthy_percentage
    • Make health checks graceful during startup (grace period)

Scenario 2: Scaling can't keep up with traffic spike

  • Cause: Scaling policies too conservative, instances too slow to start
  • Solution:
    • Use predictive scaling or scheduled scaling for known traffic patterns
    • Optimize your AMI and user data script for faster startup
    • Consider using warm pools (keep pre-initialized instances ready)

Scenario 3: Instance keeps restarting in a loop

  • Cause: Application fails to start, health check runs before app is ready
  • Solution:
    • Increase health-check-grace-period (gives instances time to fully start)
    • Fix application startup issues
    • Check CloudWatch Logs and EC2 user data logs

🚫 Avoiding Those Ugly 500 Errors

500 errors are the bane of every operations team. Here's how to minimize them:

1. Implement Graceful Shutdown

When an instance receives a termination signal, it should:

  1. Stop accepting new requests
  2. Complete in-flight requests
  3. Shut down cleanly

# Python example with signal handling
import signal
import sys

def graceful_shutdown(signum, frame):
    print("Received termination signal, shutting down gracefully...")
    # Stop accepting new requests
    server.stop_accepting_connections()
    # Wait for active requests to complete
    server.wait_for_active_requests(timeout=30)
    # Exit
    sys.exit(0)

signal.signal(signal.SIGTERM, graceful_shutdown)

2. Use Appropriate Health Check Thresholds

  • Too aggressive: Healthy instances get marked unhealthy during brief hiccups
  • Too lenient: Unhealthy instances keep serving traffic and causing errors

Sweet spot:

  • Interval: 10-30 seconds
  • Unhealthy threshold: 2-3 consecutive failures
  • Healthy threshold: 2 consecutive successes
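These numbers also determine how long a failing target keeps receiving traffic. A rough worst-case model (ignoring ALB node fan-out and requests already in flight):

```python
def detection_time(interval_s, unhealthy_threshold, timeout_s=5):
    """Rough worst-case seconds before a failing target is marked unhealthy.

    Each failed check can take up to `timeout_s` to time out, and the
    target is only marked unhealthy after `unhealthy_threshold`
    consecutive failures, spaced `interval_s` apart.
    """
    return unhealthy_threshold * (interval_s + timeout_s)
```

With the defaults from earlier (30s interval, threshold of 3, 5s timeout), a dead instance can serve errors for up to about 105 seconds before the ALB stops routing to it. Tightening the interval to 10s cuts that to about 45 seconds.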

3. Implement Circuit Breakers

If a dependency is down, don't keep trying to call it:

import requests
from pybreaker import CircuitBreaker

# If the external service fails 5 times, stop calling it for 60 seconds
breaker = CircuitBreaker(fail_max=5, reset_timeout=60)

@breaker
def call_external_service():
    return requests.get('https://api.example.com/data')
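If you'd rather not pull in a dependency, the core idea fits in a few lines. This is a minimal single-threaded sketch (no thread safety, no listeners), not a production implementation:

```python
import time

class SimpleBreaker:
    """Minimal circuit breaker: open after N failures, retry after a cooldown."""

    def __init__(self, fail_max=5, reset_timeout=60, clock=time.monotonic):
        self.fail_max = fail_max
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable for testing
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: skipping call")
            self.opened_at = None   # half-open: allow a trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.fail_max:
                self.opened_at = self.clock()
                self.failures = 0
            raise
        self.failures = 0
        return result
```

pybreaker layers proper half-open trial handling, listeners, and thread safety on top of this basic pattern, which is why the dependency is usually worth it.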

4. Set Proper Timeouts

Don't let slow requests pile up and overwhelm your instances:

# Application timeout configuration
REQUEST_TIMEOUT = 30  # seconds
UPSTREAM_TIMEOUT = 5   # seconds for external calls

# ALB idle timeout (default 60s, adjust based on your needs)
aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:... \
  --attributes Key=idle_timeout.timeout_seconds,Value=60

🎬 Putting It All Together: A Real-World Example

Let's say you're building a web API that needs to handle variable traffic throughout the day:

# 1. Create launch template
aws ec2 create-launch-template \
  --launch-template-name web-api-template \
  --launch-template-data '{
    "ImageId": "ami-12345678",
    "InstanceType": "t3.medium",
    "SecurityGroupIds": ["sg-12345678"],
    "UserData": "<base64-encoded-startup-script>",
    "TagSpecifications": [{
      "ResourceType": "instance",
      "Tags": [{"Key": "Name", "Value": "web-api-instance"}]
    }]
  }'

# 2. Create target group
aws elbv2 create-target-group \
  --name web-api-targets \
  --protocol HTTP \
  --port 8080 \
  --vpc-id vpc-12345678 \
  --health-check-path /health \
  --health-check-interval-seconds 30 \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 2

# Deregistration delay is a target group attribute, set separately:
aws elbv2 modify-target-group-attributes \
  --target-group-arn <target-group-arn> \
  --attributes Key=deregistration_delay.timeout_seconds,Value=30

# 3. Create ALB
aws elbv2 create-load-balancer \
  --name web-api-alb \
  --subnets subnet-1a subnet-1b subnet-1c \
  --security-groups sg-alb-12345678

# 4. Create listener
aws elbv2 create-listener \
  --load-balancer-arn <alb-arn> \
  --protocol HTTPS \
  --port 443 \
  --certificates CertificateArn=<acm-cert-arn> \
  --default-actions Type=forward,TargetGroupArn=<target-group-arn>

# 5. Create Auto Scaling Group
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name web-api-asg \
  --launch-template LaunchTemplateName=web-api-template \
  --min-size 2 \
  --max-size 10 \
  --desired-capacity 2 \
  --target-group-arns <target-group-arn> \
  --health-check-type ELB \
  --health-check-grace-period 300 \
  --vpc-zone-identifier "subnet-1a,subnet-1b,subnet-1c"

# 6. Create scaling policy
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name web-api-asg \
  --policy-name target-tracking-cpu \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 70.0
  }'

🎯 What's Next?

You now have a solid foundation in building highly available infrastructure with ALB, Auto Scaling Groups, and EC2 instances. But we're just getting started!

In Part 2, we'll level up by introducing Amazon ECS with EC2β€”containerizing your applications and discovering the new challenges (and opportunities) that come with managing both container scaling AND instance scaling.

In Part 3, we'll explore AWS Fargateβ€”the serverless compute engine that eliminates the need to manage EC2 instances entirely, letting you focus purely on your application containers.

In Part 4, we'll tackle graceful failure handlingβ€”because even with perfect infrastructure scaling, dependencies like databases can still fail. Learn how to handle failures gracefully and provide great user experiences during outages.

πŸ“š Quick Reference

Health Check Configuration Checklist

  • βœ… ALB health check path: /health
  • βœ… Health check interval: 10-30 seconds
  • βœ… Unhealthy threshold: 2-3 failures
  • βœ… Healthy threshold: 2 successes
  • βœ… ASG health check type: ELB
  • βœ… ASG grace period: 300+ seconds
  • βœ… Deregistration delay: 30 seconds

High Availability Checklist

  • βœ… Minimum 2 instances
  • βœ… Deployed across 3 AZs
  • βœ… ALB in multiple subnets
  • βœ… Target tracking scaling policy
  • βœ… CloudWatch alarms for 5XX errors
  • βœ… CloudWatch alarms for unhealthy hosts
  • βœ… Graceful shutdown handling
  • βœ… Connection draining enabled

Building reliable infrastructure is a journey, not a destination. Start with these foundations, monitor continuously, and iterate based on what you learn. Your future self (and your users) will thank you!

Stay tuned for Part 2, where we'll dive into the world of containers with Amazon ECS!

© 2025 Geek Cafe LLC. All rights reserved.