How Auto Scaling Works in AWS

Understanding vertical and horizontal scaling patterns in AWS, including scheduled scaling, target tracking, and more.

In my experience, most people hit their first scaling problem when a micro or small instance starts to choke under load. There are two ways to deal with that:

Vertical Scaling (Scaling Up)

You upgrade the instance type from t3.micro to t3.small, t3.medium, and eventually to larger classes like m5.large.

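This gives you more CPU, memory, and network throughput on a single machine without changing your app setup. The catch is that an EBS-backed instance has to be stopped before its type can be changed, so each upgrade means a short outage. It works well until you reach the upper limit of what one instance can handle.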
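To make the "ladder" idea concrete, here is a minimal Python sketch that picks the smallest instance type meeting a memory requirement. The vCPU and memory figures are the published sizes for these four types; the ladder list and the function name smallest_fit are assumptions made up for illustration, not an AWS API.

```python
from typing import NamedTuple, Optional


class InstanceType(NamedTuple):
    name: str
    vcpus: int
    memory_gib: float


# Ordered smallest to largest; a hand-written ladder, not fetched from AWS.
LADDER = [
    InstanceType("t3.micro", 2, 1.0),
    InstanceType("t3.small", 2, 2.0),
    InstanceType("t3.medium", 2, 4.0),
    InstanceType("m5.large", 2, 8.0),
]


def smallest_fit(required_memory_gib: float, required_vcpus: int = 1) -> Optional[InstanceType]:
    """Return the first type on the ladder that meets both requirements, or None."""
    for itype in LADDER:
        if itype.memory_gib >= required_memory_gib and itype.vcpus >= required_vcpus:
            return itype
    # Nothing on this ladder is big enough: the vertical-scaling ceiling.
    return None
```

With this list, smallest_fit(3) returns the t3.medium entry, and smallest_fit(64) returns None. Real EC2 families go much larger than this short list, but the shape of the problem is the same: at some point there is no bigger box to move to.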

Horizontal Scaling (Scaling Out)

You launch more instances of the same type, like multiple m5.large, and put them behind a load balancer.

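Each instance handles a portion of the load, so capacity grows roughly in proportion to the number of instances. This suits workloads designed to run across many nodes. It also means the app cannot keep state on a single machine: sessions, caches, and uploaded files need to live in a shared store, and the load balancer needs health checks so it only routes traffic to healthy instances.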
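A back-of-the-envelope way to size a horizontally scaled fleet is to divide peak load by what one instance can comfortably serve. The Python sketch below does that. The function name, the requests-per-second inputs, and the 20 percent headroom default are all illustrative assumptions, and it assumes load spreads evenly across instances.

```python
import math


def instances_needed(peak_rps: float, rps_per_instance: float, headroom: float = 0.2) -> int:
    """Estimate fleet size for a peak load, leaving spare capacity on each instance."""
    if peak_rps < 0:
        raise ValueError("peak_rps must be non-negative")
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    # Only plan to use part of each instance so a spike does not saturate it.
    usable = rps_per_instance * (1 - headroom)
    # Always keep at least one instance, even when there is no traffic.
    return max(1, math.ceil(peak_rps / usable))
```

For example, a peak of 1000 requests per second on instances that each handle 200, with 20 percent headroom, works out to 7 instances. Measure the per-instance figure with a load test rather than guessing it.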

Here are the four key cloud scaling patterns you'll come across in production:

1. Scheduled Scaling

This pattern is used when the workload follows a fixed, known schedule. It's common in enterprise environments where traffic is driven by office hours, or in systems that run batch jobs, reports, or data sync at fixed times.

Implementation is straightforward. In AWS, scheduled scaling can be configured on Auto Scaling Groups using put-scheduled-update-group-action via CLI or with Terraform using aws_autoscaling_schedule. In real setups, you create two separate scheduled actions:

One for scaling out before the expected load spike:

# Repeats daily at 00:15 UTC; without --recurrence the action fires only once.
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name report-runner-asg \
  --scheduled-action-name morning-scale-out \
  --start-time "2026-01-02T00:15:00Z" \
  --recurrence "15 0 * * *" \
  --desired-capacity 3

One for scaling in after the load drops:

# Repeats daily at 13:30 UTC.
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name report-runner-asg \
  --scheduled-action-name evening-scale-in \
  --start-time "2026-01-02T13:30:00Z" \
  --recurrence "30 13 * * *" \
  --desired-capacity 1

Things to Watch:

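Scheduled actions use UTC by default. --start-time and --end-time are always UTC, and a --recurrence cron expression is read as UTC unless you pass --time-zone with an IANA name such as Asia/Kolkata. Either set the time zone explicitly or convert from your local time yourself.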
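If you do the conversion by hand, it is easy to get the offset backwards. Here is a small Python sketch that turns a local wall-clock time into a daily cron expression in UTC. The function name is made up for illustration, and it assumes a fixed UTC offset, so it does not account for daylight saving time.

```python
from datetime import datetime, timedelta, timezone


def local_time_to_utc_cron(local_hhmm: str, utc_offset: timedelta) -> str:
    """Convert a local wall-clock time to a daily cron expression in UTC.

    utc_offset is the zone's offset from UTC (for example +05:30). A fixed
    offset ignores daylight saving time, so zones that observe DST need the
    schedule revisited when clocks change.
    """
    hour, minute = (int(part) for part in local_hhmm.split(":"))
    # The calendar date is arbitrary; only the time of day matters for a daily schedule.
    local_dt = datetime(2000, 1, 1, hour, minute, tzinfo=timezone(utc_offset))
    utc_dt = local_dt.astimezone(timezone.utc)
    # Cron field order: minute hour day-of-month month day-of-week
    return f"{utc_dt.minute} {utc_dt.hour} * * *"
```

For a hypothetical team at UTC+05:30, local_time_to_utc_cron("05:45", timedelta(hours=5, minutes=30)) gives "15 0 * * *", the same UTC time of day as the scale-out action above. The resulting string can be passed to the CLI's --recurrence option.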

2. Target Tracking Scaling

This scaling pattern keeps a specific metric, like average CPU utilization, close to a target value. The Auto Scaling Group continuously adjusts the desired instance count based on this metric. When the metric rises above the target, the group adds instances; when it stays well below the target, the group removes them. Scale-in is deliberately more gradual than scale-out, so capacity is added quickly and released cautiously.
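The intuition behind target tracking can be sketched with simple proportional math. The Python below is a simplified model, not AWS's actual implementation, which also relies on CloudWatch alarms and instance warmup. The function name and parameters are assumptions for illustration.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Proportional estimate of the capacity that brings metric back to target."""
    if target <= 0:
        raise ValueError("target must be positive")
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    # If N instances average `metric`, roughly N * metric / target instances average `target`.
    raw = math.ceil(current * metric / target)
    # The group never goes outside its configured bounds.
    return max(min_size, min(max_size, raw))
```

With 4 instances averaging 75 percent CPU and a 50 percent target, this model asks for 6 instances. At 10 percent CPU it would ask for 1, but a minimum size of 2 holds the group at 2.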

To use this in production, typically you need:

An Auto Scaling Group with a working Launch Template.

Example: keeping average CPU at 50 percent across the group's in-service EC2 instances. Here's how you configure it using the AWS CLI:

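# EC2 Auto Scaling target tracking has no ScaleInCooldown/ScaleOutCooldown fields
# (those belong to Application Auto Scaling); use --estimated-instance-warmup instead.
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name app-asg \
  --policy-name cpu-50-target-tracking \
  --policy-type TargetTrackingScaling \
  --estimated-instance-warmup 60 \
  --target-tracking-configuration '{
    "TargetValue": 50.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "DisableScaleIn": false
  }'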

Things to Watch:

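Set the group's minimum size to at least 2. Target tracking never scales in below the minimum, so this keeps low traffic from leaving you with a single instance and no redundancy.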

These two patterns cover the most common cases. Let's move on to Step and Predictive Scaling, which are a bit more complex and are seeing growing adoption in real-world setups.

Read the full article on Hashnode