The term AWS Auto Scaling can quickly get confusing for a multitude of reasons. Is it different from Amazon Elastic Compute Cloud (EC2) Auto Scaling? Why is it a big deal in the first place? Isn’t autoscaling the premise on which cloud computing is built? These are questions that can trip up even experienced AWS practitioners.
In this comprehensive guide, we break down everything AWS Auto Scaling using a top-down approach. The goal is to give you a good understanding of the topic. This will make you feel much more comfortable thinking about the solution you’re looking to design, implement, or even just discuss.
What We’ll Cover
- What is Auto Scaling?
- AWS Auto Scaling versus EC2 Auto Scaling
- How EC2 Auto Scaling works
- How AWS Auto Scaling works
- Hands-on Demo
- Tips to keep in mind
What Is Autoscaling?
Literally, autoscaling means to scale in an automated way. And this, in my opinion, is what completes the promise of cloud computing and makes it a formidable tool for businesses today. Let me explain this point of view.
With cloud computing, you have access to computing services (e.g., servers, storage, databases, networks, software, analytics, intelligence) over the internet that you can deploy at any time. Let’s say you initially launched your application on an instance with 2 GB of RAM, and today you need an extra instance with 4 GB of RAM to handle more traffic. With the cloud, you get that almost instantly and only start paying for the extra resource once it’s in use. If the traffic to your application goes back down, you can release the extra 4 GB server and you’re back to paying just for your initial 2 GB of RAM.
This alone is genius. But you still have to (or at least engineers have to) handle adding and removing all of the resources at the right time. This scaling up and down is what autoscaling is about, with the convenience of it happening automatically.
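To make the idea concrete, here is a minimal sketch of what "adding and removing resources at the right time" could look like as a decision rule. It does not call any AWS API; the function name `decide_capacity` and the 80% and 30% thresholds are assumptions picked for illustration only.

```python
def decide_capacity(current: int, load_percent: float,
                    minimum: int = 1, maximum: int = 4,
                    high: float = 80.0, low: float = 30.0) -> int:
    """Return the server count to run next, given the average load."""
    if load_percent > high:
        wanted = current + 1   # busy: add a server
    elif load_percent < low:
        wanted = current - 1   # idle: release a server
    else:
        wanted = current       # comfortable: leave things alone
    # Never go below the floor or above the ceiling.
    return max(minimum, min(maximum, wanted))
```

An autoscaler is essentially this decision run on a loop against live measurements, so that nobody has to make the call by hand.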
For clarity, I’ll use the word “instance” from time to time simply as another way of referring to a server.
AWS Auto Scaling vs. EC2 Auto Scaling
In my experience, many engineers working with AWS treat “autoscaling” as a synonym for EC2 Auto Scaling, which may or may not be your case. The two are not the same, and we need a clear understanding of the difference between AWS Auto Scaling and EC2 Auto Scaling going forward.
AWS Auto Scaling
This is an AWS service (just like S3 or IAM) that provides a one-stop-shop for everything autoscaling. By saying this, I mean that AWS Auto Scaling provides a unified interface where you can manage the automatic scaling of your resources across multiple other AWS services. The service accomplishes this by monitoring your applications and adjusting the capacity of the resources on which they run to optimize for cost, performance, or both. That way, you get just what you need in terms of capacity when you need it.
The services AWS Auto Scaling supports are Amazon EC2 (Auto Scaling groups and Spot Fleets), Amazon ECS, Amazon DynamoDB, and Amazon Aurora. At the time of writing, no other services can leverage AWS Auto Scaling.
EC2 Auto Scaling
EC2 Auto Scaling, on the other hand, is an EC2 feature that performs a very similar function to AWS Auto Scaling but differs primarily in two aspects:
- it is specific to the EC2 service, and
- we scale up or down by adding or removing EC2 instances.
Right away, you can see that you have the choice between using the EC2 Auto Scaling feature and AWS Auto Scaling when dealing specifically with EC2.
How EC2 Auto Scaling Works
Before talking about how it all works, it’s important to throw in another term — an Auto Scaling group. This is a logical grouping of EC2 instances for the purpose of autoscaling.
The EC2 Auto Scaling feature works by checking the health of instances within a given Auto Scaling group. When an instance is unhealthy, it terminates that instance and deploys another in its place. When the group as a whole cannot handle the traffic it’s receiving, scaling policies add instances, and remove them again once demand drops. In both cases, new instances are launched from predefined configurations. These configurations are bundled in a launch template that’s specified when creating the Auto Scaling group.
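The health-check side of this behavior can be sketched as a small simulation. This is a toy model under my own assumptions, not how the service is implemented: `Instance`, `launch_from_template`, and `reconcile` are hypothetical names, and the instance IDs are made up.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


_ids = count(1)


def launch_from_template() -> Instance:
    """Stand-in for launching a new instance from the launch template."""
    return Instance(instance_id=f"i-sim{next(_ids):04d}")


def reconcile(group: list[Instance], desired: int) -> list[Instance]:
    """Drop unhealthy instances, then launch replacements up to `desired`."""
    survivors = [inst for inst in group if inst.healthy]
    while len(survivors) < desired:
        survivors.append(launch_from_template())
    return survivors
```

Calling `reconcile` with one healthy and one unhealthy instance and a desired count of two keeps the healthy one and launches a single replacement. The sketch only replaces instances; removing surplus instances when demand drops is left out to keep it short.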
A launch template typically specifies information such as the Amazon Machine Image (AMI) to use, the instance type, key pairs, security groups, and more, which together completely define the instances that can be launched.
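For readers who prefer code to consoles, the fields listed above map onto the request for the EC2 `CreateLaunchTemplate` call. The sketch below only builds the request dictionary; the helper names are mine, and every ID you pass in (AMI, key pair, security group) is a placeholder you would replace with your own. Sending the request assumes boto3 is installed and AWS credentials are configured.

```python
def build_launch_template_request(name: str, ami_id: str, instance_type: str,
                                  key_name: str,
                                  security_group_ids: list[str]) -> dict:
    """Build keyword arguments for the EC2 CreateLaunchTemplate call."""
    return {
        "LaunchTemplateName": name,
        "LaunchTemplateData": {
            "ImageId": ami_id,                 # the AMI to boot from
            "InstanceType": instance_type,     # e.g. a small general-purpose type
            "KeyName": key_name,               # key pair for SSH access
            "SecurityGroupIds": list(security_group_ids),
        },
    }


def create_launch_template(request: dict) -> dict:
    """Send the request with boto3 (hypothetical helper, not run here)."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("ec2").create_launch_template(**request)
```

Keeping the builder separate from the call makes it easy to inspect or review the template before anything is created in your account.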
That’s quite a few concepts to wrap your head around, but hopefully, it all makes sense. The question that arises is how EC2 Auto Scaling handles the traffic to an application hosted on multiple instances. And thankfully, AWS offers load balancers that happen to be designed specifically for situations like this.
AWS allows users to choose from four types of load balancers. Typically, these load balancers sit in front of the Auto Scaling group to receive network traffic from clients, and then distribute it across the instances in the group they’re serving. Most often, application load balancing works great, but you always want to make sure you pick the right type of load balancer for your individual use case.
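To picture what "distribute it across the instances" means, here is a toy round-robin distributor. Round robin is only one of the policies a real load balancer may apply, and the function names and request/instance labels are assumptions made for this example.

```python
from collections import Counter
from itertools import cycle


def distribute(requests: list[str], instances: list[str]) -> dict[str, str]:
    """Assign each request to an instance in round-robin order."""
    if not instances:
        raise ValueError("at least one instance is required")
    targets = cycle(instances)
    return {request: next(targets) for request in requests}


def load_per_instance(assignment: dict[str, str]) -> Counter:
    """Count how many requests each instance received."""
    return Counter(assignment.values())
```

With six requests and three instances, each instance ends up with two requests. When the Auto Scaling group adds or removes an instance, the list of targets changes and the traffic is spread over the new set.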
An Important Note
- In case you come across them, launch configurations serve the same purpose as launch templates, but the latter offer more flexibility. So whenever possible, use launch templates instead.
- Configuration of EC2 Auto Scaling with launch templates, launch configurations, and so on can quickly become a nightmare when your infrastructure is very large. In situations with a huge infrastructure, a third-party solution like Scalyr that integrates with AWS could be a good fit. They’re multitenant and so can pick the right instances and scale these instances up and down to deliver services as cost-efficiently as possible. It’s just easier for them since they operate at scale.
Let’s now have a look at AWS Auto Scaling.
How AWS Auto Scaling Works
If your application makes use of any AWS Auto Scaling compatible services, then you can autoscale it directly from the unified interface. As an overview, the Auto Scaling interface allows you to scan your applications to detect compatible resources, choose what to optimize for (cost, performance, or both), and then you can sit back and relax while AWS Auto Scaling does the magic for you.
To better comprehend everything, we must first understand the core of this service: scaling plans. These are the governing instructions for how to autoscale one or more resources. And right at the center of every resource scaling plan is a scaling strategy. This component specifies the basis for optimization, whether cost, performance, or a balance between both.
To bring it all together, consider a small startup running their minimum viable product (MVP) with the help of EC2 in an Auto Scaling group. This is not an ideal example, as we will see later in this post, but let’s use it to clarify the new terminology. Since AWS Auto Scaling supports EC2, the engineer can manage their application capacity automatically using AWS Auto Scaling. To do so, she creates a scaling plan for the application’s Auto Scaling group and, since the startup is still trying to raise funds, she sets the scaling strategy to optimize for cost. With this set, she can just monitor while AWS uses dynamic and predictive scaling to enforce the company’s cost optimization scaling strategy.
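One way to think about a scaling strategy is as a choice of how hot you let each instance run. The sketch below encodes that intuition. The three target values are illustrative assumptions on my part, so check what the console proposes for your resources, and `instances_needed` is a hypothetical helper rather than anything AWS exposes.

```python
import math
from enum import Enum


class Strategy(Enum):
    PERFORMANCE = "performance"
    BALANCED = "balanced"
    COST = "cost"


# Illustrative target CPU utilization per strategy (assumed values).
TARGET_CPU_PERCENT = {
    Strategy.PERFORMANCE: 40.0,  # more headroom, more instances
    Strategy.BALANCED: 50.0,
    Strategy.COST: 70.0,         # run instances hotter, fewer of them
}


def instances_needed(total_cpu_demand: float, strategy: Strategy) -> int:
    """Instances required so that average CPU stays at the strategy's target.

    `total_cpu_demand` is the summed CPU percentage the workload would use,
    so 280 means the work of 2.8 fully busy instances.
    """
    return max(1, math.ceil(total_cpu_demand / TARGET_CPU_PERCENT[strategy]))
```

For the same demand of 280, the cost strategy asks for 4 instances while the performance strategy asks for 7, which is the trade-off our startup engineer is making.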
But what are dynamic and predictive scaling? No worries, we explain these in the following paragraphs.
With dynamic scaling, the capacity of a resource is adjusted to track a target metric defined in a scaling policy, and a scaling policy is itself part of a scaling plan. A commonly used metric is CPU usage. For example, you can configure your scaling plan to keep the average CPU usage of the instances in an Auto Scaling group at around 80%. If average CPU utilization rises above 80%, this triggers the addition of instances to absorb the load, and if it falls well below 80%, instances are removed.
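The arithmetic behind tracking a target can be sketched with a simple proportional rule. This is a simplification I am assuming for illustration; AWS does not publish its policy as this exact formula, and real policies also account for things like instance warm-up. The function name and parameters are hypothetical.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     minimum: int, maximum: int) -> int:
    """Capacity that would bring the average metric back near the target."""
    if current <= 0 or target <= 0:
        raise ValueError("current and target must be positive")
    # The same total load spread over n instances averages metric * current / n,
    # so solve for the smallest n that keeps the average at or below the target.
    wanted = math.ceil(current * metric / target)
    return max(minimum, min(maximum, wanted))
```

With two instances averaging 90% CPU and a target of 80%, the rule asks for three instances. With four instances averaging 40%, it scales in to two.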
Predictive scaling optimizes for a metric specified in the scaling plan as well. However, it differs from dynamic scaling in that it uses machine learning techniques to forecast capacity requirements ahead of time, and then carries out the necessary adjustments to meet those demands. This more proactive approach is particularly interesting because capacity is in place before the load arrives, so application performance stays steady through the spike. For example, let’s say your application has traffic spikes on Fridays because people are shopping for the weekend. In this case, the scaling plan would schedule scaling adjustments ahead of each Friday to ensure your application is ready to handle the spike every time.
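As a rough intuition for the Friday example, the sketch below forecasts load by averaging what was seen on the same weekday in past weeks. This is a deliberately naive stand-in for the machine learning AWS uses, and the traffic numbers, the per-instance capacity, and the function names are all invented for the example.

```python
import math
from statistics import mean


def forecast_load(history: dict[str, list[float]], weekday: str) -> float:
    """Average the peak load seen on `weekday` in previous weeks."""
    return mean(history[weekday])


def planned_capacity(history: dict[str, list[float]], weekday: str,
                     per_instance: float, minimum: int = 1) -> int:
    """Instances to have ready before `weekday`, based on the forecast."""
    needed = math.ceil(forecast_load(history, weekday) / per_instance)
    return max(minimum, needed)


# Hypothetical peak requests per second over the last three weeks.
PAST_PEAKS = {
    "Thursday": [100, 120, 110],
    "Friday": [400, 440, 420],
}
```

If each instance handles about 100 requests per second, this plan calls for 2 instances on Thursday and 5 on Friday, ready before the traffic shows up rather than after.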
AWS Auto Scaling and EC2 Auto Scaling are both free to use. You still pay for the resources used to run your applications, along with any Amazon CloudWatch monitoring fees. Thus, in the case of our startup, they pay for their EC2 usage while the AWS Auto Scaling service itself is free.
AWS Auto Scaling Hands-On
For the hands-on section, we create an Auto Scaling group and then use this group to explore the AWS Auto Scaling service.
Creating an Auto Scaling Group
From your AWS homepage, navigate to the EC2 dashboard. If you can’t access EC2, use the search bar to find the service. From there, you can access the Auto Scaling groups feature from the left sidebar.
If this is your first Auto Scaling group, you’ll be greeted with a page similar to the image below. If not, you should still be able to find a button to create a new Auto Scaling group.
The image below shows what the creation form looks like. Give a unique name to your group and then select Launch template. If you don’t have an existing launch template, you can easily create one by clicking Create a launch template. When everything is set, click Next.
Again, remember that you can always go with a launch configuration instead of a launch template if you’re more comfortable with that.
On the next tab, I recommend selecting all the subnets (see figure below), since we don’t know in advance which subnet a new instance will be launched into.
Next, we configure Load balancing. It’s recommended to have a load balancer attached to your Auto Scaling group unless you have a specific reason not to. If you have no existing target groups, you’ll have the option to create one, and then you can come back to follow along with our tutorial.
The Health check grace period of 300s is great (see bottom of image below), but depending on the sensitivity of your application, you could decide to tweak this. When done, click Next to proceed.
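As I understand it, the grace period gives a freshly launched instance time to boot and warm up before a failed health check can get it replaced. A minimal sketch of that rule, with a hypothetical function name and the 300-second default from the form:

```python
from datetime import datetime, timedelta


def health_checks_apply(in_service_since: datetime, now: datetime,
                        grace_seconds: int = 300) -> bool:
    """True once the grace period has elapsed for this instance."""
    return now - in_service_since >= timedelta(seconds=grace_seconds)
```

An application that needs several minutes to start would want a longer grace period so that it is not replaced while it is still warming up; a fast-starting one can afford a shorter value.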
On the next tab, designate the minimum and maximum number of instances for your Auto Scaling group. Once those are set, you need to select a metric that will act as a trigger for scaling your instance number up or down.
At this point, I suggest going directly to review by clicking the Skip to Review button. There are two additional optional steps, which we’ll skip for now, but feel free to go through them if need be. When your review is done, finish creating your new Auto Scaling group by clicking on Create an Auto Scaling group (see image below).
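If you would rather script these console steps, they correspond roughly to two calls on the boto3 `autoscaling` client: `create_auto_scaling_group` and `put_scaling_policy`. The sketch below only builds the two requests. The helper names, the policy naming scheme, and any group, template, subnet, or target group identifiers you pass in are assumptions for illustration, and actually applying them requires boto3 and AWS credentials.

```python
def build_group_request(name: str, template_name: str, subnet_ids: list[str],
                        target_group_arns: list[str], min_size: int,
                        max_size: int, grace_seconds: int = 300) -> dict:
    """Keyword arguments for create_auto_scaling_group."""
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {"LaunchTemplateName": template_name,
                           "Version": "$Latest"},
        "MinSize": min_size,
        "MaxSize": max_size,
        "VPCZoneIdentifier": ",".join(subnet_ids),  # comma-separated subnets
        "TargetGroupARNs": list(target_group_arns),
        "HealthCheckType": "ELB",
        "HealthCheckGracePeriod": grace_seconds,
    }


def build_cpu_policy_request(group_name: str, target_percent: float) -> dict:
    """Keyword arguments for put_scaling_policy (target tracking on CPU)."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": f"{group_name}-cpu-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": float(target_percent),
        },
    }


def apply(group_request: dict, policy_request: dict) -> None:
    """Create the group, then attach the policy (not run here)."""
    import boto3  # imported lazily so the builders work without boto3

    client = boto3.client("autoscaling")
    client.create_auto_scaling_group(**group_request)
    client.put_scaling_policy(**policy_request)
```

The builders mirror the form: subnets, load balancing with a health check grace period, group size limits, and the metric that triggers scaling.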
Creating a Scaling Plan in AWS Auto Scaling
We mentioned earlier that a scaling policy is part of your scaling plan. We’ll create a scaling plan now, and implicitly, a scaling policy as well. From your AWS homepage, navigate to the AWS Auto Scaling dashboard. If this is your first Auto Scaling plan, a page like this greets you. If not, you should still see a Get Started button. Simply click to start the process.
You can use an AWS CloudFormation stack, a set of tags, or Auto Scaling groups to create your scaling plan. Since we already created an Auto Scaling group for this purpose, let’s select that group and then click Next.
On the next tab, we can see that we are setting up our scaling plan. Play around with the configurations to get a good feel for the possibilities. When you’re ready, click Next to proceed.
There are optional review steps, but we’ll skip those for now. Finally, create your scaling plan for the Auto Scaling group. And there you have it — an Auto Scaling group completely managed by AWS Auto Scaling on your behalf.
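The same scaling plan can, as far as I can tell, be expressed through the boto3 `autoscaling-plans` client and its `create_scaling_plan` call. The sketch below builds such a request for an Auto Scaling group discovered by tag. Treat the field names as my reading of the API and verify them against the reference before relying on them; the helper names, plan name, and tag values are hypothetical.

```python
def build_scaling_plan_request(plan_name: str, group_name: str,
                               tag_key: str, tag_values: list[str],
                               min_capacity: int, max_capacity: int,
                               target_cpu_percent: float) -> dict:
    """Keyword arguments for create_scaling_plan (assumed field layout)."""
    return {
        "ScalingPlanName": plan_name,
        # Resources are discovered through tags in this sketch.
        "ApplicationSource": {
            "TagFilters": [{"Key": tag_key, "Values": list(tag_values)}],
        },
        "ScalingInstructions": [{
            "ServiceNamespace": "autoscaling",
            "ResourceId": f"autoScalingGroup/{group_name}",
            "ScalableDimension": "autoscaling:autoScalingGroup:DesiredCapacity",
            "MinCapacity": min_capacity,
            "MaxCapacity": max_capacity,
            "TargetTrackingConfigurations": [{
                "PredefinedScalingMetricSpecification": {
                    "PredefinedScalingMetricType": "ASGAverageCPUUtilization",
                },
                "TargetValue": float(target_cpu_percent),
            }],
        }],
    }


def create_scaling_plan(request: dict) -> dict:
    """Send the request with boto3 (hypothetical helper, not run here)."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("autoscaling-plans").create_scaling_plan(**request)
```

Each entry in `ScalingInstructions` plays the role of the per-resource settings you adjusted in the console, with the target value standing in for the chosen scaling strategy.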
Tips to Keep in Mind
Now we have a complete understanding of autoscaling in AWS. Here are a few tips I suggest you keep in mind going forward.
- Use EC2 Auto Scaling instead of AWS Auto Scaling if your infrastructure uses only the EC2 service and you want to scale your Auto Scaling groups. Also, as we saw in the hands-on section, this is where you can create and delete Auto Scaling groups.
- Use AWS Auto Scaling for scaling resources across multiple services with the help of defined scaling strategies.
- AWS Auto Scaling only supports specific services. Hence, you can only make use of it if your application uses at least one service it supports.
- AWS proposes scaling strategies on a per resource basis, so if you’re not sure whether to optimize for cost or performance, follow the recommended strategy.
- Applications that experience weekly or daily variations in traffic are a great use case for AWS Auto Scaling. These are situations in which predictive scaling thrives.
- Using an Elastic Load Balancing load balancer with an Auto Scaling group is not mandatory, but highly recommended.
- Auto Scaling groups can be monitored via AWS Auto Scaling, but you create them via the EC2 console. When creating these groups, you need a launch template or a launch configuration that describes the instances you want launched in the group. Use launch templates over launch configurations whenever possible.
- AWS Auto Scaling isn’t available in all regions at the time of writing this post. So when planning to make use of AWS Auto Scaling, be sure to verify the service is available in your region.
This covers everything we set out to elaborate on in this guide. Of course, like with every other service on AWS, practice makes perfect. Don’t hesitate to go over things again, where necessary. More importantly, try them out yourself for a more immersive learning experience.
This post was written by Boris Bambo. Boris is a data & machine learning engineer fascinated by technology, education, and business. Feel free to connect with him on LinkedIn.