Scalability in Cloud Computing: A Practical Guide

I’ve been in tech long enough to remember when scaling was a dirty word. Scaling a service meant lengthy delays. You had to buy new servers and physically insert them into limited rack space. Your team needed to calculate the additional power requirements to make sure your new server wouldn’t blow a fuse. You had to worry about cooling all that silicon. It felt terrible when a service outgrew the server where it resided.

Thankfully, I haven’t had to think about that kind of scaling for more than a decade. Today, most companies choose to build their applications on cloud platforms like AWS or Google Cloud. However, questions of scaling never went away. I still check application dashboards with trepidation when a service outgrows the power I’ve allotted it. The difference is that fixing the problem these days takes a matter of minutes, not months.

While scalability in cloud computing is easier today than it’s ever been, it’s still an important topic to understand fully. In this post, we’ll talk about scalability in cloud computing and how you can leverage the power of the cloud for your team.

How Does Cloud Scaling Work?

When I got my start in tech, all of our server technology came in a box. Literally, our servers looked like pizza boxes. We might order a server, and it would have a quad-core processor and 4GB of RAM and a 256GB spinning disk hard drive. I recognize that I’m dating myself here, but that was what our servers were like. Each of those servers would run one copy of Windows. In rare cases, we might be running Linux, but a lot of our critical applications ran on Windows. Installing a new server in the server room, which was in the basement of our office, might take a couple of days.

Whenever possible, we tried to keep a few copies of the server on hand. One would be used as a backup in case the hardware failed. Another would be used as a testing server. We didn’t have very many testing servers.

If our server ran out of hard drive space, or needed more memory, we had to order it and physically install it. That process usually took weeks. As a result, we worked hard to ensure that the servers we ordered would handle everything we needed them to do.

In today’s cloud platforms, literally none of that applies anymore.

Virtual Machines and Cloud Computing

Today’s servers are almost never housed in your own building. Instead, they’re assembled by a company like Microsoft or Google or Amazon in a data center somewhere. That data center connects to a very, very large internet connection. Instead of each physical server running one operating system, the provider partitions each server into a number of virtual machines. Virtual machines mean that one physical server might effectively host four or five different servers at the same time. In certain specialized setups, one virtual machine can even span multiple physical computers, though that’s far less common.

Another important point to understand is that storage is no longer a physical part of the server. Instead, storage connects via the network to the server that needs it.

This unlocks something powerful. Say you’ve discovered that the quad-core server with 4GB of RAM you’re using doesn’t cut it anymore. You need two more CPU cores and four more gigabytes of RAM. With a dedicated server, that means installing a whole new physical server. On the cloud, that process is much simpler.

Because the cloud provider has enormous resources, you can simply ask them for a new virtual machine. You tell the cloud provider about your new needs through a web UI. They spin up a new virtual machine, invisibly to you, and connect it to the networked storage. Finally, they tell the new virtual machine that it has the same name as the old one, and take the old virtual machine offline. Your server now has all the power it needs.
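To make that swap concrete, here’s a minimal sketch of the steps in Python. It’s a toy model, not any provider’s real API: the VirtualMachine class, the resize function, and the "vol-001" volume ID are all names I made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class VirtualMachine:
    """Toy model of a cloud VM (hypothetical, not a real provider API)."""
    name: str
    cpu_cores: int
    ram_gb: int
    volume_id: str      # networked storage attached to this VM
    online: bool = True


def resize(old: VirtualMachine, cpu_cores: int, ram_gb: int) -> VirtualMachine:
    """Replace `old` with a bigger VM that keeps its name and storage."""
    # Spin up a new VM with the requested size, attached to the same
    # networked storage and given the same name as the old one.
    new = VirtualMachine(old.name, cpu_cores, ram_gb, old.volume_id)
    # Take the old VM offline now that its replacement exists.
    old.online = False
    return new


old_vm = VirtualMachine("app-server", cpu_cores=4, ram_gb=4, volume_id="vol-001")
new_vm = resize(old_vm, cpu_cores=6, ram_gb=8)
print(new_vm)
```

The important detail is that the storage never moves. Only the compute around it gets replaced, which is why the whole thing can happen in minutes.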

Unlocking Scaling in Cloud Computing

So that’s it, right? You have unlimited power. You can make your server as powerful as you need for any task you have at hand.

Not quite. For starters, cloud computing services still have limits to how large a machine you can build. They might be massive, but there are limits.

More importantly, virtual machines grow more expensive the larger they are. Bigger cloud servers mean you’re shelling out cold, hard cash to keep your services running. Adding more cores, or memory, or faster storage is what’s known as vertical scaling. To avoid the costs and limitations of vertical scaling, many teams will choose to pursue a horizontal scaling strategy instead.

How Can You Scale Horizontally?

Scaling horizontally means that instead of adding increasingly expensive resources to a single virtual machine, you add more cheap virtual machines to your cloud. What if, rather than adding more cores and more RAM to your server like before, you could split the work across multiple virtual machines? Maybe you don’t need six CPU cores and 8GB of RAM on a single server. Instead, you might be able to do the same work with three of the much-cheaper two-core/2GB virtual machines. If you can make your application work across multiple servers, you’ll be saving your company money every month.
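The arithmetic behind that trade-off is simple enough to sketch. The prices below are invented purely for illustration. Real pricing varies by provider and region, so plug in your own numbers before drawing any conclusions.

```python
# Hypothetical monthly prices in USD. These are assumptions for the sake of
# the example, not real quotes from any cloud provider.
PRICE_LARGE = 120.0  # one six-core/8GB VM
PRICE_SMALL = 25.0   # one two-core/2GB VM


def monthly_cost(unit_price: float, count: int) -> float:
    """Total monthly cost of `count` identical VMs."""
    return unit_price * count


vertical = monthly_cost(PRICE_LARGE, 1)    # scale up: one big machine
horizontal = monthly_cost(PRICE_SMALL, 3)  # scale out: three small machines
savings = vertical - horizontal

# Three small VMs give six cores but only 6GB of RAM in total, so this
# comparison only holds if the workload fits in the smaller footprint.
print(f"vertical: ${vertical:.2f}, horizontal: ${horizontal:.2f}, savings: ${savings:.2f}")
```

With these made-up prices, the three small machines come out $45 a month cheaper than the single large one.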

Web servers are a classic example of work that scales across servers. Most web applications today are made up of a few web servers sitting behind a load balancer. Each server is able to handle a few requests, and the load balancer ensures that each server gets roughly the same amount of work. If your site experiences a traffic spike, you just need to spin up another web server to handle the extra load.
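Here’s a rough sketch of how a load balancer might spread requests around. I’m assuming the simplest possible strategy, round-robin, where each request goes to the next server in line. Real load balancers offer other strategies too, and the RoundRobinBalancer class and server names here are made up for the example.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next server in the rotation."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._rotation = cycle(self.servers)

    def add_server(self, server):
        # Bring a new server online and restart the rotation so it is included.
        self.servers.append(server)
        self._rotation = cycle(self.servers)

    def route(self, request):
        # The request content is ignored here; only arrival order matters.
        return next(self._rotation)


balancer = RoundRobinBalancer(["web-1", "web-2"])
before = Counter(balancer.route(i) for i in range(6))

# Traffic spike: spin up another web server and route six more requests.
balancer.add_server("web-3")
after = Counter(balancer.route(i) for i in range(6))

print(dict(before))  # each of two servers handled three requests
print(dict(after))   # each of three servers handled two requests
```

Adding the third server drops each machine’s share of the same six requests from three to two, which is the whole point of scaling out.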

Horizontal Scaling at the Extreme

At the most extreme ends, companies will break up entire applications like they break up their web servers. This kind of architecture, called a microservice architecture, means that every part of an application has its own virtual machine(s). If one part of the application is more expensive in terms of computation time, new virtual machines come online to handle only that little bit of work. Advanced shops will use DevOps tools like Kubernetes to orchestrate the containers running across all of these virtual machines so that they work as expected.
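As a sketch of what scaling only the busy part looks like, the snippet below works out how many instances each service needs from its own load. The service names, request rates, and per-instance capacity are all hypothetical, and this is not how Kubernetes or any particular orchestrator actually computes replica counts.

```python
import math


def replicas_needed(requests_per_sec: float, capacity_per_instance: float) -> int:
    """Instances required to serve the load, never fewer than one."""
    return max(1, math.ceil(requests_per_sec / capacity_per_instance))


# Hypothetical services: (current requests per second, capacity of one instance)
services = {
    "checkout": (900, 100),
    "search": (150, 100),
    "profile": (40, 100),
}

plan = {name: replicas_needed(load, cap) for name, (load, cap) in services.items()}
print(plan)  # {'checkout': 9, 'search': 2, 'profile': 1}
```

Only the checkout service gets the extra machines. The quiet services stay small, so you aren’t paying to scale work that doesn’t need it.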

How Much Scaling Do You Need?

For many organizations, how much to scale is a million-dollar question, quite literally. They spend a lot of time and effort figuring out how much computing power they currently need, and how much they are going to need. To figure out how much they will need in the future, they turn to tools like scalability testing. That kind of testing shines a light on which parts of their application are most likely to fail in the event they experience a sudden spike in usage.

Another way that organizations learn how much scaling they need is by effectively monitoring their current systems. Tools like Scalyr’s Dashboard provide real-time insights into how their applications are behaving. When they can see how many resources they’re currently using, it’s easy to understand which virtual machines need help. Whether that help means scaling horizontally or vertically, the key is having the information about how things are going.
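To show how monitoring data can feed a scaling decision, here’s a minimal sketch that turns recent CPU readings into a recommendation. The 80% and 30% thresholds are assumptions I picked for illustration, and the recommend function is hypothetical. It isn’t part of Scalyr or any other monitoring product.

```python
from statistics import mean


def recommend(cpu_samples, scale_up_at=0.80, scale_down_at=0.30):
    """Suggest a scaling action from recent CPU utilization samples (0.0 to 1.0)."""
    average = mean(cpu_samples)
    if average >= scale_up_at:
        return "scale up"
    if average <= scale_down_at:
        return "scale down"
    return "hold"


# Hypothetical readings from three virtual machines
busy = recommend([0.91, 0.87, 0.95])
idle = recommend([0.10, 0.20, 0.15])
steady = recommend([0.50, 0.60])

print(busy, idle, steady, sep=", ")  # scale up, scale down, hold
```

Whether "scale up" ends up meaning a bigger machine or more machines is still your call. The monitoring just tells you which virtual machines to look at.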

Scaling a server today is much easier than it’s ever been. I’m thankful for that. I know my job is a lot easier today because of the power that scalability in cloud computing unlocks. What can that kind of power do for your company?