Horizontal Scalability in Your Software: The What, Why, and How

If you find yourself with scaling issues or bottlenecks in your application, take a moment to congratulate yourself! That means you have enough customers to experience some growing pains.

Now, once you’ve finished celebrating, you’ll want to figure out how you can scale your application so it can continue to grow. In this post, we’ll cover horizontal scalability, why you may want to consider it for your application, and how you can prepare your application.

First, let’s look at what horizontally scaling an application means.


The What

In the world of software, scaling involves increasing the available resources for an application. We typically do this to get around a resource bottleneck. When we talk about resources in the context of scalability, we mean CPU, memory, network I/O, and disk I/O. Any of these can become a bottleneck that degrades your application’s performance.

To remove or mitigate the bottleneck, we can scale our applications either vertically (scale up) or horizontally (scale out). For vertical scaling, we increase the resources on one instance, VM, or server. Alternatively, with horizontal scaling, we increase the number of containers, VMs, or servers for our application, multiplying the instances available.

If you want a different way to picture horizontal and vertical scaling, consider if you’d rather face 100 duck-sized horses or one horse-sized duck. The one horse-sized duck has scaled vertically, increasing its size to deal with whatever it may face alone. On the other hand, one hundred duck-sized horses have scaled horizontally. They don’t need to be individually strong or powerful as they work together in cooperation.

And depending on your application needs, you should consider what’s best. Typically this involves a bit of both horizontal and vertical scaling.

Let’s look at both of these a little bit more.

Vertical Scaling

First, with vertical scalability, we increase the amount of resources available to our server, VM, or container. For example, if your application container starts with one gig of memory, we can scale vertically by increasing that to two or four gigs. Or if we run low on CPU, we can go from two cores to four.

Vertical scaling provides benefits when processes in your application take up a lot of resources. If you have something that requires a lot of CPUs or memory to run, vertical scaling will provide the power you need. If you have a legacy app that does some heavy lifting in terms of CPU and memory, scaling your application vertically may be your only option, as some applications weren’t designed to be scaled horizontally.

Unfortunately, we typically have to restart or bounce the server or VM to add the extra capacity. That may not be desirable for your customers, especially if you’re already running low on resources. Restarting at that time may cause a lot of issues for the end users.

Additionally, we can run out of room to scale vertically on our platform. For example, a VM has an upper limit on the memory and CPUs it can be given. Eventually, you’re going to hit a wall.

Horizontal Scaling

With horizontal scalability, we scale by increasing the number of servers, VMs, or containers running our application. Here we replicate our application so that multiple instances run simultaneously. Then the processing load spreads out over the instances available.

In the past, people tended to shy away from horizontal scalability, as it could significantly increase cost and overhead. If your application runs on bare metal (directly on your server) or on a VM, you end up having to deal with extra overhead for each additional server or virtual machine. However, with containers providing lightweight shells to run your application in, it’s an increasingly affordable and often preferable option over scaling vertically.

Vertical vs. Horizontal Scaling: A Summary

| | Vertical Scalability | Horizontal Scalability |
| --- | --- | --- |
| Definition | Adding more resources (memory, CPU power, etc.) to your existing server, VM, or container. | Obtaining more servers, VMs, or containers. |
| Advantages | Reduced software cost, since it doesn’t require parallelization. Easier implementation. | Reduced risks of downtime. Implementation cost might be lower. |
| Disadvantages | Bigger risks of downtime. It might have a higher total cost of implementation. Limited scope of scalability. | More complex architectural design. The process of sharing data is more complicated and more costly. |

Vertical vs. Horizontal Scaling: A Verdict?

So, we’ve just summarized the definitions, pros, and cons of horizontal and vertical scaling. Do we have a clear winner?

No, there’s no way to pick a winner here. As we’ve alluded to before—and will allude to again in the next section—in many scenarios you’ll need to mix the two approaches. The current trends suggest an increased adoption of horizontal scaling in the future. As you’ll see in the next section, horizontal scalability can bring very attractive benefits to the table—especially regarding redundancy and less risk of downtime—to the point you might want to forgo vertical scaling entirely.

However, horizontal scaling might present its own set of challenges—for instance, you might run into performance issues. For the time being, then, most organizations will probably benefit from leveraging both approaches, as they migrate to exclusively using horizontal scaling.

The Why

As this post focuses on horizontal scalability, we’re going to review a few reasons why your application may require horizontal scalability.

  1. Customers require little to no downtime. As we mentioned previously, scaling vertically can require restarts when adding the additional capacity. On the other hand, adding additional instances of your application won’t need a reboot. Your load balancer will begin routing some of the traffic to your new instances once they’re up and running.
  2. You want to provide resiliency through service redundancy. If one instance of your application fails, your other instances can pick up the additional traffic so that your application availability doesn’t decrease significantly.
  3. The load on your application varies. If your application typically only needs minimal resources but then spikes a few times a day or week, you can rely on automatic horizontal scaling to smooth out the journey for your customers.
  4. You’ve hit vertical scaling limits. If you’re hitting limits on what your containers or VMs can provide, you may want to consider scaling your application horizontally across multiple VMs.
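To make the first two reasons concrete, here is a minimal sketch of what a load balancer does when instances are added or fail. It is a toy round-robin model and not how any particular load balancer is implemented; the class name, method names, and instance names are all made up for illustration.

```python
class RoundRobinBalancer:
    """Toy load balancer that rotates requests across healthy instances."""

    def __init__(self, instances):
        self._instances = list(instances)
        self._healthy = set(self._instances)
        self._next = 0

    def add_instance(self, name):
        # Scaling out: the new instance joins the rotation, nothing restarts.
        self._instances.append(name)
        self._healthy.add(name)

    def mark_down(self, name):
        # A failed instance stays registered but stops receiving traffic.
        self._healthy.discard(name)

    def route(self):
        # Try each registered instance at most once, skipping unhealthy ones.
        for _ in range(len(self._instances)):
            candidate = self._instances[self._next % len(self._instances)]
            self._next += 1
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy instances available")


balancer = RoundRobinBalancer(["app-1", "app-2"])
print([balancer.route() for _ in range(4)])

balancer.add_instance("app-3")   # scale out under load
balancer.mark_down("app-1")      # simulate a failed instance
print([balancer.route() for _ in range(4)])
```

After the second step, traffic flows only to the two remaining healthy instances, which is the redundancy benefit described above.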

And even though these reasons may compel you to use nothing but horizontal scalability, remember that you can use both. When running into performance or throughput issues, consider both types of scalability and see how they affect your application metrics.

The How

Now that we know what horizontal scalability is and why we should consider it, let’s discuss how to use it properly.

It all starts at the drawing board when we design or, in some cases, redesign our applications.

Application Design

The following design considerations make horizontal scaling simpler and more manageable.

  1. Make your horizontally scaled services stateless. With multiple instances of your application running, you can’t guarantee that a customer request will hit the same instance every time.
  2. Use microservices where appropriate. Typically microservice design involves splitting services between different business functions. In addition to that, you should consider splitting services based on resource requirements. You can separate processes that are resource-intensive from those that aren’t. Simple CRUD operations may belong in a different service than one that uses the underlying data in complex and computationally heavy algorithms.
  3. Consider multi-cloud tolerant applications. Some companies have found that hosting apps in their own data center provides cost and support benefits. But they’re limited by the number of physical servers they maintain. If our applications can run anywhere, nothing stops us from scaling into the cloud. Your on-premises servers can handle your application load most of the time, and you can scale out into the cloud during peak periods, like a holiday shopping season, or other temporary increases in load.
  4. Design your application for the platform that your customers use. If you’re building a mobile app, you will have different bottlenecks than a web app. Keep that in mind when determining where and how you may be able to scale your application.
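The first consideration, statelessness, is the one that most often trips people up, so here is a small sketch of the idea. It assumes a shared session store that every instance can reach; the in-memory `SharedSessionStore` below is only a stand-in for whatever external cache or database you would actually use, and both class names are hypothetical.

```python
class SharedSessionStore:
    """Stand-in for an external store that all instances share."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        # Unknown sessions start with an empty cart.
        return self._data.get(session_id, {"cart": []})

    def put(self, session_id, session):
        self._data[session_id] = session


class CartService:
    """One instance of a stateless service: it keeps no customer state itself."""

    def __init__(self, name, store):
        self.name = name
        self._store = store

    def add_item(self, session_id, item):
        # Load state, change it, and write it back on every request.
        session = self._store.get(session_id)
        session["cart"] = session["cart"] + [item]
        self._store.put(session_id, session)
        return session["cart"]


store = SharedSessionStore()
instance_a = CartService("app-1", store)
instance_b = CartService("app-2", store)

# Consecutive requests from one customer land on different instances.
instance_a.add_item("session-42", "book")
print(instance_b.add_item("session-42", "pen"))
```

Because neither instance holds the cart in its own memory, it doesn’t matter which one the load balancer picks for the next request.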

The sooner you consider these elements, the simpler your future scaling decisions and automation will be.


Now that we’ve designed our application for horizontal scaling, let’s look at how we determine what to scale. Typically, when we hit constraints in our systems, they fall into the four following categories:

  1. Disk I/O
  2. Network I/O
  3. Memory
  4. CPU

All of these combine to let us know the saturation level of our application. But without proper metrics, we may assume we have a problem with memory when the problem actually involves network I/O.

In addition to system-level metrics, we should look at metrics that affect the customer. Track metrics like latency, errors, and transactions per second. See if you can find a correlation between these metrics and the resources above. That will help to determine what you want to scale and to identify bottlenecks. You can even use Scalyr to start tracking these on your application.
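As a rough illustration of looking for that correlation, the sketch below computes a Pearson correlation between latency samples and each resource metric, then ranks the resources by how closely they track latency. The function names and all sample numbers are invented for the example, and a strong correlation is only a hint about where to look, not proof of the cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no signal
    return cov / sqrt(var_x * var_y)


def rank_suspects(latency_ms, resources):
    """Sort resource metrics by how strongly they move with latency."""
    scores = {name: pearson(samples, latency_ms) for name, samples in resources.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up samples taken at the same five points in time.
latency_ms = [100, 120, 300, 110, 280]
resources = {
    "cpu_percent": [50, 52, 49, 51, 50],
    "network_io_mbps": [20, 25, 90, 22, 85],
}

for name, score in rank_suspects(latency_ms, resources):
    print(f"{name}: {score:+.2f}")
```

In this invented data set, network I/O tracks latency far more closely than CPU does, which is the kind of signal that keeps us from scaling the wrong resource.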


When dealing with containers, both horizontal and vertical scaling can kick off automatically, so you don’t have to worry about manual intervention. But you’ll want to configure it properly. For example, if you have your containers scale horizontally when your traffic increases to a certain point, you may not prevent degradation from certain transactions that take up more resources than others. So you’ll want to consider various metrics when deciding your scaling rules.
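One way to take several metrics into account is to compute a replica count per metric and keep the largest. The sketch below does that with a simple ratio rule, similar in spirit to the formula documented for the Kubernetes Horizontal Pod Autoscaler. The metric names, targets, and replica limits are all assumptions chosen for the example.

```python
from math import ceil


def desired_replicas(current_replicas, observed, targets, min_replicas=1, max_replicas=10):
    """Pick a replica count that satisfies the most demanding metric."""
    proposals = []
    for metric, target in targets.items():
        # If a metric runs at 1.5x its target, propose 1.5x the replicas.
        proposals.append(ceil(current_replicas * observed[metric] / target))
    wanted = max(proposals)
    # Clamp to the configured bounds so a bad reading can't scale without limit.
    return max(min_replicas, min(max_replicas, wanted))


# Hypothetical targets: 60% average CPU and 300 ms p95 latency.
targets = {"cpu_percent": 60, "p95_latency_ms": 300}

# CPU alone suggests scaling in, but latency says we need more instances.
observed = {"cpu_percent": 45, "p95_latency_ms": 450}
print(desired_replicas(current_replicas=4, observed=observed, targets=targets))
```

A rule based on CPU alone would have removed an instance here, even though slow transactions were already hurting customers.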


Now we’ve designed our app correctly, added metrics to find bottlenecks, and automated our scaling rules. What’s next? Validation.

Through scalability testing, you will find out if your scaling plan will work. As with automated tests, until you’ve seen your system handle the increasing load with scaling, you don’t know for sure. So validate your setup, and you can feel more confident in your decisions.
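A scalability test can start very small. The sketch below ramps up concurrency in steps and records errors and an approximate 95th-percentile latency at each step. `fake_handler` is a hypothetical stand-in for a real request to your service, and a dedicated load testing tool would normally replace this script in practice.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_handler():
    """Hypothetical stand-in for a real request to your service."""
    time.sleep(0.001)
    return True


def run_step(handler, concurrency, requests):
    """Send `requests` calls using `concurrency` workers and summarize them."""

    def timed_call(_):
        start = time.perf_counter()
        try:
            ok = bool(handler())
        except Exception:
            ok = False  # count failures instead of aborting the test
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(requests)))

    latencies = sorted(elapsed for _, elapsed in results)
    errors = sum(1 for ok, _ in results if not ok)
    # Approximate 95th percentile by index into the sorted latencies.
    p95 = latencies[max(0, int(len(latencies) * 0.95) - 1)]
    return {"concurrency": concurrency, "requests": requests, "errors": errors, "p95_s": p95}


def ramp(handler, steps, requests_per_step=50):
    """Run one step per concurrency level, from lowest to highest load."""
    return [run_step(handler, concurrency, requests_per_step) for concurrency in steps]


for row in ramp(fake_handler, steps=[1, 5, 10]):
    print(row)
```

If latency or errors climb sharply at a given step while your scaling rules are active, the scaling plan needs another look.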

The When

To wrap things up, I’ve added one more section: the when. When should you start all this? Should you wait for a problem to occur before you make changes? No. You want to make some progress before your application performance becomes a problem.

If you’re just starting to build your app, look at the design first. Look for ways in which you can architect your application for scale. Also, it’s important to make projections into the short-, medium-, and long-term future. How many users do you expect to have 6 months after launch? And after 12 months? Two years? Try to estimate how much data will be processed at each stage of user base growth. Finally, have plans in place to handle user base growth that’s both slower and faster than your estimates.
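These projections can begin as back-of-the-envelope arithmetic. The sketch below turns a user estimate into a rough instance count. Every number in it (requests per user, per-instance throughput, the 70% utilization target, the user counts) is a made-up assumption that you would replace with your own measurements.

```python
from math import ceil


def instances_needed(users, requests_per_user_per_sec, per_instance_rps, target_utilization=0.7):
    """Rough instance count that keeps each instance below a utilization target."""
    peak_rps = users * requests_per_user_per_sec
    usable_rps = per_instance_rps * target_utilization
    return max(1, ceil(peak_rps / usable_rps))


# Hypothetical growth estimates at 6, 12, and 24 months after launch.
projections = {"6 months": 10_000, "12 months": 50_000, "24 months": 200_000}

for horizon, users in projections.items():
    count = instances_needed(users, requests_per_user_per_sec=0.02, per_instance_rps=100)
    print(f"{horizon}: {users} users, about {count} instances")
```

Running the same calculation with pessimistic and optimistic user counts gives a quick view of the slower-than-expected and faster-than-expected plans.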

What if you already have an application up and running? Start with metrics. Validate that you know what’s going on with your application because without metrics to help identify bottlenecks or scenarios that result in service degradation, your scaling efforts may not be optimal.

Finally, verify it through proper testing. There are many valuable types of performance testing. You can use them to find out whether you need to scale and whether your app is able to do so. Implement techniques such as load testing, spike testing, and scalability testing. With their feedback, you can make informed choices when it comes to scaling.

Scalyr is a tool that can offer you full observability into your application. It might be the tool you need to understand the scalability needs of your app. And now’s the perfect time to try it for free.