Vertical and Horizontal Scaling


Hello! This article discusses the two types of scaling found in infrastructure: vertical and horizontal. These concepts aren’t extra-mysterious or anything, but there are some subtleties to them that can trip you up.

Units of Work and Things

I’m going to explain this in a very generic fashion, because I want to emphasize that these considerations apply not just to servers, but to application architectures as well.


Let’s say you have a Thing; you’re an early adopter, so you buy the first version, Thing 1.0. A Thing can do some units of Work. It doesn’t matter what this work is. Thing 1.0 can do 10 units of Work per hour. You only need 6 units of Work per hour right now, so you are fine with having just 1 Thing 1.0.

Six Months Later

Your company has grown to the point that it now needs 10 units of Work per hour, which is everything a single Thing 1.0 can deliver. This is often where one of two choices is made:

  1. Upgrade Thing 1.0 to Thing 2.0. Thing 2.0 can handle 20 units per hour of Work. Amazing!
  2. Add another Thing 1.0, so that now we have 2 Thing 1.0s.

Option 1 is vertical scaling. Option 2 is horizontal scaling.
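The arithmetic behind the two options can be sketched in a few lines of Python. The capacities (10 and 20 units of Work per hour) come from the example above; the function names and the 15-unit demand figure are illustrative assumptions.

```python
import math


def things_needed(demand_per_hour: float, capacity_per_thing: float) -> int:
    """Horizontal scaling: how many identical Things cover the demand."""
    return math.ceil(demand_per_hour / capacity_per_thing)


def fits_on_one(demand_per_hour: float, capacity_per_thing: float) -> bool:
    """Vertical scaling: does a single (bigger) Thing cover the demand?"""
    return demand_per_hour <= capacity_per_thing


THING_1_CAPACITY = 10  # units of Work per hour, from the example
THING_2_CAPACITY = 20  # units of Work per hour, from the example

demand = 15  # hypothetical future demand, not from the article

print(fits_on_one(demand, THING_2_CAPACITY))    # vertical: one Thing 2.0
print(things_needed(demand, THING_1_CAPACITY))  # horizontal: count of Thing 1.0s
```

Either answer covers the demand; the rest of the decision is about cost, ceilings, and coordination.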


I think in flowcharts a lot. This is handy for decisions like this, and also super great for relationships. So below are the rules I try to observe when deciding how to scale a system.


There is a fundamental ceiling on vertical scaling. A Thing 1.0 is made of the best parts available when it was built. Technology moved on, and newer, shinier parts were available for Thing 2.0. Can we swap out some parts in Thing 1.0 with newer, better, shinier parts?

Why yes, yes we can. Usually.

In the context of a server, this usually amounts to increasing one of four resources, presented below in rough descending order of priority:

  1. RAM
  2. Disk
  3. CPU
  4. Network

We could maybe do this for years, but eventually the newest technology will not be compatible with the old Thing 1.0. A benefit of this type of scaling is that it is easy to do and requires no changes to application architectures.

You can do a lot with something like a Postgres DB with a lot of RAM and fast disk. Many small-to-medium businesses will never outgrow their Thing 1.0.

Let’s take a quick look at what a cloud provider offers. Amazon EC2 has an instance type, the x1e.32xlarge, with the following specs:

RAM: 3904.0 GiB
CPU: 128 vCPUs
Storage: 3840 GiB (2 * 1920 GiB SSD)
Network Speed: 25 Gigabit
Cost: $26.688 per hour

Expensive, but that’s a lot of vertical capacity!
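To put the hourly price in perspective, here is a rough monthly figure. The 30-day month and non-stop usage are assumptions, and the price is the one quoted above, which may well have changed since.

```python
from decimal import Decimal

HOURLY_PRICE = Decimal("26.688")  # hourly cost quoted in the specs above
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: a 30-day month, running non-stop

monthly_cost = HOURLY_PRICE * HOURS_PER_DAY * DAYS_PER_MONTH
print(f"${monthly_cost:,.2f} per month")  # $19,215.36 per month
```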


If an application is not properly designed to make use of multiple CPUs and cores, then it may not see any benefit from scaling the host server.
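One common way to reason about this is Amdahl's law. It is not something the article relies on; I'm borrowing it here as an illustration. The sketch below estimates the theoretical speedup when only a fraction of the work can run in parallel. The function name and the sample fractions are assumptions.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Theoretical speedup when only part of the work can use extra cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical applications: fully serial, half parallel, mostly parallel.
for fraction in (0.0, 0.5, 0.95):
    print(fraction, round(amdahl_speedup(fraction, cores=128), 2))
```

A fully serial application (fraction 0.0) gets a speedup of exactly 1, no matter how many cores the host has, and a half-parallel one can never quite reach 2.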

This is something you see in many legacy applications that were written back when the thought of a machine with 128 cores was hilarious. Once you have a monolithic system that is tightly interconnected, it may not be worth the cost to try to pull it apart.

Scale the Right Component

Cloud providers love it when you just click that little old “Change Instance Type” button and pick the next one up. Before doing this, have a way to measure what success is, and try different configurations. With most cloud providers, you can scale individual components independently; CPU, RAM, disk, and network are the most common. This is helpful if you’ve narrowed down the exact bottleneck.
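As a minimal sketch of "scale the right component", assume you already collect peak utilization per resource as a fraction of capacity. The metric names and numbers below are made up for illustration; real measurements would come from your monitoring system.

```python
def most_saturated(utilization: dict[str, float]) -> str:
    """Return the resource with the highest utilization (0.0 to 1.0)."""
    return max(utilization, key=utilization.get)


# Hypothetical peak-hour measurements, as fractions of capacity.
sample = {"cpu": 0.35, "ram": 0.92, "disk": 0.60, "network": 0.20}

print(most_saturated(sample))  # ram
```

In this made-up case, moving to an instance with more CPU would cost money and fix nothing, since RAM is the constraint.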


Horizontal scaling doesn’t have a ceiling the way vertical scaling does. But now instead of 1 Thing we have 2, and they may have to coordinate between themselves.

State is the Enemy of Horizontal Scaling!

What is State?

On the surface, state is simple to understand. It is the condition an entity is in at some point in time. If you buy a new computer, it will come with some operating system installed, but nothing that you have installed yourself. We can consider this the pristine or untouched state of the computer.

Then you start installing all your favorite apps and changing (mutating is another word commonly used to describe this) the state of your computer. Over the years, it accumulates more and more state, until one day you erase it and re-install the OS, reverting its state.

If the computer had a completely empty hard drive, it could be considered to have no state.

A Simple Webserver

Let’s say you have a simple webserver that serves only .html files. When someone requests one, the flow looks something like:

  1. Client initiates connection to server
  2. Server now has some amount of state associated with that client (its IP address, browser info, etc.)
  3. Server serves the requested file
  4. Connection is closed, and everything that server knew about that client, its state, is gone

An Example of State

Let’s say all your customer info (and thus all your state) is in one database, and you have a separate web application server that talks to it. You want to scale the database horizontally. Some immediate questions are:

  1. How do you decide to split the data up between the two machines?
  2. How does the web application know which database server to query for a specific customer?
  3. Are the two database servers aware of each other?
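One possible answer to the first two questions is to route each customer by a stable hash of their ID. This is only a sketch of one approach, not a recommendation. The host names, the function name, and the choice of SHA-256 are all assumptions.

```python
import hashlib

# Hypothetical database hosts; not from the article.
DATABASE_SERVERS = ["db-0.example.internal", "db-1.example.internal"]


def shard_for(customer_id: str, servers: list[str] = DATABASE_SERVERS) -> str:
    """Map a customer to a database server using a stable hash of the ID."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then pick a server by modulo.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


print(shard_for("customer-42"))
```

The same customer always lands on the same server, so the web application knows where to look. Note the subtlety, though: with plain modulo, adding a third server changes the mapping for most customers, which means moving their data.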

State can even creep into your application in less obvious ways. If an application uses long-lived HTTP sessions, state begins to accumulate there.
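Here is a toy illustration of why in-memory sessions get in the way of adding a second server. The `AppServer` class and the session ID are hypothetical; a real application would use its framework's session machinery.

```python
from typing import Optional


class AppServer:
    """Toy application server that keeps sessions in its own memory."""

    def __init__(self) -> None:
        self.sessions: dict[str, str] = {}

    def login(self, session_id: str, user: str) -> None:
        # The session lives only inside this one server process.
        self.sessions[session_id] = user

    def whoami(self, session_id: str) -> Optional[str]:
        return self.sessions.get(session_id)


server_a, server_b = AppServer(), AppServer()
server_a.login("abc123", "alice")

print(server_a.whoami("abc123"))  # alice
print(server_b.whoami("abc123"))  # None
```

If the next request from the same user lands on `server_b`, the user appears logged out. The state has to move somewhere both servers can reach, or requests have to stick to one server.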

In contrast, a load balancer that just accepts a request and hands it off to a backend has almost no persistent state. Thus if you need more load balancer capacity, it’s easy to just add another.
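For comparison, a bare-bones round-robin balancer might look like the sketch below. The class name and backend addresses are made up, and real load balancers do far more, such as health checks and connection handling.

```python
import itertools
from collections.abc import Iterable


class RoundRobinBalancer:
    """Hands each request to the next backend in a fixed rotation."""

    def __init__(self, backends: Iterable[str]) -> None:
        self._backends = list(backends)
        self._rotation = itertools.cycle(self._backends)

    def pick(self) -> str:
        # The only state is the position in the rotation.
        return next(self._rotation)


# Hypothetical backend addresses.
balancer = RoundRobinBalancer(["10.0.0.1", "10.0.0.2"])
print([balancer.pick() for _ in range(4)])
# ['10.0.0.1', '10.0.0.2', '10.0.0.1', '10.0.0.2']
```

The rotation position is the only thing it remembers, and losing it does no harm. A second balancer with the same backend list can start from scratch and do the same job.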

Horizontal Scaling: Not Just for Capacity!

Horizontal scaling is not only about capacity. If you only have one database server that you’ve scaled vertically, and that one database server fails, then all of your data is inaccessible until it is repaired, a backup is restored, or somesuch. A second database server holding a copy of the data gives you redundancy as well as capacity.
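A rough sketch of what that redundancy buys you: if the first server is unreachable, try the next one. This assumes every server in the list holds a usable copy of the data; keeping the copies in sync is not shown. The server names and the `fake_query` stand-in are hypothetical.

```python
from collections.abc import Callable, Sequence


def query_with_failover(servers: Sequence[str], run_query: Callable[[str], str]) -> str:
    """Try each database server in order; return the first successful result."""
    last_error = None
    for server in servers:
        try:
            return run_query(server)
        except ConnectionError as error:
            last_error = error  # remember the failure and try the next server
    raise RuntimeError("all database servers are unavailable") from last_error


def fake_query(server: str) -> str:
    # Stand-in for a real database call: the primary is "down".
    if server == "db-primary":
        raise ConnectionError(f"{server} is unreachable")
    return f"result from {server}"


print(query_with_failover(["db-primary", "db-replica"], fake_query))
# result from db-replica
```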

Unit of Scalability

When architecting a system, one thing I think about is the unit(s) of scalability for the system. If we are serving web content, and we know that instance type X in cloud provider Y can handle Z requests per second, that lets us do two things:

  1. Predict and forecast. If the marketing team is planning a campaign that they say will bring in a billion new users, we can get a rough idea of how many additional servers we’ll need. In turn, we can predict the cost.
  2. Focus automation efforts on building and deploying self-contained units of scalability.
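The forecasting part can be sketched as simple arithmetic. Here `rps_per_server` plays the role of Z above. The `headroom` parameter, the function names, and every number in the example are assumptions for illustration.

```python
import math


def servers_needed(peak_rps: float, rps_per_server: float, headroom: float = 0.0) -> int:
    """Servers required for a peak load, keeping a fraction of each one spare."""
    usable_rps = rps_per_server * (1 - headroom)
    return math.ceil(peak_rps / usable_rps)


def hourly_cost(servers: int, price_per_server_hour: float) -> float:
    """Total hourly cost for a fleet of identical servers."""
    return servers * price_per_server_hour


# Hypothetical campaign: 1,000 requests/second at peak, 300 per server,
# and we want to keep half of each server's capacity in reserve.
count = servers_needed(peak_rps=1000, rps_per_server=300, headroom=0.5)
print(count, hourly_cost(count, price_per_server_hour=0.25))
```

Once the unit is known, "how many do we need and what will it cost" becomes a calculation instead of a guess.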


There are a lot of similarities between building scalable infrastructure and scalable applications. The most reliable systems and applications I’ve seen have all had one thing in common: they were built from repeatable, composable, self-contained units.

That’s it for this article!
