I have been advising a local entrepreneur who is building a really interesting new web play. A great guy, but doesn’t have a deep background in technology. He is starting to see some traction with his service, and is beginning to run into those early scalability hurdles that so many young startups eventually run into.
Our informal discussions around scalability inspired me to jot down some of my thoughts on this issue, and how early-stage entrepreneurs can scale their technology platform from 5 users to millions.
Some Simple Rules
There rarely exists a set of rules that, if followed, will result in nirvana – scalability is no different. Every situation is different. However, these bullets summarize my general tenets, mantras, and beliefs for scaling a web-based application or system. The rest of this post will cover these in a bit more detail.
What is scaling (or scalability)?
Simply put, scalability refers to a system’s ability to handle increasingly heavier loads from users (activity) without fundamentally breaking the way in which it operates. In other words, as you continue to add new users and expand your business, you want your application or service to be able to easily handle the increase without slowing down, or worse, breaking down completely.
An application (or platform) is considered scalable if it can continue to service additional users, through the deployment of supplemental hardware/software/resources, without seeing any significant performance hit from the user’s standpoint. Of course, very few systems in their early prototypical states will fit this definition. Your goal is to get the product to the point where it can be scaled in this manner, while reducing the number of potential bottlenecks.
Unfortunately, for business and IT managers, there is no single way to scale an application. Ah, if only there were actually a big red “easy” button. Each situation is different, given that there are so many factors that need to be taken into account. To make matters worse, sometimes, an application seems “infinitely scalable”, only to have a major bottleneck reveal itself down the road. This doesn’t mean the end of the world – it simply means that you have to adjust accordingly. The trick is, of course, to reduce the number of “midcourse corrections” that you will have to endure.
A Bit About n-Tier Architectures
Before we dive in too deep, I should probably throw out a note or two on n-Tier architectures. If you are an IT weenie, and understand this concept, skip to the next section. Otherwise, hang with me.
In the old days, applications were deployed onto servers, and when a bottleneck was encountered, the physical resources in the machines were expanded. This was the extent of scalability. Then, some bright engineer realized that if you split a system into two “tiers”, you could distribute the workload a bit. Voila – the birth of the client/server revolution. Eventually, though, systems began to grow so large that they needed something else in order to break through the inherent bottlenecks of a 2-tier system. An even brighter engineer realized that there was no reason to stop at “2” tiers. You could have an arbitrary number of tiers in your system. Thus, the birth of the “n-Tier” architecture (“n” representing some arbitrary number of tiers).
An “n-tier” application architecture is characterized by the functional decomposition of applications and service components, and their distributed deployment. Breaking a system down in this manner provides improved scalability, availability, manageability, and resource utilization. A “tier” itself is nothing more than a functionally separated hardware and software component that performs a specific function. Whew – what a mouthful.
Typical “tiers” include the web (presentation) tier, the application services tier, and the database tier.
Note: don’t confuse these “tiers” with application “layers” (presentation layer, data layer, business logic layer, etc.) Tiers are architectural in nature, whereas “layers” are generally code/library specific.
The important thing to know here is that in an “n-tier” model, a system has been broken up into various levels of functionality, each capable of some degree of horizontal scaling. Which brings me to my next point …
Horizontal vs. Vertical Scaling
When you hear people talk about “scaling horizontally,” they are essentially referring to the ability to add new servers to a tier to allow it to continue to provide uninterrupted service in the face of continuously increasing usage. For example, your database is chugging hard and heavy, so you add new servers to the database tier to distribute the workload. If your web server is bogged down, you can add new web servers to do the same. This also affords you a nice layer of failover: if one server experiences an issue (even to the point where it crashes), the other servers in that tier can still provide service. This concept is becoming increasingly important as more and more systems are being deployed using SOA models (service-oriented architectures).
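To make “scaling horizontally” concrete, here is a minimal Python sketch of the round-robin distribution a load balancer performs across a tier. The server names are hypothetical, and a real deployment would use a dedicated load balancer rather than application code – this is purely illustrative:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hand each incoming request to the next server in the pool.

    Adding a server to the pool ("scaling horizontally") immediately
    spreads the load thinner across the tier.
    """

    def __init__(self, servers):
        self._rotation = cycle(list(servers))

    def next_server(self):
        return next(self._rotation)

# Hypothetical tier of two web servers; add "web3" to scale out.
balancer = RoundRobinBalancer(["web1", "web2"])
assignments = [balancer.next_server() for _ in range(4)]
# Requests alternate evenly between the two servers.
```

The design point is that the tier behind the balancer is interchangeable: any server can take any request, so capacity grows by adding entries to the pool.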
Vertical scaling, on the other hand, is where you extend/expand the physical resources in a server itself. For instance, your database server is getting way overworked, hitting the swap space early and often. You can “vertically scale” that server by simply adding more RAM, faster hard disks, better CPUs, etc.
There are pros and cons to each type of scaling, and obviously a cost associated with both. Horizontal scaling is theoretically infinite, whereas vertical scaling has an obvious ceiling (there is only so much horsepower you can derive from a single server).
Horizontal scaling only makes sense if the service you are attempting to scale was designed to be extended in this manner. For many third-party applications, such as a database server, this will be the case. Of course, if you are designing the software, you will want to take this into account as you build it.
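One design decision that largely determines whether your own software can be extended this way is where state lives. A minimal sketch of the idea, with an in-memory dict standing in (by assumption) for a shared session store that would, in production, be an external database or cache:

```python
# A shared session store. A plain dict stands in here for what would,
# in production, live outside the web servers entirely -- the point is
# that no individual web server keeps state in its own memory.
shared_sessions = {}

def handle_request(server_name, session_id, data=None):
    """Any server can serve any request because session state lives
    outside the servers. That is what makes the tier safe to scale
    horizontally behind a load balancer."""
    if data is not None:
        shared_sessions[session_id] = data
    return server_name, shared_sessions.get(session_id)

# A request lands on web1 and writes session data...
handle_request("web1", "abc123", data={"user": "john"})
# ...and a later request, routed to web2, still sees it.
_, session = handle_request("web2", "abc123")
```

If the session had been kept in web1’s local memory instead, the load balancer would have to pin every user to one server, and the tier would no longer scale (or fail over) cleanly.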
Think of hardware as simply a vehicle (perhaps a bus) for your software, your real service. If the bus gets too crowded, you add another bus to the fleet. However, not all buses will go to the same destination, so those buses need to connect together in order to get information from point A to point B within your architecture. Voila, you have the meager beginnings of a service-oriented architecture (SOA).
A Scalability Example
Let’s set the stage with a typical early-stage example, and we’ll try to scale this application theoretically as the business scales.
John is an aspiring technology entrepreneur who has developed a really great online service called WidgetFire. WidgetFire is brand new, so there aren’t many users yet. John has built this product in his spare time, and is bootstrapping the business via his day job as a software engineer for another company. To keep his costs down, he has a single web server (built from extra parts), and he has it co-located at a local data center with a basic level of service (1U single rack space, 1Mbps throughput, unlimited bandwidth, for probably < $100/month). On this single server, he is running Apache and MySQL together, along with the normal services (bind/DNS, sendmail, etc.)
So far, scalability is not a concern to John. But that is about to change. In a big way.
Over a period of a few months, John leverages word-of-mouth marketing and manages to aggregate 25,000 registered users for his service. The server he is running (which he’s affectionately nicknamed “Seabiscuit”) is holding up fine, but is beginning to feel the strains of all of those new sessions and database queries. Additionally, his automated e-mail notification list is starting to add to the system load, as now the server is sending out thousands of emails a day.
John “vertically scales” his server by adding some additional RAM and performing some additional performance tuning to the database. This buys him time. But not much.
Then, WidgetFire gets a mention in a prominent tech blog, and the next thing John knows, he has 100,000 registered users for his service. His server is on its knees, and practically unresponsive. Sadly, the data center staff isn’t much help – after all, he is running under a pretty basic co-located hosting plan – and they have bigger problems to deal with than the occasionally unresponsive Seabiscuit.
So John then moves to a very rudimentary n-Tier architecture. He moves his MySQL database over to a separate server, which frees up resources on the old web server. Now, the system is humming along smoothly. But after a few months, WidgetFire is quite the rage on college campuses, and John must once again address how he is going to facilitate additional traffic.
He does some vertical scaling on the database server (adds new RAM, faster disks, etc.), but it isn’t enough. The new user signups are coming too fast and furious for his 2 server setup to handle.
This particular nexus is where many startups begin to experience some tough scalability issues. The “easy” scaling options have already been exhausted (vertical scaling on one server, splitting the database off into a separate server.)
Fortunately for John, his traction has caught the interest of a handful of investors, and he is able to secure a small round of outside capital.
To get to the next level, John implements a load balancing router, an extra web server, and an extra database server. He configures the load balancer to distribute incoming web requests evenly between his two web servers. Additionally, he configures the original database server to be a “master”, and the new database server to be a “slave” server. While all database write operations occur on the master server, John realizes that most web requests that require database access will be for “reads”, so the slave can offload some of that workload from the master. Voila – he has scaled even further!
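John’s routing logic boils down to a simple rule: writes go to the master, reads go to a slave. A hedged Python sketch of that rule – the server names are made up, and classifying a query by its first keyword is deliberately simplistic (real setups must also account for replication lag on the slaves):

```python
import random

# Hypothetical server names for illustration.
MASTER = "db-master"
SLAVES = ["db-slave1", "db-slave2"]

def route_query(sql):
    """Send write operations to the master and read operations to a
    randomly chosen slave, spreading the (read-heavy) load across
    the database tier."""
    verb = sql.strip().split()[0].upper()
    if verb in ("INSERT", "UPDATE", "DELETE", "REPLACE"):
        return MASTER
    return random.choice(SLAVES)
```

Because most web traffic generates reads, each slave added to the list takes a proportional bite out of the master’s workload – which is exactly why this pattern scales so well for read-heavy services.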
A few more months go by, and John makes the cover of Wired Magazine. VCs are clamoring to pour their cash into WidgetFire. While John basks in this glory, he doesn’t realize that his little server farm is becoming overwhelmed by the sheer success of his venture. To make matters worse, while John is putting together a plan to scale even further, his slave database server crashes, leaving only the one original database server in operation. WidgetFire is basically dead in the water.
John brings the slave server back online, but he realizes that more must be done. He adds an extra web server to the web server tier, and an additional slave database server to the database tier. But he doesn’t stop there. John realizes that his actual application, which is a mish-mash of Java, PHP, and Perl, is consuming the vast majority of CPU time on the web servers themselves. John decides to move from a 2-tier model to a 3-tier model by implementing an “application services” tier, moving this code off of the web servers and onto several new servers.
John realizes pretty quickly that his original code design really wasn’t architected for this type of model. He has to spend a couple of months retrofitting his old code to fit into more of a “web services” model. He is now surrounded by burgeoning IT costs, hosting fees, and system complexity. All of a sudden, his “Google-ready” venture is giving him a headache, and isn’t much fun anymore.
Of course, had John anticipated the steepness of his growth curve ahead of time, he could have designed his system with it in mind, and avoided at least some of the headaches.
Can’t money be thrown at the problem?
Sure. To a point. All things cost money, of course, whether it be labor (people), hardware, or software. Most systems can initially be scaled to a sufficient level by simply adding more hardware, or expanding the resources within the server. But that only gets you so far in most cases – at some point the logical architecture of the system needs to have been designed with scalability in mind. If the architecture isn’t scalable, you are either going to hit a ceiling, or spend way too much money to scale it (and even then, there are no guarantees).
It isn’t a question of whether or not money can be thrown at the problem – it all costs money. The question is how much money are you going to have to spend to scale it? Obviously, there rarely exists an endless supply of capital that can be leveraged to solve a scalability problem – especially in the startup realm. Clearly, you want to be able to control your tech spend (which is a big part of your overall burn).
The bottom line is this: the more you plan for scaling up front, instead of throwing a bunch of crap together and hoping it will get you to point “B”, the less money you are going to spend as you scale.
To properly address scalability, you have to take a holistic approach, and examine your architecture, your software components, and your hardware configurations. Then you have to deploy capital in an intelligent manner. Otherwise, you could end up like John in our fictitious example – sitting on a pile of code that really wasn’t designed to split up into services across an n-tier architecture. And that, my friends, represents a serious misuse of capital.
If you are scaling by adding new hardware – that is generally a good problem to have – that hopefully means your business is expanding. However, if you are having to rewrite large amounts of code in order to scale – you’ve likely made some very serious mistakes. Too many of those mistakes, and you’ll be dead in the water. You see the latter quite often in the startup world, again, as there is so much pressure to slap something together and get it out the door.
It all starts with the blueprint of your architecture. Note, I am not referring to your functionality matrix/map, your application requirements definition, etc. I am talking about your physical and logical architectures.
When I say physical architecture, I am referring to the physical hardware components that make up your network, application, etc. Things to consider:
When I refer to your logical architecture, I am referring to the way in which your various software components connect with and layer into one another. Things to consider:
If you are in startup mode it is very easy to fall into the trap of “getting started” – pushing code and charging up the hill. “Get-to-market” pressure from investors rarely helps. However, many such efforts are met with stiff resistance once the entrepreneur realizes that the “hill” he or she just conquered is actually but a small plateau on a mountain comprised of increasingly steeper slopes. It all starts with a good plan in place!
Having the right architecture in place doesn’t guarantee that you won’t have to eventually add more hardware or write more code. In fact, in the early stages, you may actually spend more money (especially if you are deploying web services on their own servers, etc.) However, the proper architecture allows you to get the most out of that new hardware and software, as you will be plugging it into a framework that was built with it in mind.
A Note on System Services
Before you start diving off into creating a true service-oriented architecture, do yourself a favor and split your system services off accordingly. Things like DNS services and e-mail should be moved off of and away from your application’s production environment. I bring this up because more often than not, you find system services co-located on production web servers.
It is hard to get a sense of an application’s true load on a physical machine when you have a million other things running on it. This becomes even more important if you are planning on using tools like Six Sigma to measure scaling and availability metrics, as you will need to do everything possible to remove potential sources of common- and special-cause variation.
The Importance of Performance Tuning
Another very important aspect of scaling is performance tuning. Performance tuning is essentially the art of tweaking and fine-tuning your various applications and services in order to maximize their operational efficiency. There are more ways to performance tune a system than you can possibly imagine. Some obvious examples would include:
Some not-so-obvious examples might include:
I have known people who have spent thousands on new hardware, only to realize later that their applications or server/OS were simply not optimized. If you don’t have the requisite skills in house to do performance tuning, then outsource this function immediately – it is that important.
Finally, I should mention that performance tuning should not be viewed as a one-time activity. You should routinely profile and tune your systems (both hardware and software).
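On the software side, even a crude timing harness beats guessing about what to tune. A minimal Python sketch of the idea (the function name is made up, and real work would use a proper profiler, but the principle – measure before you spend – is the same):

```python
import time
from functools import wraps

def timed(fn):
    """Print how long each call takes -- a crude first measurement
    to find out what actually needs tuning before buying hardware."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        print(f"{fn.__name__} took {time.perf_counter() - start:.4f}s")
        return result
    return wrapper

@timed
def build_report(n):
    # Stand-in for an expensive application function.
    return sum(i * i for i in range(n))
```

Wrap the functions you suspect, let the numbers accumulate in your logs, and tune (or rewrite) the worst offenders first – then repeat, since the bottleneck moves once you fix it.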
Another thing to stay on top of is your service level agreements with your data center or hosting provider. There are fundamentally four things to keep in mind:
First, if you are bootstrapping a startup, and you are using a low-cost, shared server, you need to move to a dedicated server solution as soon as possible. Trust me.
Next, make sure that you have the ability to quickly secure additional rackspace and bandwidth as you need it. Having your data center tell you that you are going to have to suffer major downtime because they have to move your servers to a “bigger rack” is probably not a good thing.
Third, make sure that the connections from your servers to the net are burstable. When you have those huge traffic spikes because someone put a mention of your site on digg.com, Slashdot, etc., you’ll want to be able to handle the temporary increase in traffic (without necessarily incurring a large bandwidth bill).
Finally, make sure that your throughput is not capped, or if it is capped, make sure it is capped higher than you think you’ll need. Don’t confuse bandwidth with throughput. Bandwidth refers to how much data your connection can transfer over a period of time (e.g. 100 gigabytes per month, etc.) Throughput, on the other hand, refers to how much data can be flowing through your connection at any given time (e.g. 10Mbps). Think of throughput as being the thickness of your server’s pipe – obviously, a lot more can flow through a garden hose than a soda straw. Same analogy.
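The distinction is easy to check with arithmetic: a throughput cap puts a hard ceiling on how much bandwidth you can possibly consume in a month. A quick back-of-the-envelope calculation (decimal units, purely illustrative):

```python
def max_monthly_transfer_gb(throughput_mbps, days=30):
    """Ceiling on monthly transfer (GB, decimal units) for a link
    running flat out at the given throughput in megabits/second."""
    seconds = days * 24 * 60 * 60
    megabits = throughput_mbps * seconds
    return megabits / 8 / 1000  # bits -> bytes, then MB -> GB

# A 10 Mbps cap, saturated for a 30-day month, moves 3,240 GB --
# but no single moment can ever push more than 10 megabits/second.
```

So an “unlimited bandwidth” plan with a low throughput cap is less generous than it sounds: the cap, not the bandwidth allowance, is what your users feel during a traffic spike.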
To illustrate the point on throughput: a few years back we had a system that became wickedly unresponsive. The CPU loads on the server were only about 35–50%, so it didn’t make a lot of sense. It turns out that the connection had been capped at 10Mbps. Anything over 10Mbps at any point in time had to basically wait in a queue, which caused the perceived “slowdowns” by users. Raising the throughput cap remedied the problem, of course.
On a final note, I want to voice my support for co-locating servers rather than using a full-service hosting provider. If you have the skillset in house to maintain the boxes, using co-location can save you some money, and give you more flexibility. Again, in startup mode, every penny counts. If you are in or near Atlanta, I highly recommend my friends down at Capital Internet. I’ve co-located and hosted servers with them for 6 years now, and it has been a very enjoyable, hassle-free partnership.
There are a lot of advanced topics that come up in discussions about scaling applications. Things like caching, high availability clustering, network storage (via SANs), and distributed networks. Obviously, those are very specific areas that are beyond the scope of this already ridiculously long blog post. Suffice it to say that there are some very advanced (and expensive) toys out there that can make scaling a lot easier. However, the vast majority of applications/services can be scaled to rather massive proportions if you simply follow the bullet points at the top of this post.