Cloud Computing & Capacity Planning for SaaS – Part II

In my previous post about Capacity Planning for SaaS, I explained how to understand a user's behavior and translate that behavior into mathematical language. The same principle can be applied to web applications as well.


Analyze the Application

The second aspect of capacity planning is to understand application behavior under load. It is important to know how many computational resources are required to execute a single “transaction”. To do that, you will need to set up a lab or use a homologation (staging) environment where you can install the application on equivalent, smaller-scale infrastructure. With this in place, you can simulate user interaction with the application at different load levels. There are several load-testing tools available on the market; however, for an ad hoc test, I personally like the Microsoft Web Application Stress Tool (aka WAST).

As soon as the lab is ready, you can start the load test. Whichever tool you decide to use, you will get a test report that shows how many requests were issued, how many succeeded, how long the successful ones took, how many failed, and so on. The most relevant metrics (considering a regular Microsoft IIS) are:

  • Web Service: Get Requests/sec
  • Web Service: Post Requests/sec
  • System: % Total Processor Time
  • Active Server Pages: Requests/sec

Fine-tuning the load is important so that it represents real user behavior; this should include latency (think time) between requests to simulate a real user navigating the website. Eventually you will reach the point where you no longer get HTTP errors. These errors mean that something went wrong, such as the load tester issuing more requests than the website is able to handle, or an application issue. Usually I repeat the test several times until I find the point where HTTP Error 500 does not occur but a slightly higher load makes it appear. Even if you identify that the target system still has headroom in CPU, memory, I/O and network bandwidth, this does not mean the system is able to handle heavier loads. At this point, you should stop and start working on tuning.
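
To make that ramping step concrete, here is a minimal sketch (plain Python with the standard library, not WAST) that raises the number of concurrent clients against a hypothetical lab URL and counts how many responses come back as HTTP 500. The URL, the ramp steps and the number of requests per step are assumptions for illustration only.

```python
# Minimal load-ramp sketch: increase concurrency step by step and count HTTP 500s.
# The target URL and the ramp values are hypothetical; adjust them to your lab.
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TARGET_URL = "http://lab-server/app/default.asp"   # hypothetical lab endpoint
REQUESTS_PER_STEP = 200                            # assumption for illustration

def hit(url):
    """Issue one GET and return the HTTP status code (None on network error)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code            # e.g. 500 when the application breaks under load
    except OSError:
        return None                # connection refused, timeout, etc.

for concurrency in (5, 10, 25, 50, 100):           # ramp steps (assumption)
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        codes = list(pool.map(hit, [TARGET_URL] * REQUESTS_PER_STEP))
    errors_500 = codes.count(500)
    print(f"concurrency={concurrency:>3}  HTTP 500s={errors_500}/{REQUESTS_PER_STEP}")
```

A real test would also insert think time between the requests of each virtual user, as discussed above; the sketch omits it to stay short.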

Let me give an example of why you may experience problems that relate user demand to hardware capacity utilization. By default, IIS can allocate up to 25 threads per CPU core. If the web application makes a remote call to any external component (such as a database, middleware, logging, etc.), the thread stays allocated until the external call returns. This round trip takes time and does not consume CPU. In situations like this, there are several strategies to minimize the latency impact, such as increasing the thread count per core or making the external calls faster.
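
A quick back-of-the-envelope calculation shows how hard that ceiling can be. The 25 threads per core figure comes from the paragraph above; the core count and the remote-call latency are assumptions for illustration.

```python
# Why blocked threads cap throughput even when the CPU is mostly idle.
# 25 threads/core is the IIS default mentioned above;
# the core count and per-request database latency are hypothetical.
THREADS_PER_CORE = 25
CORES = 4                       # assumption: lab server size
REMOTE_CALL_SECONDS = 0.200     # assumption: time each request waits on the database

max_threads = THREADS_PER_CORE * CORES
# While blocked, each thread can complete at most 1 / REMOTE_CALL_SECONDS requests per second.
throughput_ceiling = max_threads / REMOTE_CALL_SECONDS

print(f"{max_threads} worker threads, each holding a request for ~{REMOTE_CALL_SECONDS * 1000:.0f} ms")
print(f"Throughput ceiling from blocking alone: {throughput_ceiling:.0f} requests/sec")
```

This is the same relation Little's Theorem formalizes below: busy threads equal throughput multiplied by the time each request holds a thread.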

With a test report in hand, how can we interpret it? Let’s assume that in a hypothetical test, we got these numbers:

  • System: % Total Processor Time: 92% average during the load, with a few peaks of 100%. That is enough to assume the CPU is not a bottleneck.
  • Active Server Pages: Requests/sec: 152. A transaction is composed of 8 ASP requests (i.e. the user opens the first page, selects a couple of options that call ASP pages, etc.), so from the application standpoint the lab hardware can handle about 19 transactions per second (i.e. 152/8), as the short calculation after this list shows.
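
Spelled out as a quick calculation with the numbers from the hypothetical report above:

```python
# Convert the measured ASP request rate into a transaction rate.
# Both numbers come from the hypothetical load-test report above.
asp_requests_per_sec = 152      # Active Server Pages: Requests/sec
requests_per_transaction = 8    # one business transaction = 8 ASP requests

transactions_per_sec = asp_requests_per_sec / requests_per_transaction
print(f"Lab capacity: about {transactions_per_sec:.0f} transactions/sec")   # ~19
```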

Calculating the response time

Little's Theorem (aka Little's Law) lets you evaluate a system's capacity to serve requests by comparing the users' request rate with the system's capability to process requests. If this ratio reaches 1 or greater, the system is over capacity: requests start being dropped, and some users see HTTP Error 500 in their web browser or wait in a queue to be processed.

The following relations give the number of requests waiting in the queue and how long a request waits before it is processed. Little's Theorem states that L = λ·W (requests in the system = arrival rate × time spent in the system); combined with Poisson arrivals and a processing capability μ (an M/M/1 queue), it yields:

  • Utilization: ρ = λ / μ
  • Average requests waiting in the queue: Lq = ρ² / (1 − ρ)
  • Average wait in the queue: Wq = Lq / λ = ρ / (μ − λ)

Here λ is the request arrival rate taken from the Poisson model, μ is the capability to process requests, and ρ is their ratio.
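
As a sketch of how these formulas behave, take the service capability measured in the lab above (μ ≈ 19 transactions per second) and a hypothetical arrival rate of 15 transactions per second:

```python
# M/M/1 queue estimates built on Little's Theorem.
# mu comes from the lab measurement above (~19 transactions/sec);
# lam (λ) is a hypothetical arrival rate chosen for illustration.
mu = 19.0     # service capability, transactions/sec
lam = 15.0    # arrival rate, transactions/sec

rho = lam / mu                 # utilization; must stay below 1
Lq = rho ** 2 / (1 - rho)      # average transactions waiting in the queue
Wq = Lq / lam                  # average wait in the queue, seconds
W = Wq + 1 / mu                # total response time: queue wait + service time

print(f"utilization ρ = {rho:.2f}")
print(f"transactions waiting in the queue Lq = {Lq:.2f}")
print(f"queue wait Wq = {Wq * 1000:.0f} ms, total response time W = {W * 1000:.0f} ms")
```

As λ approaches μ, Wq grows without bound; that is the mathematical counterpart of the HTTP 500 errors seen in the load test.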

At this point, the Poisson distribution becomes very useful. You can select the largest number of users likely to access the system at the same time, together with the probability the Poisson curve assigns to it. With this data, we can forecast not only how many users/transactions we can deal with, but also the maximum time required for them to be processed.
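
Here is a minimal sketch of that step, assuming a hypothetical average of 12 concurrent users (in practice this average comes from the user-behavior analysis of the previous post):

```python
# Poisson tail: probability of seeing at least k concurrent users
# when the average concurrency is `mean`. The mean used here is hypothetical.
from math import exp, factorial

def poisson_at_least(k, mean):
    """P(N >= k) for a Poisson-distributed number of concurrent users."""
    p_below_k = sum(exp(-mean) * mean ** n / factorial(n) for n in range(k))
    return 1.0 - p_below_k

mean_concurrent_users = 12      # assumption for illustration
for k in (15, 20, 25):
    p = poisson_at_least(k, mean_concurrent_users)
    print(f"P(at least {k} concurrent users) = {p:.3f}")
```

The concurrency level you pick here, together with its probability, is what feeds λ in the queueing formulas above.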

Defining the Environment

At this point, you know enough about the system and how it behaves under load, which leads us to the following table and graphic:



As you know from my previous post on cloud computing, the probability of having 20 or more concurrent users is less than 2%. However, it is not zero, so a code review or an increase in computational capacity could still be required.

Best Regards!