What size server is the right sized server?

"We’re moving the RISC-based applications to Xeon-based servers. What size server do I need?" This is usually the first question, and it is the right question, but asked at the wrong time. By the time it’s the right time, you’ll have the answer well in hand. I’m going to explore the answer here and in future posts.
[Image: measuring and sizing your server needs. Image attribution: Flickr user aussiegall]

My response is, "Why are you sizing and buying the end-state server when you haven’t even started the migration process? First you need to see how well your application(s) fit on the target Xeon-based server." Like most shops, you will run a pilot or POC to determine feasibility.

This post and the next few will discuss the process of getting to the right-sized server for the pilot or POC. Previously I discussed sizing tools.

I’ll focus on sizing your future server based on current performance and anticipated growth. Tools exist for precise measurement of your performance. They have a cost and are sometimes wrapped into consulting engagements. Here I’m looking instead at the tools for do-it-yourself performance monitoring and eventual sizing predictions. And remember, we’re just looking at the size of a server for the test platform. There is no need for precision here.

These tools are good for coming up with the target server for the testing.  Testing will prove to be the most accurate method to come up with the correct sizing.  But how do you determine what to use for the testing?

You want to look at:

  • CPU
  • Memory
  • I/O
  • Network

Some of these tools are available in the operating system. For others you may need to add the sysstat package of performance monitoring tools, which provides sar, iostat and mpstat, among others.

The system activity reporter (sar) tool keeps performance data going back a number of days or weeks, at 20-minute intervals. (Both depend on how sar was configured.) The other tools (e.g. vmstat, iostat, netstat, mpstat) are for measuring performance over shorter terms. They start monitoring as soon as you start them up. For instance, the command vmstat 1 60 will have vmstat output data each second for 1 minute: the first argument is the interval in seconds and the second is the number of reports, so vmstat 1 1 would print a single report and exit. One minute is not enough time to get a profile. You can set these tools up to run for a long time, but they’ll generate a ton of data. To make sure they continue running after you log off, put nohup before the xxstat command, redirect the output to a file, and end the command with & so it runs in the background. You should be able to load this data into a spreadsheet for graphing.
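To get captured output into a spreadsheet, something like the following sketch may help. It assumes the column layout of Linux vmstat (a line of column names starting with "r", followed by all-numeric sample lines) and no timestamp option. Other platforms print different columns, so treat the function names and the layout handling as illustrative assumptions rather than a finished tool.

```python
import csv
import io


def vmstat_to_rows(text):
    """Turn captured vmstat output into a list of dicts, one per sample.

    Assumes the Linux layout: a column-name line whose first field is "r",
    then lines made up only of integers.
    """
    columns = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # Column-name line. vmstat repeats it periodically in long runs.
            columns = fields
            continue
        is_numeric = all(f.lstrip("-").isdigit() for f in fields)
        if columns and is_numeric and len(fields) == len(columns):
            rows.append(dict(zip(columns, (int(f) for f in fields))))
    return rows


def rows_to_csv(rows):
    """Render the parsed samples as CSV text that a spreadsheet can open."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Read the log file you redirected vmstat into, pass its contents to vmstat_to_rows, and write the result of rows_to_csv to a .csv file. Banner lines and anything that does not match the column count are skipped rather than guessed at.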

With this data there are a series of questions you are looking to answer.

  • How many users are on the machine concurrently?  This isn’t the total number of named users or potential users but the estimated number of users that are on at the same time.  What resources does each user consume?
  • How much memory is being used by the applications without any users?
    • Use the SAR data to get this over a longer period of time.
  • What is the current load on the CPU at peak usage?
    • Measure over a business cycle.  This can be a month, a quarter or, rarely, a year.
    • Use capacity planning tools to get the data or use the SAR data.
      • Tune SAR to sample more frequently than every 20 minutes, but be prepared for a LOT of data.  Also have SAR save data for the month rather than delete it.

(Let’s assume that the observed CPU wait states or queues will be handled by the new server.)

  • For core-by-core data you can use mpstat.
  • What I/O bottlenecks are there?  The sar data or iostat will also show you the I/O queues if your SAN or NAS management tool doesn’t have them.
  • What network bottlenecks exist?  To get network data, use netstat or, again, SAR.
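Once the samples are collected, the CPU and memory questions above come down to a little arithmetic. The sketch below is one way to do it, not a standard method: the function names are made up for illustration, the memory formula (baseline plus per-user usage, scaled by a growth factor) is a simplifying assumption, and the nearest-rank percentile is just one of several common definitions.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def cpu_profile(idle_percentages):
    """Summarize CPU load from the %idle column of sar or vmstat samples."""
    busy = [100 - idle for idle in idle_percentages]
    return {
        "peak": max(busy),
        "p95": percentile(busy, 95),
        "mean": sum(busy) / len(busy),
    }


def memory_estimate_gb(baseline_gb, concurrent_users, per_user_gb, growth=0.0):
    """Rough memory need: application baseline plus per-user usage.

    Assumption: growth (0.25 means 25%) applies to the whole footprint.
    """
    return (baseline_gb + concurrent_users * per_user_gb) * (1 + growth)
```

For example, with a hypothetical 8 GB baseline, 200 concurrent users at 0.05 GB each and 25% growth, memory_estimate_gb(8, 200, 0.05, 0.25) comes to 22.5 GB. That is close enough to pick a test platform, which is all we need at this stage.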

If you’re daring, this can be pretty simple. Check out the benchmark data for your existing SPARC or POWER system. Use TPC.org or SPEC.org to find the system. Then look at the same benchmark for the Intel Xeon system you would use for testing, for instance one based on the Intel Xeon Processor E7 family.

So, if the SPARC server you are replacing is rated at a SPECint of (SWAG here) 45 and the Xeon-based system you are buying to replace it is rated at a SPECint of 253, I don’t believe you need to worry about this Xeon-based system as a test platform.
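The reasoning behind that comparison can be written out as a back-of-the-envelope calculation. The sketch below assumes both scores come from the same benchmark suite and metric, and that load scales linearly with the score, which real workloads only approximate. The 45 and 253 are the guessed figures from above and the 90% peak is a made-up input.

```python
def benchmark_ratio(source_score, target_score):
    """How many times faster the target is rated than the source."""
    return target_score / source_score


def projected_peak_utilization(current_peak_pct, source_score, target_score):
    """Estimate peak CPU utilization on the target, assuming linear scaling."""
    return current_peak_pct / benchmark_ratio(source_score, target_score)


ratio = benchmark_ratio(45, 253)
projected = projected_peak_utilization(90, 45, 253)
print(f"Target is rated about {ratio:.1f}x the source")
print(f"A 90% peak on the old server would be roughly {projected:.0f}% on the new one")
```

With headroom that large, the imprecision of the method doesn’t matter much for choosing a test platform. If the ratio were closer to 1, the measured peaks would deserve a much closer look.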

So now you have an approximation of what hardware you need to migrate from the old RISC system to Xeon based systems.  The next step is the POC or Pilot and the planning for that process is the subject of my next blog.

I would love to hear what you think of this.  I look forward to your comments.