RISC to IA Migration – Finishing the PoC and Sizing the Target Server

[Image: finish_line.jpg, Flickr photo by Pat Guiney]

In my previous blog I described the methodology for running a Proof of Concept when migrating a production database that underlies a mission-critical application. This blog covers the analysis of the results of that Proof of Concept. In other words, you've done the PoC; now what?

The Proof of Concept has three primary objectives:

  • To ‘prove’ that the application can run in the new environment
  • To see how the application performs in the new environment
  • To specify the steps necessary to conduct the actual migration

The proof that the application can run in the new environment seems pretty straightforward. Sure, if it runs after being ported, it runs; check that one off the list. However, when we start the PoC there is no guarantee that it actually will run. We use the UAT procedures, whether a regression test harness or a select team of users, to bang on the application and ensure that we have ported and hooked up everything.
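If the application has no formal regression suite, even a thin automated smoke test pays for itself, since it will be rerun after every 'flash back'. Here is a minimal sketch of what such a check might look like; the host, port, and endpoints are hypothetical stand-ins for whatever the ported application actually exposes.

```python
# Hypothetical smoke test for the ported application (host and endpoints are
# illustrative, not from the original post).  Run it after each 'flash back'
# to confirm the port is wired up end to end.
import unittest
import urllib.request


class PortedAppSmokeTest(unittest.TestCase):
    BASE_URL = "http://poc-host:8080"   # assumed PoC host

    def test_app_responds(self):
        # The application answers at all: the most basic 'it runs' check.
        with urllib.request.urlopen(f"{self.BASE_URL}/health", timeout=10) as resp:
            self.assertEqual(resp.status, 200)

    def test_core_transaction(self):
        # A representative business transaction completes and returns data.
        with urllib.request.urlopen(f"{self.BASE_URL}/orders/recent", timeout=30) as resp:
            body = resp.read()
            self.assertEqual(resp.status, 200)
            self.assertTrue(len(body) > 0)


if __name__ == "__main__":
    unittest.main()
```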

As mentioned before, these tests are run frequently, usually after 'flashing back' the application to its initial starting point. Once this is done, all of the components and setup steps need to be carefully documented, because these steps will be repeated in the subsequent phases of the migration.

The steps taken for the proof of concept can be seen as covering a continuum. Some steps, such as porting the data, will need to be repeated in each phase. Other steps, like rewriting shell scripts or editing the C++ code, will not: once those ports are done to the customer's satisfaction, the results can simply be carried forward into each subsequent phase. Still other steps fall in between, such as setting the environment parameters this application requires. For each subsequent phase we need to reset those parameters, but we don't need to work out how to set them again; documenting the settings is all that needs to be done.
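One lightweight way to handle that middle category is to capture the settings in a small, versioned artifact that both documents them and verifies them in the later phases. The sketch below assumes Linux kernel parameters checked with sysctl; the parameter names and values are examples only, not recommendations.

```python
# A sketch of one way to record the environment parameters discovered during
# the PoC so they can be re-applied (and verified) in each later phase.
# The parameter names and values below are illustrative examples only.
import json
import subprocess

DOCUMENTED_PARAMS = {
    "kernel.shmmax": "68719476736",
    "vm.swappiness": "10",
    "net.core.somaxconn": "4096",
}


def verify(params: dict) -> list:
    """Return (name, expected, actual) tuples for any setting that does not match."""
    mismatches = []
    for name, expected in params.items():
        actual = subprocess.run(
            ["sysctl", "-n", name], capture_output=True, text=True
        ).stdout.strip()
        if actual != expected:
            mismatches.append((name, expected, actual))
    return mismatches


if __name__ == "__main__":
    print(json.dumps(DOCUMENTED_PARAMS, indent=2))   # the 'documentation'
    for name, expected, actual in verify(DOCUMENTED_PARAMS):
        print(f"{name}: expected {expected}, found {actual}")
```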

Now that we know the application will run, we look at how it runs in the new environment. This is where environmental tuning parameters are adjusted, and where code may need to be rewritten.

Some proofs of concept start off with the application performing significantly worse on the new target than in the original environment. An application can behave differently when hosted on a different platform; for instance, an application that was CPU bound on the old platform could start seeing I/O queues or memory pressure. If you've been measuring application performance by looking at queues on the original host, you'll likely miss this. The only queues you really should be looking at are the queues (CPU, I/O, and network) on the PoC hardware, which simulates the eventual target hardware.
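Here is a rough sketch of what watching those queues can look like on a Linux PoC host, using vmstat. The warning thresholds are placeholders; judge them against the host's CPU count and the application's SLAs.

```python
# Sample run-queue and I/O-wait figures on the PoC host with vmstat (Linux).
# The thresholds are arbitrary examples, not recommendations.
import subprocess

SAMPLES = 10          # one-second samples to collect
RUNQ_WARN = 8         # example: warn if the run queue exceeds this
IOWAIT_WARN = 20      # example: warn if %iowait exceeds this


def sample_vmstat(samples: int = SAMPLES):
    out = subprocess.run(
        ["vmstat", "1", str(samples + 1)], capture_output=True, text=True
    ).stdout.splitlines()
    header = out[1].split()               # column names: r, b, ..., wa, ...
    r_i, b_i, wa_i = header.index("r"), header.index("b"), header.index("wa")
    for line in out[3:]:                  # skip headers and the since-boot row
        cols = line.split()
        runq, blocked, iowait = int(cols[r_i]), int(cols[b_i]), int(cols[wa_i])
        flags = []
        if runq > RUNQ_WARN:
            flags.append("CPU queueing")
        if iowait > IOWAIT_WARN:
            flags.append("I/O wait")
        print(f"runq={runq} blocked={blocked} iowait%={iowait} {' '.join(flags)}")


if __name__ == "__main__":
    sample_vmstat()
```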

I once got an application running that had been ported from a mainframe into a relational database. Performance was terrible even though, on paper, the new environment should have significantly outperformed the old host. We looked at the tuning parameters and they were optimal. We looked at performance reports from the operating system and they were OK. We looked at the performance reports from the database and, sure enough, the CPUs were barely working but the I/O to disk was pegged. The customer was looking to us and the application vendor for a fix. I looked at the code and found that the application vendor had written VSAM-style data access methods in SQL! In other words, instead of using relational set theory to winnow the data, his application read each row in sequence through the entire table. The PoC stopped right there, and the customer kicked the application vendor out.
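For readers who have not seen that anti-pattern, here is a toy illustration in Python using the standard-library sqlite3 module. It is not the vendor's code, just the shape of the problem: filtering row by row in the application versus letting the database do the winnowing.

```python
# Toy illustration of the anti-pattern: scanning every row in application code
# versus letting the database winnow the data with a set-based query.
# Uses the standard-library sqlite3 module purely for demonstration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("EAST", i * 1.0) for i in range(100_000)] + [("WEST", 42.0)],
)

# VSAM-style: read every row and filter in the application.  The database does
# a full scan and ships every row across the interface.
total_slow = 0.0
for region, amount in conn.execute("SELECT region, amount FROM orders"):
    if region == "WEST":
        total_slow += amount

# Set-based: tell the database what you want and let it do the winnowing
# (and use an index on region, which a sequential read never could).
conn.execute("CREATE INDEX idx_orders_region ON orders (region)")
(total_fast,) = conn.execute(
    "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE region = ?", ("WEST",)
).fetchone()

assert total_slow == total_fast
```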

As in that story, we need to observe carefully how the application is performing. We can use the output from the testing harness to tell us how long each tested task took and compare that to the SLAs for the application. We can look at the tools built into the operating systems, like Perfmon on Windows or the 'stat' tools (vmstat, iostat, etc.) on Linux. (Let's not neglect the application itself as the primary source of performance problems, but we'll discuss that later.)

The data from these tools needs to be analyzed. The measurements from the test harness are the most straightforward, since it usually reports response time: if a task took more than a second to complete but requires sub-second response time, we know there is a problem to dig into. Is the application just running poorly? Is the hardware inadequate? Does the operating system or underlying software need tuning? Perhaps the application architecture needs adjusting. Our performance monitoring tools will provide the clues. We're looking for bottlenecks, and bottlenecks form where data travels: in I/O, or, in the case of a database, in logging the database changes. Look for queues in network and storage I/O and for CPU queues at the operating system level. At the database level, look at where 'waits' are occurring; in Oracle, for instance, we first look at waits in the single-threaded operation of writing to the log files, as well as anything else affecting the logging process.
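As a concrete example of that first step, here is a sketch that rolls raw harness timings up into percentiles and compares them with the SLAs. The input format (one task name and elapsed time in seconds per line) and the SLA figures are assumptions for illustration.

```python
# Turn raw test-harness timings into an SLA check.  File format and SLA
# figures below are assumed for illustration.
import statistics
from collections import defaultdict

SLA_SECONDS = {"order_entry": 1.0, "order_lookup": 0.5}   # example SLAs


def check_sla(results_path: str):
    timings = defaultdict(list)
    with open(results_path) as fh:
        for line in fh:
            task, seconds = line.split()
            timings[task].append(float(seconds))

    for task, values in sorted(timings.items()):
        p95 = statistics.quantiles(values, n=20)[18]      # 95th percentile
        limit = SLA_SECONDS.get(task)
        status = "OK" if limit is None or p95 <= limit else "MISSED"
        print(f"{task}: p95={p95:.3f}s sla={limit} -> {status}")


if __name__ == "__main__":
    check_sla("harness_results.txt")
```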

Now we can determine the capacity requirements of the target platform more precisely, and project with greater confidence the characteristics of the platform we will be porting to. For an application requiring bare metal this is critical; for an application that can be hosted in a virtual machine, this process defines the initial setup requirements. Remember that the new platform will need to handle the projected growth, and the unexpected black swan as well.
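A back-of-the-envelope sizing calculation might look like the sketch below. Every number in it is a placeholder: plug in the peak utilization measured during the PoC, your own growth forecast, and whatever headroom policy covers the black swan.

```python
# Back-of-the-envelope CPU sizing.  All inputs are assumptions to be replaced
# with figures measured during the PoC and your own planning policies.
import math

peak_cpu_util = 0.55        # measured at peak load on the PoC host (55%)
poc_cores = 16              # cores on the PoC host
annual_growth = 0.25        # projected workload growth per year
planning_horizon = 3        # years the target must last
headroom = 0.40             # keep 40% spare for the unexpected

cores_needed_now = peak_cpu_util * poc_cores
cores_at_horizon = cores_needed_now * (1 + annual_growth) ** planning_horizon
target_cores = math.ceil(cores_at_horizon / (1 - headroom))

print(f"busy cores today:          {cores_needed_now:.1f}")
print(f"busy cores in {planning_horizon} years:       {cores_at_horizon:.1f}")
print(f"target cores to specify:   {target_cores}")
```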

Now the PoC is over.  The results have been converted to foils to be presented to management for planning the conversion.  The documentation we wrote is now the recipe for the next step, the migration rehearsal.