Finding Patterns in Server Resource Data

I've always been fascinated by how fast IT systems administration has progressed over the decades.  Yet we haven't progressed much in developing scientific methods for analyzing our server resources.  We routinely collect resource utilization data, such as CPU and memory utilization, but we haven't changed how we approach analyzing that data.  Many IT engineers only look at the data from a two-dimensional view.  We're totally missing other dimensions which other disciplines have long since conquered.  We look at server utilization data from different time windows and make assumptions based on the view at that point in time.  It is very similar to the popular analogy of a 3-dimensional sphere passing through a 2-dimensional plane of existence.  From inside the plane, what I would see is a circle that starts as a point, grows, then shrinks until it disappears.  I can make no connection to the sphere simply by looking at just one circle.  I need to collect all the data and assemble all the slices to model the sphere.

In my undergrad days, we would routinely employ FFTs (Fast Fourier Transforms) on sets of data to determine the frequency and magnitude of recurring occurrences.  Chemists, statisticians, physicists and yes, even the folks developing server electronic components, all use FFTs to analyze data.  IT systems administration is still in its infancy, and if it is to progress, its practitioners will also need to utilize such tools.

I've recently started investigating the use of FFTs to analyze server performance data, in the hope that it will expose information not seen through traditional means.

In the example below, I've plotted server CPU utilization (over one week).  When plotting % CPU utilization vs Time I get the following traditional plot:

time function.jpg

Figure 1: Standard time function of %CPU utilization over one week. The data was collected at a frequency of once every 15min.

Nothing spectacular.  I do see a few peaks at specific times, but I really can't establish any pattern.  However, after I ran it through an FFT, a clear pattern emerged.

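I superimposed two plots, taken from two different periods of time, and found that there are some common occurrences around 0.2 (or 3min), 0.35 (~5min), 0.5 (~7min) and 0.65 (10min).  Treat those minute conversions with caution: with one sample every 15min, an FFT cannot resolve any cycle shorter than 30min, so the axis values are better read as relative frequencies until the scaling is worked out.  The purpose of the plot below is only to visualize what this type of plot would look like.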


Figure 2: FFT of %CPU utilization over one week.  Two plots superimposed. The data was collected at a frequency of once every 15min.
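
For anyone who wants to try this on their own data, here is a minimal sketch in Python with NumPy.  It is not the tool that produced the plots above.  The function names (cpu_spectrum, dominant_periods) are made up for illustration, and the week of data is synthetic: a daily cycle plus an hourly cycle plus some noise, sampled every 15min like the real data.  The sketch removes the average load, runs a real-input FFT, and converts each frequency bin into a period in minutes so the peaks can be read against a clock.

```python
import numpy as np

SAMPLE_INTERVAL_MIN = 15  # one sample every 15 minutes, as in the post


def cpu_spectrum(cpu_percent, sample_interval_min=SAMPLE_INTERVAL_MIN):
    """Return (periods_in_minutes, magnitudes) for a %CPU time series."""
    samples = np.asarray(cpu_percent, dtype=float)
    # Subtract the mean so the average load (frequency 0) does not
    # swamp the recurring patterns we are looking for.
    samples = samples - samples.mean()
    # rfft is the FFT for real-valued input; it returns only the
    # non-negative frequencies. Dividing by N keeps magnitudes comparable
    # between series of different lengths.
    magnitudes = np.abs(np.fft.rfft(samples)) / len(samples)
    # Frequencies in cycles per minute for each FFT bin.
    freqs = np.fft.rfftfreq(len(samples), d=sample_interval_min)
    # Drop bin 0 (frequency 0) so the period 1/f is well defined.
    periods = 1.0 / freqs[1:]
    return periods, magnitudes[1:]


def dominant_periods(cpu_percent, top_n=3,
                     sample_interval_min=SAMPLE_INTERVAL_MIN):
    """Return the periods (in minutes) of the top_n strongest peaks."""
    periods, magnitudes = cpu_spectrum(cpu_percent, sample_interval_min)
    strongest = np.argsort(magnitudes)[::-1][:top_n]
    return [float(periods[i]) for i in strongest]


# Synthetic week of data (an assumption, not the server data from the post):
# 7 days * 24 hours * 4 samples per hour = 672 samples.
n_samples = 7 * 24 * 4
t_min = np.arange(n_samples) * SAMPLE_INTERVAL_MIN
rng = np.random.default_rng(0)
synthetic_cpu = (
    30.0
    + 10.0 * np.sin(2 * np.pi * t_min / (24 * 60))  # daily cycle
    + 5.0 * np.sin(2 * np.pi * t_min / 60)          # hourly cycle
    + rng.normal(0.0, 1.0, n_samples)               # measurement noise
)

if __name__ == "__main__":
    for period in dominant_periods(synthetic_cpu, top_n=2):
        print(f"recurring pattern every {period:.0f} minutes")
```

On the synthetic data the two strongest peaks land at periods of about 1440 minutes (daily) and 60 minutes (hourly), which is exactly what was put in.  The shortest period in the output is 30 minutes, twice the sampling interval, which is why anything faster than that cannot be seen in data collected every 15min.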

At the moment I am not quite sure what this means, but I expect that it will correlate nicely with the scheduled tasks running on the server.  What is clear is that more work needs to be done on this front to identify its applications and how best to utilize this information.

I'll continue to update my blog as I have more to share.