Recently, I helped define transformation rules for the raw data collected by our storage management system (see my previous post).
This system collects tens of thousands of "facts" about our storage every day. These facts have to be analyzed to support business decisions. However, they can't be processed on demand each time, because the analyst would then wait a long time for every report.
For this to succeed:
- Someone has to understand the business domain (storage, in our case) well enough to define which decisions need to be made. For example: should we order new storage capacity, given current utilization, staleness, and growth trend?
- Someone has to understand the capabilities and limitations of the existing collection system: what types of data are collected, how and how often they are collected, how data quality can be assessed, and so on. For example, the data is collected once a day, and usage is recorded in megabytes for each filesystem.
- Finally, someone has to define how to bridge the gap between these two views within our BI infrastructure. This analysis is not always straightforward. The transformation rules should significantly reduce the original data set, but the reduction should not be too drastic. Which details are unrelated to the specific business question and can be dropped?
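To make the kind of reduction described above more concrete, here is a minimal Python sketch. It is not our actual rule set. It assumes a hypothetical daily fact record (`UsageFact`: day, filesystem, used and capacity in megabytes) and collapses all daily rows for a filesystem into one summary row. That row carries the three inputs to the capacity question: current utilization, staleness, and trend. Staleness is interpreted here, as an assumption, as the number of days since the used size last changed. Trend is computed as a least-squares slope in MB per day. The names `UsageFact`, `FilesystemSummary` and `summarize` are made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class UsageFact:
    """Hypothetical raw fact: one row per filesystem per day."""
    day: date
    filesystem: str
    used_mb: float
    capacity_mb: float


@dataclass(frozen=True)
class FilesystemSummary:
    """Hypothetical reduced row: one per filesystem, shaped for the capacity question."""
    filesystem: str
    utilization: float       # latest used_mb / capacity_mb
    stale_days: int          # assumption: days since used_mb last changed
    trend_mb_per_day: float  # least-squares slope of used_mb over time


def _slope(xs: List[float], ys: List[float]) -> float:
    """Ordinary least-squares slope; 0.0 when it is undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    den = sum((x - mx) ** 2 for x in xs)
    if den == 0:
        return 0.0
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / den


def summarize(facts: Iterable[UsageFact]) -> List[FilesystemSummary]:
    # Group the daily facts by filesystem.
    by_fs = defaultdict(list)
    for fact in facts:
        by_fs[fact.filesystem].append(fact)

    summaries = []
    for fs, rows in sorted(by_fs.items()):
        rows.sort(key=lambda r: r.day)
        latest = rows[-1]

        # Find the most recent day on which the used size changed.
        last_change = rows[0].day
        for prev, cur in zip(rows, rows[1:]):
            if cur.used_mb != prev.used_mb:
                last_change = cur.day

        # Days since the first observation are the x-axis for the trend.
        xs = [float((r.day - rows[0].day).days) for r in rows]
        ys = [r.used_mb for r in rows]

        utilization = latest.used_mb / latest.capacity_mb if latest.capacity_mb else 0.0
        summaries.append(FilesystemSummary(
            filesystem=fs,
            utilization=utilization,
            stale_days=(latest.day - last_change).days,
            trend_mb_per_day=_slope(xs, ys),
        ))
    return summaries
```

With daily collection, a year of facts for each filesystem collapses into a single row. Everything this particular summary drops (for example, day-to-day fluctuations) is detail that another business question might still need, which is exactly the "not too drastic" trade-off described above.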
There are always multiple business questions that could be asked. Should the transformation model therefore strive to be as extensive (and as complicated) as possible, so that it addresses all of them, including questions that will only be defined in the future? Or would it be easier to maintain several separate transformation models, one per question? The latter may consume extra transformation resources, but the actual reports would be much faster. And, ideally, how could one provide "ad hoc" query capability on as granular a data set as possible, while still having queries complete in a reasonable amount of time?
Unfortunately, it's not always possible to find someone who can represent all of the views above equally well. When nobody can, defining the transformation rules can become extremely painful and take too many iterations.
I wonder whether others have had similar experiences. Have you?