As a finance business intelligence architect within Intel IT, I am part of the team transforming Intel’s legacy supply chain into a modern “glass pipeline” to improve our decision-making capabilities and business agility. To create this glass pipeline, we are building an integrated data platform (IDP) that integrates the SAP HANA* in-memory database with Cloudera Distribution of Hadoop*.
We are in the third year of our five-year SAP HANA supply chain data transformation and have learned a number of important lessons along the way. Here are a few that I think can help other architects, analysts, and developers.
Optimization of Code Yields the Best Results
It’s tempting to assume that the code written for the legacy system will automatically run faster on SAP HANA, but this is not always the case. And we discovered that end users are often more interested in how long it takes to generate a report than whether the data behind the report is fresh or stale.
For example, when using the legacy enterprise data warehouse, we could generate a certain report in about 40 seconds. However, the report was based on four-hour-old data due to necessary preprocessing (extract and transform). With the new SAP HANA system, the report included near-real-time data, but originally took about seven minutes to run. The end users’ perception was that the new system was slower than the old one; they valued quick reports over seeing business updates in real time. When we optimized the code behind the report, we were able to generate it in only six seconds, and it is now based on near-real-time data.
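To put the report timings above in proportion, here is a quick back-of-the-envelope calculation. The inputs are the approximate figures quoted in this post, so the ratios are rough; the variable names are only illustrative.

```python
# Approximate report run times quoted in this post, in seconds.
legacy_seconds = 40              # legacy warehouse, four-hour-old data
hana_initial_seconds = 7 * 60    # first SAP HANA version, near-real-time data
hana_optimized_seconds = 6       # after optimizing the code behind the report

# How much faster the optimized report is than each earlier version.
speedup_vs_initial = hana_initial_seconds / hana_optimized_seconds
speedup_vs_legacy = legacy_seconds / hana_optimized_seconds

print(f"Optimized vs. initial SAP HANA version: about {speedup_vs_initial:.0f}x faster")
print(f"Optimized vs. legacy warehouse: about {speedup_vs_legacy:.1f}x faster")
```

In other words, the unoptimized report was roughly ten times slower than the legacy one, which is why users saw the new system as a step backward until the code was tuned.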
We found that replicating legacy database code is not beneficial. Therefore, we spend only about 20% of our time modeling—making the code deliver the correct data—and invest 70% of our time and effort on optimizing the performance of that code. We spend the other 10% on building reports. We found several useful database-view code optimization techniques for developing in SAP HANA:
| Optimization Goal | Optimization Technique |
|---|---|
| Logically separate the data by period so that the query is isolated to that data set. | Use partitioning with input parameters to prune both rows and columns. |
| Efficiently combine data by merging instead of joining. | Replace simple joins with union operations with aggregation when joining large similar tables. |
| Avoid wasting resources on fields that are not required. | Use left outer joins to enable column pruning. |
| Avoid slowing down code with large formulas. | Replace complex functions with multilevel filters. |
| Add display-only attributes to the star schema and use a modular approach. | Keep the data models simple; for example, avoid 20+ consecutive joins. |
| Examine the code’s execution plan to determine the optimum order. | Check the execution order of your filters. |
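As an illustration of the "union with aggregation instead of a join" technique from the table, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than SAP HANA, and the tables (plan_spend, actual_spend) and their columns are hypothetical, so treat it as a demonstration of the query shape only, not of SAP HANA behavior or performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two hypothetical, similarly shaped tables: planned and actual spend.
cur.executescript("""
CREATE TABLE plan_spend   (period TEXT, cost_center TEXT, amount REAL);
CREATE TABLE actual_spend (period TEXT, cost_center TEXT, amount REAL);
INSERT INTO plan_spend VALUES
  ('2016-01', 'CC1', 100), ('2016-01', 'CC2', 200), ('2016-02', 'CC1', 150);
INSERT INTO actual_spend VALUES
  ('2016-01', 'CC1', 90), ('2016-01', 'CC2', 210), ('2016-02', 'CC1', 160);
""")

# Join version: match rows from the two tables on their shared keys.
join_sql = """
SELECT p.period, p.cost_center,
       p.amount AS plan_amount, a.amount AS actual_amount
FROM plan_spend p
JOIN actual_spend a
  ON a.period = p.period AND a.cost_center = p.cost_center
ORDER BY p.period, p.cost_center
"""

# Union version: stack both tables, padding the measure that does not
# apply with 0, then aggregate once over the shared keys.
union_sql = """
SELECT period, cost_center,
       SUM(plan_amount) AS plan_amount, SUM(actual_amount) AS actual_amount
FROM (
  SELECT period, cost_center, amount AS plan_amount, 0 AS actual_amount
  FROM plan_spend
  UNION ALL
  SELECT period, cost_center, 0 AS plan_amount, amount AS actual_amount
  FROM actual_spend
)
GROUP BY period, cost_center
ORDER BY period, cost_center
"""

join_rows = cur.execute(join_sql).fetchall()
union_rows = cur.execute(union_sql).fetchall()

# With one row per key in each table and the same keys on both sides,
# the two queries return the same result.
print(join_rows == union_rows)
conn.close()
```

The two forms are equivalent only under those conditions. If a key appears in just one table, the union version still returns it (with 0 for the missing measure), whereas the inner join drops it, so check which behavior the report actually needs before swapping one for the other.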
The Right Set of Skills Can Be Difficult to Find
Efficient use of SAP HANA requires developers who know all of the following: how to code in SAP HANA, the relevant business processes and data models, and the bigger integrated data environment. This combination of technical knowledge and business acumen enables developers to be ultra-productive in delivering business value. Unfortunately, finding someone with all of these skills is like finding a unicorn.
Originally, we tried creating our own unicorns by training data modelers to code, but this was largely unsuccessful. We found that good coding is an art and can’t be readily taught. The data modelers created correct data output, but performing the analysis and generating the corresponding report took far too long.
Next, we tried teaching developers about the data. This was slightly more successful, but even the best developers usually had little business knowledge. The result was better-performing code, but there were bugs in the data and the analysis still took too long.
A modular architecture with reusable components created the best results. The architect creates the building blocks. Then the architect works with a data modeler to create a trusted, verifiable data set and with the developer to optimize each block for performance. Finally, reports are assembled using building blocks and choosing the right pattern.
Integrated, Aligned Data Is the Key to Success
Real-time information provides important business insights. But integrated data is as important as timely data. Decisions rely on data from several sources and these data sources may be updated at different frequencies. Some sources are updated every second while others refresh every hour. With the legacy enterprise data warehouse, some data was updated even less frequently. Business processes reflect these differences.
For example, Intel has a process to forecast depreciation five years in advance. This job required data from many sources and took 36 hours to run, so it was only run once per month. We built an entire business process chain around this limitation. When we rewrote the forecast job using SAP HANA, it ran in less than 60 seconds. This allowed us to completely reimagine our existing processes, gaining significant business agility. Now, analysts can run the depreciation report at any time and provide a financial update with an accurate balance sheet and forecast P&L statement. They can also perform what-if analyses for projects with large capital expenditures.
Our IDP is already providing measurable business benefits. With the new capabilities, insights, and better forecasting, we are confident we will hit our estimated five-year ROI of USD 208 million. If you want to learn more about IDP, check out this IT@Intel white paper: "Transforming Intel's Supply Chain with Real-Time Analytics."