If you’ve been following Intel IT’s data center strategy journey, which began in April 2010, you know that every few years we publish an update to our white paper, “Data Center Strategy Leading Intel’s Business Transformation.” These updates track the extraordinary business value (USD 2.8 billion in savings so far!) that can result from running Intel’s data centers like factories. We apply breakthrough technologies, solutions, and processes to accelerate Intel’s business, measuring success with three metrics: best-in-class quality of service (QoS), lowest unit cost, and resource utilization efficiency.
Our data center strategy centers on a simple concept: continual improvement in each of these three metrics. We measure that improvement by first defining a “Model of Record” (MOR). This term represents a data center environment with an unconstrained budget, where we could buy the latest and greatest technology, develop new solutions, and update or develop new processes. In this MOR environment, we calculate the lowest unit cost, best QoS, and maximum utilization. The MOR improves every year, because technology and processes improve every year.
But in reality, every IT shop has a limited budget. The question is, within that limited budget, where should we invest so that the collective investments bring the desired result? This real-world environment is called the “Plan of Record” (POR).
Year over year, our goal is to improve the POR at a faster rate than the MOR improves, so that we get closer to the MOR every year. Our seemingly simple MOR/POR data center transformation strategy has created unprecedented business value: cost savings exceeding USD 2.8 billion (over 9 years) compared to public cloud infrastructure as a service (IaaS). Through breakthrough innovations, our data center strategic program has also created a portfolio of intellectual property.
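The MOR/POR comparison amounts to tracking a gap that should shrink year over year, even though both targets move. A minimal sketch of that bookkeeping, using invented numbers and a single hypothetical metric (these are illustrative values, not Intel data):

```python
# Hypothetical illustration of the MOR/POR gap concept described above.
# All numbers and metric names are invented for the sketch.

def gap(por: float, mor: float) -> float:
    """Relative gap between the budget-constrained plan (POR)
    and the unconstrained ideal model (MOR)."""
    return (por - mor) / mor

# Unit cost per compute-hour (hypothetical USD values), year over year.
unit_cost = {
    2018: {"mor": 0.80, "por": 1.20},
    2019: {"mor": 0.75, "por": 1.05},  # POR improves faster than the MOR...
    2020: {"mor": 0.72, "por": 0.90},  # ...so the gap keeps shrinking.
}

for year, m in sorted(unit_cost.items()):
    print(f"{year}: gap = {gap(m['por'], m['mor']):.1%}")
```

The point of the sketch is only that the strategy is judged by the trend of the gap, not by either absolute number on its own.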
Data Center Strategy Tactics
Our data center strategy considers how our data centers need to transform to support Intel’s emerging business models. Therefore, the strategy covers everything that involves our data centers—facilities, servers, storage, network, OS, orchestration, provisioning, management and monitoring, and more.
A primary area of focus is reducing the cost of our data center facilities. This includes construction costs (measured in $/kW), electricity costs, water costs, and more. We have pursued several unique ways to reduce these costs. For example, our newest data centers use recycled water instead of fresh water, which lowers the cost of the water we use and preserves a critical natural resource. We also use advanced data center cooling techniques, such as evaporative cooling towers. Think of these as working like a diner blowing on a cup of soup to cool it. With these innovative towers, we can eliminate expensive chilled-water systems and computer-room air conditioning (CRAC) units.
The result? We reduced data center construction costs by nearly 3x, and our operating costs associated with cooling went down from 49 percent of total operating cost to 6 percent—saving Intel USD 1.9 million for every 5 megawatts (MW), and achieving an industry-leading 1.06 power-usage effectiveness (PUE) in a data center with 36 MW capacity. This has reduced our electricity cost by nearly USD 14 million per year. What’s more, our recycled water initiative significantly reduces our freshwater usage, compared to the industry average of 44 million gallons for every 5 MW in a typical data center. In aggregate, these data center facilities cost-reduction tactics have reduced the amount we spend on data centers from 39 percent of our IT budget to less than 20 percent while reducing impact to the environment.
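For readers unfamiliar with the metric, PUE is simply total facility power divided by the power delivered to IT equipment, so 1.0 is the theoretical ideal. A quick illustration of how a 1.06 figure arises (the overhead number here is a hypothetical back-of-the-envelope value, not a measured Intel figure):

```python
# PUE (power-usage effectiveness) = total facility power / IT equipment power.
# A perfectly efficient facility would score 1.0.

def pue(it_power_kw: float, overhead_kw: float) -> float:
    """Ratio of total facility power to power delivered to IT equipment."""
    return (it_power_kw + overhead_kw) / it_power_kw

# Hypothetical example: a 36 MW IT load where cooling, lighting, and
# power-delivery overhead add about 6 percent on top of the IT load.
print(round(pue(36_000, 2_160), 2))  # -> 1.06
```

In other words, at a PUE of 1.06 only about 6 cents of every power dollar goes to anything other than the computing itself.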
Our data center transformation strategy has also produced great value in the following areas, where we deployed use-case-optimized equipment across our servers, storage, and network. The result is just the right solution for each workload at the right cost structure.
- Disruptive server technology. Servers power Intel’s business—product design and manufacturing—not to mention our state-of-the-art cybersecurity platform, analytics, and more. We invest in value generation, deploying the latest Intel® architecture-based servers with more cores, faster clock speeds, more memory per core, faster memory speeds, and low-power, high-performance Intel® Solid State Drives (Intel® SSDs). Our compute utilization was formerly about 70 to 80 percent; now it is more than 90 percent in hub data centers. Investing in newer servers has increased the capacity of our high-performance computing (HPC) environment by 236x (compared to the capacity in 2005) and increased its availability by 80x. In addition, we invented the first major advance in server design in over a decade: the disaggregated server. By decoupling the CPU/DRAM and NIC/drive modules from other server components, we can independently refresh a server’s CPU and memory without replacing other components. This results in faster technology adoption, which in turn puts new technology at our design engineers’ fingertips. We have deployed 140,000 disaggregated servers since July 2016, using seven different motherboards that we invented and designed. These motherboards have slots available for specific workload accelerators, such as a graphics processing unit (GPU), Intel® Nervana™ Neural Network Processor (NNP), or, in the future, Intel® FPGA modules. In our experience, the disaggregated server design lowers refresh costs by 44 percent compared to a full acquisition (rip-and-replace) refresh, reduces provisioning time by 77 percent, and reduces total cost to environment (TCE) by 82 percent. We also replaced HDDs with Intel SSDs in some of our most heavily used servers. These SSDs offer much larger capacities and lower power consumption at a similar cost. In less than six months, we deployed 40 PB of fast local SSD cache, resulting in lower dependence on storage and network bandwidth and a substantial performance improvement.
- Optimized storage deployments. Not all use cases are created equal, and we determined that not every use case requires tier-1 storage. Instead, we introduced multiple storage tiers, adjusted data retention times for some use cases, and shifted less active data to a lower storage tier. We also evaluated how much high-performing cache space most use cases require, concluding that in general a 10 percent cache was sufficient. Of course, we also applied traditional storage cost-reduction techniques, such as deduplication and thin provisioning. These equipment and process changes have significantly reduced storage capital costs: storage utilization went from under 40 percent to more than 80 percent at our hub data centers (which house about 80 percent of our storage capacity).
- Optimized network technologies. We applied similar optimization processes to our networking equipment. Not every use case requires a high-end network connection. Instead, we optimize our network equipment per use case—for some use cases 1 GbE is sufficient. For others, a 10 GbE connection is required. In rare cases, we need a 40 or even a 100 GbE network. This workload optimization maximizes network-switch port utilization. Ports are like airline seats—if you pay for them but don’t use them, you’re wasting money. We also implemented new monitoring capabilities and reduced latency with Intel® Silicon Photonics where needed.
- Standardized OS versions. For Linux* and Windows*, we standardized which version should be in use across our IT environment and implemented an organized migration strategy to newer kernels that can take advantage of newer features of Intel® Xeon® processors.
- Better batch queueing. In our hyperscale data centers, we implemented orchestration solutions that can support hundreds of thousands of cores in a cluster, categorizing different batches with different priorities. This again gives us the right solution for the right workload at the right cost.
- Dock to production in one day. Sometimes data center efficiency isn’t just about technology—it can also involve innovative business process improvements. For example, historically when a new server landed on the loading dock at a data center, getting that server into production often took up to two weeks. Recently, we challenged the operations team to reduce that time to a single day. Rising to the challenge, the team initiated significant process improvements, such as pre-planning for network and power cabling, labels, IP numbers, pool and cluster assignment, and rack/chassis location. Pre-planning before the server arrives means that when it does arrive, the team simply installs the new server in the predefined rack space, connects cables, loads the OS, runs a burn-in test and quality check, and voilà—the server is producing value in less than 24 hours.
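The batch-prioritization idea mentioned above, assigning different priorities to different job classes so each batch gets the right resources at the right cost, can be sketched with a simple priority queue. The job names and priority levels below are invented for illustration; they are not Intel’s actual orchestration stack:

```python
import heapq
from itertools import count

# Minimal priority-based batch queue: lower number = higher priority.
# Jobs with equal priority dispatch in submission (FIFO) order,
# using a counter as a tie-breaker so heapq never compares job names.
_seq = count()

def submit(queue: list, priority: int, job: str) -> None:
    """Enqueue a job at the given priority level."""
    heapq.heappush(queue, (priority, next(_seq), job))

def dispatch(queue: list) -> str:
    """Pop and return the highest-priority (lowest-number) job."""
    _, _, job = heapq.heappop(queue)
    return job

q = []
submit(q, 2, "nightly-regression")       # hypothetical low-priority batch
submit(q, 0, "interactive-design-job")   # hypothetical high-priority job
submit(q, 1, "batch-simulation")

print([dispatch(q) for _ in range(3)])
# -> ['interactive-design-job', 'batch-simulation', 'nightly-regression']
```

Production schedulers layer far more on top (fair-share quotas, preemption, cluster placement), but the core dispatch decision is this ordering.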
Data Center Strategy Involves Technology, Processes—and People
Although I’ve talked at length about technology and processes, it is important to note that a data center transformation involves transforming IT engineers as well. From 2010 to the present, we have transformed the operational mindset of IT personnel into an engineering mindset. They now bring more innovative and breakthrough ideas, solutions, and processes to the strategy table. They know how to think outside the box and can more effectively solve scale complexities, storage and network challenges, and process bottlenecks. Just as we have highly valuable and innovative technology in our data centers, we have highly valuable, innovative staff running those data centers.
To learn more, read the IT@Intel white paper, “Data Center Strategy Leading Intel’s Business Transformation.”