The tooling and mechanisms used to manage and deploy applications have been changing rapidly. Cluster schedulers like Kubernetes* and workload constructs like Linux* containers have brought a new set of powerful approaches for cloud operators.
Yet every improvement in productivity brings new opportunities to drive efficiency. While deploying thousands of microservices at scale has massive benefits, it also brings new complexity. As we move from monolithic applications to microservices, we encounter new problems with deploying these services next to each other. The ability to colocate these workloads offers a tremendous opportunity to gain higher utilization and lower the total cost of ownership of cloud infrastructure. But colocating services can introduce performance variability and heavily affect the service levels of the workloads. An application that responds with predictable latency when running by itself may have unpredictable response times when colocated with another workload.
Google* describes the problem as “tail at scale”: the amplification of negative results observed at the tail of the latency curve when many systems are involved. For example, suppose the average server response is 10 ms, but one in 1,000 requests takes longer than one second. When each user request is spread across 300 machines, 26 percent of user requests will take longer than one second (1 - 0.999^300 ≈ 0.26). And in Google’s case, a single search can hit 300 to 700 servers.
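The arithmetic behind that 26 percent figure can be checked with a few lines of Python. This is a minimal sketch that assumes every server is slow independently with the same probability; the function name `fraction_of_slow_requests` is made up for illustration.

```python
def fraction_of_slow_requests(p_slow: float, fan_out: int) -> float:
    """Probability that at least one of `fan_out` servers is slow.

    Assumes each server is slow independently with probability `p_slow`,
    and that a user request is only as fast as its slowest server.
    """
    if not 0.0 <= p_slow <= 1.0:
        raise ValueError("p_slow must be a probability between 0 and 1")
    if fan_out < 1:
        raise ValueError("fan_out must be at least 1")
    # P(all servers fast) = (1 - p_slow) ** fan_out, so take the complement.
    return 1.0 - (1.0 - p_slow) ** fan_out


# One slow response in 1,000, for a single server and for the
# 300-to-700-server fan-out mentioned above.
for servers in (1, 300, 700):
    share = fraction_of_slow_requests(0.001, servers)
    print(f"{servers:>4} servers: {share:.1%} of user requests are slow")
```

With a fan-out of 300 the result rounds to the 26 percent quoted above. The share of slow requests keeps growing as the fan-out increases, which is why a rare slow response on one server becomes a common experience for users.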
Does that matter? According to Ron Kohavi and Roger Longbotham of Microsoft*, “Experiments at Amazon.com* showed that every 100-ms increase in the page load time decreased sales by 1 percent, while similar work at Google revealed that a 500-ms increase in the search results display time reduced revenue by 20 percent.” Harry Shum, also of Microsoft, says, “Two hundred fifty milliseconds, either slower or faster, is close to the magic number now for competitive advantage on the Web.” And those are web apps. The same problems occur with even greater impact in latency-sensitive applications like network functions virtualization and high-speed trading.
That’s a problem worth solving. So at Intel, we’ve been working on several fronts.
First, we’ve enhanced Intel® Xeon® processors to enable finer control over the way critical processor resources, such as L3 cache memory (the cache on a processor that’s shared across all the cores) and memory bandwidth, are allocated to requesting processes, and to enable monitoring of that information so you can better tune software and adjust operations. Cache Allocation Technology (CAT) and Cache Monitoring Technology let you control and monitor how L3 cache is allocated based on established classes of service. Memory Bandwidth Monitoring provides similar monitoring for memory bandwidth. Code and Data Prioritization goes further to allow software control over how cache is allocated, so you can reserve dedicated cache for critical processes.
Next, we’re working to ensure the software is in place to access and enable these features. Support for CAT on Intel® Xeon® processors is now provided in the Resource Control subsystem of Linux kernel 4.10. And we’re working within the Kubernetes open source community to enable Kubernetes to take advantage of these capabilities. To achieve the workload isolation that critical low-latency services need, a class of service can be established and cache reserved for them when their containers are started.
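To give a feel for what reserving cache looks like at the kernel level, here is a rough Python sketch that builds an L3 `schemata` line for the Resource Control (resctrl) filesystem and writes it to a control group. It illustrates the kernel interface only; it is not how Kubernetes integrates with CAT. The group name, mask widths, and helper names are hypothetical. It also assumes resctrl is already mounted at `/sys/fs/resctrl`, that the script runs as root, and that the masks fit the capacity bitmask width the processor reports.

```python
from pathlib import Path
from typing import Iterable, Mapping


def contiguous_mask(num_ways: int, offset: int = 0) -> int:
    """Build a capacity bitmask of `num_ways` contiguous set bits.

    CAT on Intel processors expects the set bits to be contiguous.
    """
    if num_ways < 1 or offset < 0:
        raise ValueError("num_ways must be >= 1 and offset must be >= 0")
    return ((1 << num_ways) - 1) << offset


def schemata_line(masks_by_cache_id: Mapping[int, int]) -> str:
    """Format an L3 schemata line such as 'L3:0=f;1=f0' (masks in hex)."""
    parts = [f"{cache_id}={mask:x}"
             for cache_id, mask in sorted(masks_by_cache_id.items())]
    return "L3:" + ";".join(parts)


def apply_group(name: str,
                masks_by_cache_id: Mapping[int, int],
                pids: Iterable[int],
                root: Path = Path("/sys/fs/resctrl")) -> None:
    """Create a resctrl group, set its L3 masks, and move tasks into it.

    Assumption: resctrl is mounted at `root` and the caller has permission
    to write there. The group name is purely illustrative.
    """
    group = root / name
    group.mkdir(exist_ok=True)
    (group / "schemata").write_text(schemata_line(masks_by_cache_id) + "\n")
    for pid in pids:
        # The tasks file accepts one task ID per write.
        (group / "tasks").write_text(f"{pid}\n")


# Example: reserve the low four cache ways on cache IDs 0 and 1.
print(schemata_line({0: contiguous_mask(4), 1: contiguous_mask(4)}))
```

Calling `apply_group("lowlat", {0: contiguous_mask(4)}, [pid])` on a suitably configured host would place that process in its own class of service. On other machines, only the two pure helper functions are safe to run.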
At OSCON* this week, we have been talking to people about these new capabilities and demonstrating an application to monitor and maximize resource allocation on Intel® Xeon® processors. It creates a sensitivity profile for the monitored software to show developers how sensitive the software is to resource contention, and help them predict when and where they will encounter knee-of-the-curve bottlenecks that will send performance off the scale. It is becoming increasingly possible to make smart decisions about how to use the resource isolation features of Kubernetes to assure the performance needed.
You may not yet be developing applications spanning 300 servers, but if you are developing cloud-native software that will need to scale, Intel® Xeon® processors and Kubernetes can help you avoid tail at scale and keep users on your site. If you’re a developer, get involved: check out the work done in the Kubernetes Node special interest group (SIG Node).