I/O Virtualization – Round 2!

In my last I/O Virtualization blog earlier this year, I discussed a fundamental problem with virtualizing I/O and one of the solutions that Intel and VMware have teamed up to deliver - VMDq and VMware NetQueue. Together, these queuing technologies help offload some of the virtual switching (vswitch) functionality from the Hypervisor to the network adapter. VMDq gives the Hypervisor less work to do, and it also provides a way to spread I/O processing across multiple cores, improving system bandwidth and more fully utilizing the system's processing power.
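If it helps to picture what VMDq is doing, here is a rough Python sketch of the idea - nothing here is real driver code, and the MAC table and queue layout are invented. The adapter sorts incoming frames into a queue per VM based on destination MAC, and each queue can then be serviced by a different core instead of one vswitch thread doing all the sorting.

```python
# Conceptual sketch of VMDq-style receive sorting (not real driver code).
# The NIC classifies each incoming frame by destination MAC and drops it
# into a queue dedicated to the target VM; the hypervisor can then service
# each queue on a different core instead of sorting frames in software.

from collections import defaultdict

# Hypothetical layer-2 table: destination MAC -> queue (one queue per VM).
VM_MAC_TABLE = {
    "00:1b:21:aa:00:01": 0,   # VM 0
    "00:1b:21:aa:00:02": 1,   # VM 1
    "00:1b:21:aa:00:03": 2,   # VM 2
}

rx_queues = defaultdict(list)

def classify_frame(frame):
    """What VMDq does in hardware: pick a queue based on destination MAC."""
    queue_id = VM_MAC_TABLE.get(frame["dst_mac"], 0)  # unknown traffic -> default queue
    rx_queues[queue_id].append(frame)

# Each queue can then be drained by a separate core, so one busy VM's
# traffic no longer serializes behind everyone else's in the vswitch.
for frame in [{"dst_mac": "00:1b:21:aa:00:02", "payload": b"..."},
              {"dst_mac": "00:1b:21:aa:00:01", "payload": b"..."}]:
    classify_frame(frame)

print({queue: len(frames) for queue, frames in rx_queues.items()})
```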

Now, VMDq and NetQueue together are a great solution that scales well, supports VMotion, and is relatively simple to manage. However, is there a way to get even better performance from your virtualized I/O?

What if there were a way to cut the Hypervisor software switch out of the picture completely and remove the associated latency and CPU overhead? The ideal scenario for optimum performance is for the VM to communicate directly with the LAN hardware itself, bypassing the vswitch entirely. For example, a single 10 Gigabit port could expose multiple LAN interfaces at the hardware level (on the PCI-e bus), and each VM could be assigned directly to one of those hardware interfaces. Alternatively, you could have multiple physical NICs in the system, each directly assigned to a given VM. Below is a diagram that summarizes the 3 main variations of I/O attachment in a virtualized server; after it, we will get into more detail to put the diagram in context.

In the diagram above, the left side represents a virtualized environment with a standard I/O setup, using the Hypervisor vswitch with VMDq for I/O performance enhancement. In the middle is an example of direct I/O assignment between a single physical LAN interface and a single Virtual Machine. The implementation on the right shows what is possible with a single NIC that supports SR-IOV (we'll discuss this later) for a fuller, hardware-level I/O virtualization. Once you take a moment to understand the basic differences between these three implementations, a few benefits of bypassing the Hypervisor vswitch in favor of either of the two directly assigned designs become immediately obvious...
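To make the middle and right-hand pictures a bit more concrete, here is a toy Python model of direct assignment - one physical port exposing several PCI-e functions, each owned by exactly one VM. The class and names are invented purely for illustration and don't correspond to any real API.

```python
# Toy model of the "direct assignment" idea: one physical 10G port exposing
# several independent PCI-e functions, each handed to exactly one VM.
# Everything here is invented for illustration.

class PcieFunction:
    def __init__(self, bdf):
        self.bdf = bdf            # PCI bus:device.function address
        self.owner_vm = None      # VM that gets exclusive, direct access

# A single physical port can expose multiple functions on the PCI-e bus...
functions = [PcieFunction(f"0000:03:10.{n}") for n in range(4)]

# ...and each VM is wired straight to one of them, so its I/O never
# passes through the hypervisor's software switch.
for vm_name, fn in zip(["vm-web", "vm-db", "vm-mail"], functions):
    fn.owner_vm = vm_name

for fn in functions:
    print(fn.bdf, "->", fn.owner_vm or "unassigned")
```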

By allowing the Virtual Machines to talk directly to the networking hardware, the throughput, latency, and CPU utilization of I/O traffic processing are all greatly improved. So the question is, "why hasn't this been done before?" Well, the answer is that there are several gotchas to making this implementation work well...

First, in order to implement this properly, the LAN hardware needs certain physical capabilities to successfully route networking traffic in this kind of virtualized system. In addition, the server platform itself must support VT-d so that the mapping between the Virtual Machine's I/O memory addresses (its PCI-e memory space) and the system's physical memory addresses is correlated correctly.
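For those curious what VT-d contributes conceptually, here is a small, purely illustrative Python sketch of the remapping idea - a per-device table that translates the guest-physical addresses a VM hands its NIC into host-physical addresses. The page size and table contents are made up.

```python
# Rough sketch of what VT-d provides conceptually: a per-device translation
# table that remaps the guest-physical addresses a VM programs into its
# NIC's DMA descriptors onto real host-physical addresses. The page size
# and table contents below are invented for illustration.

PAGE_SIZE = 4096

# Hypothetical mapping for one assigned device:
#   guest-physical page number -> host-physical page number
iommu_table = {
    0x1000: 0x8A20,
    0x1001: 0x8A21,
    0x1002: 0x74F0,
}

def dma_translate(guest_phys_addr):
    """Translate a guest-physical DMA address to a host-physical one."""
    page = guest_phys_addr // PAGE_SIZE
    offset = guest_phys_addr % PAGE_SIZE
    if page not in iommu_table:
        raise PermissionError("DMA fault: address not mapped for this device")
    return iommu_table[page] * PAGE_SIZE + offset

# The VM thinks its receive buffer lives at this guest-physical address;
# the device's DMA actually lands at the translated host-physical address.
print(hex(dma_translate(0x1001 * PAGE_SIZE + 0x40)))
```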

Finally, and this is a big one: this kind of implementation, while very good for performance, happens to break the ability to move a VM from one physical server to another (VMware VMotion). This is one of the more widely used aspects of VMware's software, and it has been utilized heavily by most IT shops. Seamless VMotion support is critical for making any I/O performance improvement deployable in the real world.

Now, if you stop at the second diagram and use separate NICs for each VM, you will also miss out on a few key advantages of new Ethernet capabilities. You won't be able to allocate your overall bandwidth between your VMs (each VM will get a whole Gigabit or 10 Gigabit port), and more importantly, you won't be able to effectively share higher bandwidth pipes. For example, a server with a few 10 Gigabit ports may have enough I/O horsepower to handle traffic for 30 VMs, but there would be no way to assign only a portion of the bandwidth of a pipe to an individual VM.
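A quick back-of-the-envelope example, with invented numbers, shows the kind of split you would want but can't get with whole-NIC assignment:

```python
# Back-of-the-envelope sketch of the sharing problem (numbers invented):
# three 10 Gigabit ports give 30 Gb/s of raw capacity, plenty for 30 VMs
# on average, but with whole-port assignment you cannot hand a VM just a
# slice of a port, e.g. "3 Gb/s of this 10 Gb/s pipe".

ports_gbps = [10, 10, 10]          # a few 10 Gigabit ports in the server
vm_count = 30

total_capacity = sum(ports_gbps)
print(f"Average available per VM: {total_capacity / vm_count:.1f} Gb/s")

# What you'd actually want: unequal, guaranteed slices of one shared port.
desired_shares = {"vm-web": 3.0, "vm-db": 5.0, "vm-backup": 2.0}   # Gb/s
assert sum(desired_shares.values()) <= ports_gbps[0]
for vm, gbps in desired_shares.items():
    print(f"{vm}: wants a guaranteed {gbps} Gb/s slice of one 10 Gb/s port")
```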

Additionally, the LAN hardware needs to support bandwidth segregation per virtual function of the LAN device (think QoS per VM), as well as multiple queues and traffic classes per LAN virtual function. This last piece matters to those who remember the discussion on Fibre Channel over Ethernet (FCoE): support for multiple traffic classes and dedicated-bandwidth links is a key requirement for the storage-over-Ethernet market.
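Here is a rough sketch of the per-virtual-function knobs I'm describing - a bandwidth guarantee and cap per VF, plus multiple queues and traffic classes (for example, a class reserved for FCoE storage traffic). The field names and values are illustrative only, not any real device's interface.

```python
# Sketch of per-virtual-function QoS settings: each VF gets its own
# bandwidth guarantee and cap plus multiple queues and traffic classes.
# Field names and values are invented for illustration.

from dataclasses import dataclass, field

@dataclass
class VirtualFunctionQoS:
    vf_id: int
    min_gbps: float                    # guaranteed bandwidth for this VM
    max_gbps: float                    # cap so one VM can't hog the port
    traffic_classes: dict = field(default_factory=dict)  # class -> queue count

vf_configs = [
    VirtualFunctionQoS(0, min_gbps=2.0, max_gbps=4.0,
                       traffic_classes={"lan": 2, "fcoe": 1}),   # storage + LAN
    VirtualFunctionQoS(1, min_gbps=1.0, max_gbps=10.0,
                       traffic_classes={"lan": 4}),              # LAN only
]

for cfg in vf_configs:
    print(f"VF {cfg.vf_id}: {cfg.min_gbps}-{cfg.max_gbps} Gb/s, "
          f"classes={cfg.traffic_classes}")
```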

Now that I've set up what is needed to make this directly assigned virtualized I/O environment work, and called out the potential problems, don't worry - I won't throw cold water on the idea. In fact, most of the pieces are in place today, and work to complete the solution is already underway as we speak.

First, Intel network adapters now support some fancy hardware capabilities related to virtualization. In addition to all the hooks for VMDq, our newest NICs support PCI-SIG SR-IOV (I know... technologists love acronyms), which provides the ability to virtualize the LAN at the lowest hardware level. The networking hardware also includes some smart logic so it can function properly in a virtualized system. For example, VM-to-VM communication within the same server must be looped back before it gets to the wire, or the switch connected to the machine won't know how to route the packet; this is all taken care of in the LAN hardware. And of course, all the support for bandwidth segregation and for multiple queues and traffic classes is there as well, to make sure storage and other QoS-sensitive applications will still work well.
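If you want a mental model of that loopback decision, here is a toy Python version - the table and function are invented, and the real logic lives in silicon. Frames addressed to a MAC owned by another local virtual function stay inside the adapter; everything else goes out on the wire.

```python
# Toy model of the forwarding decision described above: if a transmitted
# frame is addressed to another VM on the same adapter, it is looped back
# internally rather than sent to the external switch (which has no idea
# those MACs live behind a single port). Names and table are invented.

# Hypothetical table of MAC addresses owned by local virtual functions.
local_vf_macs = {
    "00:1b:21:aa:00:01": "VF0",
    "00:1b:21:aa:00:02": "VF1",
}

def forward(frame):
    """Decide whether a frame stays inside the adapter or goes to the wire."""
    dst = frame["dst_mac"]
    if dst in local_vf_macs:
        return f"loopback to {local_vf_macs[dst]}"   # VM-to-VM on the same host
    return "transmit on the external 10G port"

print(forward({"dst_mac": "00:1b:21:aa:00:02"}))   # lands on a sibling VF
print(forward({"dst_mac": "00:1b:21:bb:00:09"}))   # leaves the server
```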

As for VT-d support, Intel platforms now come with it basically as standard, so there is no issue there. But the last and most important piece is the ability for an individual VM to be moved between physical servers while still being able to 'renegotiate' with its physical network connection. This capability is under development by Intel, VMware, and others in the industry, and the end goal is an architectural framework that makes this kind of handoff seamless from both a hardware and a software perspective.

This architectural framework will be the topic of a future post, as I think I've used up all the lines I can before I start putting my readers to sleep. Until next time!

Ben Hacker