Datacenter Fabric Convergence; FCoE Can Help

Ethernet has been around a long time. It is a highly reliable and trusted means of interconnecting computing nodes, and beyond that, it has generally been the most commoditized (read: lowest cost) form of interconnect for quite some time. Broad deployment, administrator trust, and low cost have kept Ethernet the mainstream fabric for LAN traffic.

However, despite Ethernet's strong connectivity credentials, it still comes up short in certain applications. Ethernet is what is referred to as a ‘best effort’ network. This simply means that in the real world you will generally get pretty good performance (throughput, latency, few dropped packets, etc.), but when congestion occurs, packet drops and performance degradation can be quite a nuisance. For many applications this doesn't matter. If you are using email, browsing the web, or transferring files to a shared drive, the only thing you will notice is a decrease in performance; everything will still ‘work’ and transfer properly. For some applications like storage, though, this non-deterministic behavior is unacceptable. If packets are dropped or arrive out of order, storage applications have a nasty tendency to hang or crash.

Because of this limitation, separate fabrics have been used for Storage Area Networks (SANs) for quite a while. One of the main fabrics developed and used for high performance SANs is Fiber Channel. To build a Fiber Channel network, both the server and the storage target need a Fiber Channel Host Bus Adapter (FC HBA) to communicate via the Fiber Channel protocol. In addition, the switches that connect the Fiber Channel infrastructure must be dedicated Fiber Channel switches; a standard Ethernet switch cannot be used.

Once in place, this SAN architecture provides a very high performance, high reliability network that is ideal (and required) for high end storage traffic, but it comes at a cost:

1) Fiber Channel HBAs are generally more expensive than their Ethernet counterparts.

2) You have to maintain a separate fabric, which adds infrastructure (switch and cabling costs) and complicates IT management.

3) Servers connected to the SAN now need to have an Ethernet adapter AND a Fiber Channel adapter.

The upside to the additional cost and complexity is of course better performance, but the question has always been "Is there a better way?"

I believe there is a better way, and that Fiber Channel over Ethernet (FCoE), along with the IEEE standards that are making it possible, is the logical path to delivering lossless performance on Ethernet while maintaining Ethernet's historical cost advantages.

‘Best Effort’ is not good enough:

The bottom line for today's Ethernet is that it simply can't provide the ‘lossless’ behavior that storage traffic demands, but that is changing. Below I will summarize, at a high level, some of the standards being developed in IEEE to improve the performance of Ethernet for storage applications, and how they address Ethernet's shortcomings to enable FCoE.

Bandwidth Sharing, Priority Flow Control and Pause:

This capability offers a method to assign priorities to different Ethernet traffic classes. From there, when congestion becomes an issue, traffic can be ‘paused’ on a per-priority basis, allowing lower priority traffic to be halted temporarily while keeping top priority traffic like storage running smoothly. This per-priority pause capability is really the first basic step in allowing Ethernet to provide some ‘QoS-like’ Layer 2 guarantees.
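
To make the idea concrete, here is a minimal Python sketch of per-priority pause behavior. It is purely illustrative: the class names, buffer thresholds, and frame strings are invented for the example, and it models the behavior rather than any actual pause frame format.

    # Toy model of per-priority pause. Illustrative only; not real switch code.
    from collections import deque

    NUM_PRIORITIES = 8          # Ethernet carries a 3-bit priority field, so 8 classes
    BUFFER_LIMIT = 4            # invented per-priority buffer depth before we pause

    class Port:
        def __init__(self):
            self.queues = [deque() for _ in range(NUM_PRIORITIES)]
            self.paused = [False] * NUM_PRIORITIES

        def receive(self, frame, priority):
            queue = self.queues[priority]
            if len(queue) >= BUFFER_LIMIT:
                # A classic link-level PAUSE would stop *all* traffic here.
                # Per-priority pause halts only this class; storage traffic
                # assigned to another priority keeps flowing.
                self.paused[priority] = True
                return "PAUSE priority %d" % priority
            queue.append(frame)
            return None

    port = Port()
    for i in range(6):
        result = port.receive("bulk-frame-%d" % i, priority=1)  # low priority LAN traffic
        if result:
            print(result)                 # sender is told to stop priority 1 only
    print(port.receive("storage-frame", priority=3))             # storage class still accepted -> None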

Congestion Notification (or Backward Congestion Notification):

In addition to simply pausing individual low priority streams of traffic, congestion notification provides a way to signal upstream and tell the offending traffic generator to throttle back its traffic and re-route as necessary. This capability is key to the longer term development of FCoE because, with the pause capability alone, congestion is really just pushed back one hop in the network. In order to support FCoE storage across multiple hops in a network, congestion notification is needed.
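
Here is a rough Python sketch of that feedback loop. The queue target, feedback value, and linear rate adjustment are all invented for illustration; the congestion notification algorithm actually being standardized is considerably more sophisticated.

    # Illustrative sketch: a congested switch sends feedback upstream so the
    # traffic *source* slows down, instead of just pausing the previous hop.
    TARGET_QUEUE_DEPTH = 20     # frames the congestion point would like to hold

    class CongestionPoint:
        def __init__(self):
            self.queue_depth = 0

        def enqueue(self, n):
            self.queue_depth += n
            excess = self.queue_depth - TARGET_QUEUE_DEPTH
            if excess > 0:
                return {"feedback": excess}     # notification sent back to the source
            return None

    class ReactionPoint:
        """Rate limiter at the traffic source (e.g. the server's converged NIC)."""
        def __init__(self, rate_mbps):
            self.rate_mbps = rate_mbps

        def on_notification(self, message):
            # Back off in proportion to how far over the target the queue is.
            self.rate_mbps = max(100, self.rate_mbps - 100 * message["feedback"])

    source = ReactionPoint(rate_mbps=10000)     # 10 Gigabit sender
    switch = CongestionPoint()

    for burst in (15, 15, 15):                  # three bursts of frames
        note = switch.enqueue(burst)
        if note:
            source.on_notification(note)
            print("throttled source to", source.rate_mbps, "Mb/s")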

Shortest Path Bridging:

This capability is really an optimization for forwarding between switches that defines the path traffic takes within the network. Traditional spanning tree algorithms will sometimes produce paths that are non-optimal and ill-suited to high performance storage traffic. A new algorithm to determine the shortest path between nodes will enable both less congestion in the network and faster delivery of critical storage packets.
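
A toy example makes the difference visible. The three-switch topology, link costs, and Dijkstra helper below are invented purely for illustration:

    # In this triangle topology, a spanning tree rooted at switch A must block
    # one link, so traffic from B to C may detour through A even though a
    # direct B-C link exists. A shortest-path computation uses the direct hop.
    import heapq

    links = {                       # switch -> {neighbor: link cost}
        "A": {"B": 1, "C": 1},
        "B": {"A": 1, "C": 1},
        "C": {"A": 1, "B": 1},
    }

    def shortest_path(graph, src, dst):
        heap = [(0, src, [src])]
        seen = set()
        while heap:
            cost, node, path = heapq.heappop(heap)
            if node == dst:
                return cost, path
            if node in seen:
                continue
            seen.add(node)
            for neighbor, link_cost in graph[node].items():
                if neighbor not in seen:
                    heapq.heappush(heap, (cost + link_cost, neighbor, path + [neighbor]))
        return None

    print(shortest_path(links, "B", "C"))   # (1, ['B', 'C']) - the direct link
    # A spanning tree that blocks the B-C link would force the 2-hop path B -> A -> C.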

DCB Capability Exchange Protocol (DCBX):

This capability goes by several different names depending on who you talk to, but essentially it provides a way for switches on the network to exchange their capability sets with other nodes. This allows each switch to understand which of its neighbors support the Congestion Notification, Flow Control, and other features needed for this ‘Lossless Ethernet’ capability.
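
Conceptually, the exchange boils down to "advertise what you support, then run only what both ends of the link support." The Python sketch below illustrates that idea with invented capability dictionaries and a simple intersection; the real protocol carries these parameters over the link in a very different (TLV-based) form.

    # Conceptual capability negotiation between two directly connected switches.
    local_caps = {
        "pfc_priorities": {3, 4},        # priorities we can pause per-class
        "congestion_notification": True,
        "traffic_classes": 8,
    }

    peer_caps = {
        "pfc_priorities": {3},           # peer only supports pause on priority 3
        "congestion_notification": False,
        "traffic_classes": 4,
    }

    def negotiate(local, peer):
        """Return the feature set both ends of the link can actually run."""
        return {
            "pfc_priorities": local["pfc_priorities"] & peer["pfc_priorities"],
            "congestion_notification": local["congestion_notification"]
                                       and peer["congestion_notification"],
            "traffic_classes": min(local["traffic_classes"], peer["traffic_classes"]),
        }

    print(negotiate(local_caps, peer_caps))
    # {'pfc_priorities': {3}, 'congestion_notification': False, 'traffic_classes': 4}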

While the list above is not meant to cover all of the IEEE development under way for this ‘Lossless Ethernet’ initiative, it should provide a good overview of the general push taking place and how the goal of near lossless performance is going to be accomplished.

Weren't we talking about Fiber Channel?

Astute readers will realize that I haven't really addressed the Fiber Channel piece of this. The features described above only allow Ethernet to carry the kinds of traffic (like Fiber Channel) that require very high reliability and performance; but how do you get the Fiber Channel data onto an Ethernet frame?

In today's environment, a Fiber Channel initiator on a server places Fiber Channel data onto an FC HBA to send over the SAN to a storage target, and all of this data travels over a Fiber Channel network. Under the FCoE model, the server needs an FCoE initiator, and on the target side, the switch connected to the storage target must be able to take the data from the target and encapsulate it into Ethernet frames. Beyond that, the data is transmitted over the Ethernet fabric as normal, and the features described above give Ethernet the lossless behavior a Fiber Channel application stack needs to function properly.
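
To illustrate what "encapsulate it into Ethernet" means in practice, here is a simplified Python sketch. The FC frame is not translated into something else; it is wrapped as the payload of an ordinary Ethernet frame. The EtherType value 0x8906 is the one registered for FCoE, but the header and trailer byte layouts, SOF/EOF values, and MAC addresses below are simplified stand-ins for illustration only, not a spec-accurate encoder.

    # Minimal sketch of FCoE encapsulation: Ethernet header + simplified FCoE
    # header + the unmodified Fiber Channel frame + simplified FCoE trailer.
    import struct

    FCOE_ETHERTYPE = 0x8906     # EtherType registered for FCoE

    def encapsulate(fc_frame: bytes, dst_mac: bytes, src_mac: bytes) -> bytes:
        eth_header = dst_mac + src_mac + struct.pack("!H", FCOE_ETHERTYPE)
        fcoe_header = bytes(13) + b"\x2e"   # version/reserved bytes + SOF byte (illustrative values)
        fcoe_trailer = b"\x41" + bytes(3)   # EOF byte + reserved padding (illustrative values)
        return eth_header + fcoe_header + fc_frame + fcoe_trailer

    # A made-up 28-byte blob stands in for the real FC header + payload + CRC.
    frame = encapsulate(
        fc_frame=bytes(28),
        dst_mac=bytes.fromhex("0efc00000001"),  # example destination MAC for the sketch
        src_mac=bytes.fromhex("001b21000001"),  # example server NIC MAC for the sketch
    )
    print(len(frame), "bytes on the wire (before the Ethernet FCS), carrying an unmodified FC frame")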

This is certainly a capability that Intel has been supportive of. Ethernet is a critical piece of the computing platform, and FCoE provides a potential improvement for datacenter and storage network design. By consolidating the Fiber Channel data onto a single Ethernet wire, end user IT houses can also see several benefits:

1) Reduces the need for two physical network cards in each server. Now a single NIC can connect to both the SAN and the normal TCP/IP data network.

2) Along with the consolidation of network cards, you also save on cabling. One 10 Gigabit Ethernet link can replace the old Fiber Channel and Ethernet links.

3) Reduces power consumption and cooling requirements.

4) The commoditized, low cost nature of Ethernet provides an additional benefit by converging system I/O onto what will likely be the lowest cost interface over the coming years: 10 Gigabit Ethernet.

In summary, FCoE may be in its infancy, but the standards are either final or in process, products are available today, and the value proposition is clear. Further performance improvements, cost reductions, the proliferation of 10 Gigabit networks, and a growing range of product choices will only increase the support for and interest in Fiber Channel over Ethernet in datacenter SAN applications.

~ Ben Hacker

Links for further information: