Testing High Availability in Switch Fabrics
The advent of cloud computing in data centers has driven a transition from networks of Ethernet switches to fabrics built from multiple switches. A switched network favors a hierarchical tree topology with a unique path to each element. A fabric, by contrast, is all about achieving high availability and low latency through a mesh topology that fully utilizes all available links.
The fabric is not new to the data center. Fibre Channel switch fabrics have been around for a few decades, but Ethernet fabrics have appeared only recently.
As Brandon Carroll notes in his blog post "Understanding Switch Fabrics," the difference between a network of switches and a fabric is most apparent in how a network of switches relies on the various flavors of Spanning Tree Protocol (STP). In such a network, multiple paths between switches create Layer 2 loops, which can flood the network with endlessly circulating broadcast frames. STP blocks redundant ports so that only one active path exists between any two switches. By contrast, the strength of a fabric lies in its mesh topology, which achieves higher availability and lower latency while reducing network infrastructure cost and operational expenses.
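To make the cost of STP's unique-path rule concrete, here is a minimal sketch that builds a loop-free tree over a small mesh and counts the links left blocked. It is a simplification, not real STP: it assumes a hypothetical two-spine, four-leaf topology and keeps the first link over which a breadth-first search reaches each switch, whereas actual STP elects the root by bridge ID and chooses ports by path cost and port ID.

```python
from collections import deque


def spanning_tree_links(switches, links, root):
    """Return the links kept forwarding by a BFS-built spanning tree.

    Simplified model of STP: every link not in the returned set is blocked.
    """
    adjacency = {s: [] for s in switches}
    for a, b in links:
        adjacency[a].append(b)
        adjacency[b].append(a)
    visited = {root}
    kept = set()
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in visited:
                # First link reaching a switch becomes its forwarding path.
                visited.add(neighbor)
                kept.add(frozenset((current, neighbor)))
                queue.append(neighbor)
    return kept


# Hypothetical topology: every leaf connects to every spine.
spines = ["spine1", "spine2"]
leaves = ["leaf1", "leaf2", "leaf3", "leaf4"]
links = [(leaf, spine) for leaf in leaves for spine in spines]

forwarding = spanning_tree_links(spines + leaves, links, root="spine1")
blocked = [link for link in links if frozenset(link) not in forwarding]
print(f"{len(forwarding)} of {len(links)} links forward; {len(blocked)} blocked")
```

A tree over six switches can keep only five links, so three of the eight physical links sit idle. A fabric that uses all paths would put those three links to work.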
Best practices call for a set of tests that should be a mandatory part of any comprehensive evaluation of a data center fabric, such as full-mesh traffic patterns that fully stress the device and tests of the redundancy algorithms that guarantee high availability. Recent tests of Arista Networks’ DCS-7508 data center core switch and Juniper Networks’ QFabric™ show these practices in action.
A full-mesh traffic pattern supports many test permutations: pure unicast traffic, pure multicast traffic, a mix of unicast and multicast traffic, and, if IP routing is a requirement, test cases with IPv4, IPv6, and mixed IPv4 and IPv6 traffic.
Full-mesh testing is critical for modular switches whose line cards plug into a central backplane. Depending on the switch and fabric design, testing in port pairs can produce results that are misleadingly good or bad compared with real-world performance, because it pushes all traffic over very few backplane paths.
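The following sketch shows why port pairs under-exercise a chassis. It assumes a hypothetical chassis of four line cards with eight ports each and treats each ordered (source card, destination card) combination as one "backplane path." That is a deliberate simplification, since real backplanes and fabric modules have their own internal path structure. The sketch counts which card-to-card paths each traffic pattern touches.

```python
from itertools import permutations

CARDS = 4           # hypothetical number of line cards
PORTS_PER_CARD = 8  # hypothetical ports per line card
PORTS = [(card, port) for card in range(CARDS) for port in range(PORTS_PER_CARD)]


def bidirectional(pairs):
    """Expand each port pair into traffic flowing in both directions."""
    return [flow for a, b in pairs for flow in ((a, b), (b, a))]


def backplane_card_pairs(flows):
    """Distinct (source card, destination card) paths that cross the backplane."""
    return {(src[0], dst[0]) for src, dst in flows if src[0] != dst[0]}


# Full mesh: every port sends to every other port.
full_mesh = list(permutations(PORTS, 2))

# Port pairs on the same card: ports 0<->1, 2<->3, ... on each card.
same_card_pairs = bidirectional(
    [((c, p), (c, p + 1)) for c in range(CARDS) for p in range(0, PORTS_PER_CARD, 2)])

# Port pairs across neighboring cards: card 0<->1 and card 2<->3.
cross_card_pairs = bidirectional(
    [((c, p), (c + 1, p)) for c in range(0, CARDS, 2) for p in range(PORTS_PER_CARD)])

for name, flows in [("full mesh", full_mesh),
                    ("same-card pairs", same_card_pairs),
                    ("cross-card pairs", cross_card_pairs)]:
    print(f"{name}: {len(backplane_card_pairs(flows))} backplane card pairs")
```

The full mesh touches all 12 card-to-card paths. Same-card pairs never reach the backplane, which can make results look misleadingly good. Cross-card pairs concentrate all traffic on just 4 paths, which can make them look misleadingly bad.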
By testing the same switches first with STP and then with a multipath mechanism such as Transparent Interconnect of Lots of Links (TRILL) or Shortest Path Bridging (SPB), it’s possible to determine how much bandwidth is gained and how much latency is reduced by moving to a fabric. Independent comparison tests of the Cisco FabricPath and HP FlexNetwork architectures are available.
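Before running such a comparison, a back-of-the-envelope model can help predict the size of the gain. This sketch makes several assumptions. It uses a hypothetical full mesh of four switches. It uses active link count as a stand-in for usable bandwidth and average hop count as a rough proxy for latency, which assumes equal per-hop delay. It also simply lets every link forward on shortest paths rather than modeling the TRILL or SPB protocols themselves. With equal link costs, STP rooted at sw1 leaves each other switch forwarding only on its direct link to the root.

```python
from collections import deque
from itertools import combinations


def hop_counts(adjacency, source):
    """Breadth-first search: hop count from source to every reachable switch."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)
    return hops


def average_hops(switches, links):
    """Mean shortest-path hop count over all switch pairs, using only the given links."""
    adjacency = {s: set() for s in switches}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    pairs = list(combinations(switches, 2))
    total = sum(hop_counts(adjacency, a)[b] for a, b in pairs)
    return total / len(pairs)


switches = ["sw1", "sw2", "sw3", "sw4"]  # hypothetical 4-switch full mesh
mesh_links = list(combinations(switches, 2))
# Simplified STP result: sw1 is root; links between non-root switches are blocked.
stp_links = [("sw1", s) for s in switches[1:]]

for name, links in [("STP", stp_links), ("multipath", mesh_links)]:
    print(f"{name}: {len(links)} active links, "
          f"average path {average_hops(switches, links):.2f} hops")
```

In this toy case STP keeps 3 of 6 links active with an average path of 1.50 hops, while all-links forwarding doubles the active links and brings every pair to 1 hop. Measured results on real hardware are what the comparison tests are for.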
An essential set of fabric redundancy tests measures how long the system takes to recover from a component failure or from the addition of a new switch to the fabric. The test offers traffic to all ports on the edge switches while test engineers disable a backbone or spine component of the fabric. The system should quickly recompute alternative paths to carry the traffic previously handled by the failed backbone switch.
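A common way to turn such a test into a number is the loss-derived method used in IETF convergence benchmarking: offer traffic at a constant rate and divide the frames lost during the failover by the offered frame rate. The sketch below assumes a constant offered rate and that every lost frame was lost while the fabric was recomputing paths. The function name and the sample figures are hypothetical.

```python
def loss_derived_convergence_time(frames_sent, frames_received, offered_rate_fps):
    """Estimate failover time in seconds from frame loss on one port.

    Assumes a constant offered rate and that all loss occurred during
    convergence: convergence time = lost frames / offered frame rate.
    """
    if offered_rate_fps <= 0:
        raise ValueError("offered rate must be positive")
    lost = frames_sent - frames_received
    return lost / offered_rate_fps


# Hypothetical edge-port figures: 1,000,000 frames/s offered,
# 10,000,000 frames sent, 9,988,000 received.
seconds = loss_derived_convergence_time(10_000_000, 9_988_000, 1_000_000)
print(f"{seconds * 1000:.1f} ms")  # 12.0 ms
```

Because different edge ports may reroute at different times, it is worth computing this per port and reporting the worst case rather than an average.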
The Ethernet fabric is the next step in reducing cost in the cloud while improving performance and availability. Make sure to test it properly to guarantee you’re getting the benefits.
For more information on how to perform such tests without cheating on fail-over latency measurements, download the 10 Essential Benchmarks for High-Performance Data Centers whitepaper.