Billowing The Cloud—Testing High-Scale 100GbE Data Center Switches

The demands for increased bandwidth in the data center are driving the creation of a new generation of data center switch platforms. These platforms, whether they are top-of-rack (ToR) or aggregation switches, are designed to support hundreds of gigabits or even terabits per second while simultaneously:

  • Delivering low latency to meet the demands of delay-sensitive applications such as high-frequency trading (HFT)
  • Setting new standards of power efficiency
  • Supporting programmable interfaces for the new paradigm of software-defined networking (SDN)

All this is combined into a package to meet the needs of highly cost-sensitive providers.

How can network manufacturers validate their entire system operating at the designed scale of hundreds of ports? How can they prove their value to their demanding data center customers? 

Cisco and Network Test recently demonstrated how to perform this kind of large-scale system test on the Cisco Nexus 9516.

Cisco Nexus 9516 Test Bed

Part of the challenge, especially for 100GbE links, can be navigating the alphabet soup of new 100GbE physical layer technologies. Smaller and more efficient transceiver form factors like CFP2, QSFP28, CFP4 and CPAK allow network vendors to cram more ports into the same space. Combine this with new data-center-focused PMDs such as 100GBase-SR4 and 100GBase-CR4 and even figuring out how to connect devices can involve significant decisions.

Forwarding Traffic Performance

The primary task of a switch system is forwarding traffic—massive amounts of it. Often vendors will demonstrate drop-free performance at the theoretical maximum rate with the smallest frame size. This removes all doubt about forwarding performance. Given the cost-sensitive nature of the customer base, however, some vendors choose to create higher-density and lower-cost solutions by only delivering drop-free performance against typical data center loads. Often these options are presented as different line cards or special modes of operation. While this approach can increase port density by up to 30%, it opens up uncertainty about when a switch might drop traffic—uncertainty that would be removed by system testing.
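To see why the smallest frame size is the worst case, consider the arithmetic behind line-rate forwarding. The sketch below (an illustration, not part of the test report) computes the theoretical maximum frame rate of a single 100GbE port, accounting for the fixed 20 bytes of per-frame wire overhead (8-byte preamble plus 12-byte inter-frame gap):

```python
# Theoretical maximum frame rate for one 100GbE port.
# Every frame on the wire carries fixed overhead beyond the frame itself:
# an 8-byte preamble and a 12-byte inter-frame gap.

LINE_RATE_BPS = 100e9          # 100GbE line rate, bits per second
PREAMBLE_BYTES = 8
INTERFRAME_GAP_BYTES = 12

def max_frames_per_second(frame_size_bytes: int) -> float:
    """Frames per second at full line rate for a given frame size."""
    wire_bytes = frame_size_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES
    return LINE_RATE_BPS / (wire_bytes * 8)

# Worst case: minimum 64-byte frames -> ~148.8 million frames/sec per port
print(round(max_frames_per_second(64)))      # 148809524
# A more typical 1518-byte frame -> ~8.1 million frames/sec per port
print(round(max_frames_per_second(1518)))    # 8127438
```

Multiply the 64-byte figure by hundreds of ports and the gap between "drop-free at theoretical maximum" and "drop-free at typical loads" becomes clear.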

Video Performance

Real-time video and voice services represent a large and fast-growing percentage of network traffic. The quality of these services is impacted by jitter on the link between client and server. Jitter can be combated by buffering—but at high speed and high port density buffering is expensive. Thus a data center switch platform is pressured to find the right tradeoff: expensive buffering makes the platform less attractive to cost-sensitive buyers, but with insufficient buffering the quality of key services degrades. This is why good system tests focus on accurately characterizing jitter—reporting it not only under the maximum theoretical stress but also under more typical loads—and why network equipment vendors are careful to report how their systems control and constrain jitter.

Testing for Delay

Anyone who has read the best-selling book “Flash Boys” knows the importance of delay to the financial services industry and HFT firms. For this reason, the operators of the data centers through which many of these trades flow care intensely about each microsecond of delay through the switch platform. Delay also degrades the quality of real-time voice and video services, both of which are growing at strong rates. Again, there are system design decisions that can reduce latency—ironically, making sure frames are not buffered is one of them. This kind of design tradeoff is a key reason why network manufacturers and their customers need accurate measurements of both jitter and latency under maximum and more realistic loads. The way different vendors make these design decisions forms the competitive battleground within the data center switch market.
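Reporting latency well means reporting more than an average: the tail matters most to HFT customers. A minimal sketch of the kind of summary a system test might produce from per-frame latency samples (the nearest-rank percentile method and microsecond units are assumptions for illustration):

```python
def latency_summary(samples_us):
    """Min/avg/max and 99th-percentile latency from per-frame samples (µs)."""
    s = sorted(samples_us)
    n = len(s)
    # Nearest-rank 99th percentile: captures tail latency that an
    # average alone would hide.
    p99 = s[min(n - 1, int(0.99 * n))]
    return {"min": s[0], "avg": sum(s) / n, "max": s[-1], "p99": p99}

# e.g. 100 samples spanning 1..100 µs
print(latency_summary(list(range(1, 101))))
```

Running such a summary under both line-rate and typical loads is what turns a latency spec sheet number into a measured, repeatable claim.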

Validating the Control Plane Protocols 

The system not only has to deliver high quality of service for drop-sensitive and time-sensitive applications, but it also has to run control plane and management protocols while forwarding traffic at high rates. System tests include benchmarking protocols like BGP while the switch is under heavy load. The security of the switch itself, as well as of the traffic flowing through it, also needs to be characterized. As the control plane moves to an SDN framework, these control plane protocols will also need to be tested while the distributed fabric is strained.

Large-Scale System Testing is a Must-Have

A large-scale system test is a great opportunity to look at other important buying criteria such as power consumption or system availability. A data center with hundreds of switches spends a considerable sum powering and cooling them. Baselining power consumption and evaluating power-saving technologies can demonstrate a lower cost of ownership for one switch platform against a competitor's. Many data center switch architectures support power, data plane and control plane redundancy. If a provider is going to invest in the added cost of redundant systems, it is wise for network vendors to demonstrate the sought-after availability.
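The power argument is easy to quantify. The sketch below uses entirely hypothetical numbers (per-port wattage, electricity price, and PUE are illustrative assumptions, not measured values) to show how a per-port power difference compounds into an annual cost difference:

```python
# Illustrative annual power-and-cooling cost model (all numbers hypothetical).
HOURS_PER_YEAR = 8760
PUE = 1.5              # assumed power usage effectiveness (cooling overhead)
PRICE_PER_KWH = 0.10   # assumed electricity price, dollars per kWh

def annual_power_cost(watts_per_port: float, ports: int) -> float:
    """Yearly cost in dollars to power and cool a switch at this draw."""
    kw = watts_per_port * ports / 1000.0
    return kw * PUE * HOURS_PER_YEAR * PRICE_PER_KWH

# Hypothetical 512-port system at 3 W/port vs 4 W/port
print(round(annual_power_cost(3, 512)))  # 2018
print(round(annual_power_cost(4, 512)))  # 2691
```

Across hundreds of switches, a measured one-watt-per-port difference becomes a total-cost-of-ownership line item, which is why baselining power belongs in a system test.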

The relentless demand for bandwidth continues to drive innovation in network switches. The designs of these systems involve important tradeoffs between cost, service quality, security and reliability. Equipment vendors need to perform system testing to establish the performance of the platform when operating at high port scale and high rates of traffic forwarding. They need to establish the value these platforms deliver to the demanding data center customers who will buy and deploy them to deliver cloud-based services. A system test like the one recently performed by Network Test and Cisco is a great example of how to get started. 

For more detailed information visit: www.spirent.com/100G
