Evaluating Performance and Resiliency of HPE Helion Carrier Grade NFV Platform Powered by Wind River Titanium Server

NFVi performance benchmarking is one of the first challenges the networking industry is trying to address in order to bring Service Providers (SPs) closer to deploying NFV-based networks. At this point, NFV solutions from different vendors (both NFVi and VNF) are working toward the plug-and-play model envisioned by ETSI NFV, in which SPs can buy NFVi and VNF components from any vendor in order to optimize performance, reliability and resiliency.

The industry is coalescing towards an ecosystem approach, where multiple vendors are validating their solutions to provide test data, not just to illustrate their interoperability, but also to illustrate the optimization in performance, reliability and resiliency that can be achieved by these ecosystem partners.

In such an environment, some NFVi vendors are taking on the challenge of showing Service Providers that their solutions can achieve the highest performance and reliability. They do so with testing methods that isolate the performance factors for NFVi and VNF, and that give SPs a level of confidence when comparing solutions from different vendors.

One of the public tests aimed at providing empirical data in that direction was performed jointly by HPE, Wind River and Spirent. The tests were executed and observed by The Tolly Group, on a test bed created by HPE, Wind River and Spirent. Spirent was also involved in defining the test plan for this public test.

The details of the test bed and the test results can be found in the test report “Hewlett Packard Enterprise NFV System Test by Tolly”.

“More and more NFVi vendors and Service Providers are approaching third party vendors such as The Tolly Group for independent public tests to validate and evaluate the performance of their NFV solutions. We were happy to work with Spirent to obtain the necessary testing tools and test methodologies for evaluating the performance and resiliency of the HPE Helion Carrier Grade NFV platform.”

- Zachary Schaffer | Director of Lab Programs

This blog post is not intended to evaluate the details of the test results as those have been described in the test report. The objective is rather to highlight the key aspects of testing and why those aspects are important when benchmarking NFVi.

Two categories were covered as part of the public test:

  • Performance
  • Resiliency

Performance Benchmarking of NFVi

Some of the performance factors used for benchmarking traditional high-speed Ethernet networks, such as throughput, latency and jitter, also apply to NFV environments. The key is to understand how variations in NFVi implementation and configuration affect these benchmark factors, and how that correlation can be used to optimize performance.

Allocation of compute node resources such as CPU cores and memory to the virtual switch and VNFs can have a major impact on the packet processing and forwarding capability of the NFVi, and thus on the network benchmarking metrics.

Some of the key concepts used were:

  • Assignment of virtual switch and VNFs to a NUMA node
  • Dedicated CPU core assignment or “core pinning”
  • Dedicated memory assignment to virtual switch and VNFs
  • Understanding of the queuing and hashing algorithms employed at the physical NIC of the compute node

As part of benchmarking the HPE Helion Carrier Grade platform, a virtual switch and a base L2 VNF were assigned to the same NUMA node, each with dedicated CPU cores and dedicated memory. This was done to optimize performance. Intel NICs that employ RSS (receive side scaling) were deployed in the compute node under test.

CPU cores diagram
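
To make the pinning idea concrete, here is a minimal Python sketch of how dedicated cores on a single NUMA node might be divided between a virtual switch and a VNF. The core layout, the workload names and the `plan_pinning` helper are hypothetical; they are not taken from the HPE Helion platform or the test report, and a real deployment would apply such a plan through the platform's own configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    cores_needed: int


def plan_pinning(numa_cores, workloads, numa_node):
    """Assign dedicated cores from one NUMA node to each workload.

    numa_cores maps a NUMA node id to the list of CPU core ids on that node.
    Returns {workload name: [core ids]}; no core is shared between workloads.
    """
    available = list(numa_cores[numa_node])
    needed = sum(w.cores_needed for w in workloads)
    if needed > len(available):
        raise ValueError(
            f"NUMA node {numa_node} has {len(available)} cores, {needed} requested"
        )
    plan = {}
    for w in workloads:
        # Take the next free cores and remove them from the pool.
        plan[w.name] = available[: w.cores_needed]
        available = available[w.cores_needed:]
    return plan


# Hypothetical two-socket host; the core numbering is illustrative only.
topology = {0: [0, 1, 2, 3, 4, 5, 6, 7], 1: [8, 9, 10, 11, 12, 13, 14, 15]}
workloads = [Workload("vswitch", 2), Workload("l2_vnf", 2)]

plan = plan_pinning(topology, workloads, numa_node=0)
print(plan)
```

Keeping both workloads on one node avoids cross-node memory access on the packet path, and refusing to oversubscribe the node keeps every core dedicated to a single workload.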

It was observed that the NFVi system under test delivered line rate 10G forwarding performance for frame sizes of 256 bytes and above. Additionally, line rate 10G forwarding performance was observed for frame sizes of 512 bytes and above. Not only was the performance at line rate for most of the frame sizes under consideration, but the consistency across multiple runs was also among the best observed in public tests for NFVi benchmarking. The variance in forwarding performance was zero for frames of 256 bytes or larger, and less than 0.5% for smaller frames.
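
For readers who want to relate "line rate" to a frame count, the short Python sketch below computes the theoretical maximum frame rate of an Ethernet link for a given frame size. It relies on the standard per-frame overhead of 20 bytes on the wire (7-byte preamble, 1-byte start-of-frame delimiter and 12-byte inter-frame gap); the function names and the list of frame sizes are chosen for illustration and are not taken from the test report.

```python
# 7 preamble + 1 start-of-frame delimiter + 12 inter-frame gap
ETHERNET_OVERHEAD_BYTES = 20

LINK_10G_BPS = 10_000_000_000


def line_rate_fps(link_bps, frame_size_bytes):
    """Theoretical maximum frames per second for a frame size (CRC included)."""
    bits_on_wire = (frame_size_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def percent_of_line_rate(measured_fps, link_bps, frame_size_bytes):
    """Express a measured forwarding rate as a percentage of line rate."""
    return 100.0 * measured_fps / line_rate_fps(link_bps, frame_size_bytes)


# Illustrative frame sizes; the sizes used in the public test are in the report.
for size in (64, 128, 256, 512, 1024, 1518):
    print(f"{size:>5} bytes: {line_rate_fps(LINK_10G_BPS, size):,.0f} frames/s")
```

The smaller the frame, the more frames per second the virtual switch and VNF must process to fill the link, which is why small-frame results are usually the hardest to hold at line rate.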

As described in a previous blog post, “Evaluating Performance Consistency for NFV-Based Services”, performance consistency is critical to optimized resource provisioning and capacity planning in shared infrastructure environments such as NFV.

Along the same lines, the hashing techniques employed at the NICs and the type of traffic mix used for testing can help uncover bottlenecks in network performance. This is where the test tool's ability to generate a real-world traffic mix and to provide comprehensive analysis capabilities and metrics becomes important.

Queue scheduler diagram
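
The interaction between NIC hashing and traffic mix can be sketched in a few lines of Python. The example below is only an illustration: it uses CRC32 as a stand-in hash, whereas real RSS implementations typically use a Toeplitz hash with a configured key, and the queue count, addresses and ports are hypothetical.

```python
import zlib
from collections import Counter


def queue_for_flow(flow, num_queues):
    """Map a flow tuple to a receive queue using a stand-in hash (CRC32)."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_queues


def queue_load(packets, num_queues):
    """Count packets per queue; each packet is represented by its flow tuple."""
    counts = Counter(queue_for_flow(flow, num_queues) for flow in packets)
    return [counts.get(q, 0) for q in range(num_queues)]


NUM_QUEUES = 4  # hypothetical queue count

# 1000 packets that all belong to one flow: every packet hashes to the same queue.
single_flow = [("10.0.0.1", "10.0.1.1", 1024, 80)] * 1000

# 1000 packets spread over 1000 flows that differ only in source port.
many_flows = [("10.0.0.1", "10.0.1.1", 1024 + i, 80) for i in range(1000)]

single_load = queue_load(single_flow, NUM_QUEUES)
many_load = queue_load(many_flows, NUM_QUEUES)
print("single flow:", single_load)
print("many flows: ", many_load)
```

A test stream with a single flow exercises only one queue, and therefore one core, no matter how many cores are pinned, while a mix of many flows spreads the load. This is why the traffic mix matters when looking for bottlenecks.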

NFV Resiliency

Evaluating resiliency for NFVi must involve the fault tolerance of the NFVi components alone. As we know, VNFs are an integral part of the NFV framework and also influence E2E service reliability and availability. The instantiation time of a VNF, from the moment a new VNF is launched to the moment the service it offers becomes fully operational, can affect the overall convergence time of the E2E service. However, VNF instantiation time varies from vendor to vendor and does not reveal much about the fault tolerance and fault detection time of the NFVi.

Keeping these aspects in perspective, it makes sense to focus on fault detection time when evaluating resiliency, and to avoid measurements that depend on the VNF’s ability to handle faults.

For the reasons cited above, resiliency testing of the HPE Helion platform focused mainly on the fault detection time for the following failure events:

  • VNF failure
  • Controller failure
  • Compute node failure
  • Individual link failure in the LAG or LAG fail-over at the network interface of the compute node

Additionally, test traffic was used to verify that the SUT (System Under Test) recovered from the failure and that the service emulated by Spirent TestCenter returned to normal operation.
Convergence and recovery were verified with Spirent TestCenter's advanced dynamic results view, which allows the user to plot the failure, convergence and recovery events in the service flow.
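
One common way to estimate how long a service was interrupted from test traffic alone is the loss-derived method: divide the number of frames lost during the event by the constant offered frame rate. The Python sketch below illustrates the arithmetic. It is an assumption that this resembles what a tester would do here; the test report does not state that this exact calculation was used, and the frame counts are hypothetical.

```python
def loss_derived_outage_ms(frames_sent, frames_received, offered_rate_fps):
    """Estimate service interruption time from frame loss at a constant rate.

    Assumes traffic is offered at a constant rate and that all loss is caused
    by the failure event being measured.
    """
    if offered_rate_fps <= 0:
        raise ValueError("offered_rate_fps must be positive")
    lost = frames_sent - frames_received
    if lost < 0:
        raise ValueError("frames_received cannot exceed frames_sent")
    return lost * 1000 / offered_rate_fps


# Hypothetical run: 1,000,000 frames/s offered for 60 s, 25,000 frames lost.
sent = 60_000_000
received = sent - 25_000
outage_ms = loss_derived_outage_ms(sent, received, 1_000_000)
print(f"estimated interruption: {outage_ms:.1f} ms")
```

Note that this figure covers the whole interruption seen by the traffic, that is, detection plus recovery, so it is an upper bound on the fault detection time rather than a direct measurement of it.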

The following figure illustrates the failure detection time for the failure scenarios tested:

Chart showing carrier grade fault detection scenarios as reported by TestCenter 4.59

Network functions virtualization introduces increased variability and a dependency on multiple factors that determine network performance. It is important to understand the correlation between the performance of the NFVi nodes and the performance metrics of the network services provisioned on the NFVi. Test tools and test methodologies that make this correlation possible and help characterize VNF and virtual switch performance are therefore critical. To test the performance and resiliency of NFV platforms effectively, test methodologies have to be designed with a good understanding of the factors that influence NFVi performance and, in turn, network performance.

Spirent is leading the effort in providing not just a test tool for NFV environments but a complete test solution that provides actionable results for NEMs and Service Providers.
