Marketing often creates them, sales makes the promise, and operations has to deliver them.
Service Level Agreements (SLAs) are at the heart of Ethernet business services. They continue to gain traction in the market as the competitive differentiator for many service providers looking to out-market and outperform their competition.
- How do you manage the network and service to an SLA?
- How do you interpret SLA jargon into operational process and procedure?
- How do you monitor and measure the network to guarantee five-nines service availability and reduce MTTR, response time, latency, jitter, and packet loss?
- How do you guarantee installation intervals?
- More importantly, how do you preempt SLA violations to avoid penalties and service credits?
Download the whitepaper and see how real-world Service Level Agreements can be translated into actionable business processes and set the operational requirements needed to successfully manage SLA-backed business services.
TRANSLATING SLAs INTO
OPERATIONAL REQUIREMENTS
October 2011
Rev. A 10/11
SPIRENT
© 2011 Spirent. All Rights Reserved.
Translating SLAs into Operational Requirements
Contents
SPIRENT WHITE PAPER • i
Why MSOs are Aggressively Offering Ethernet Services
The Applications
Mobile Backhaul
Business Services
The Challenge
Ethernet SLAs
Vertical markets
A Comparison of Typical SLAs
Breaking Down the Ethernet SLA: Defining Operation Milestones
Turn Up
Performance Management and Key Performance Indicators
Trouble Maintenance
Penalties and Credits
Moving into the Big Leagues with an SLA
Traffic Diversity and Class of Service Requirements
Translating SLAs into Operational Requirements
Turn Up
Services
The Current Turn Up Process
Service Validation
Optimizing Turn Up
Performance Management
Tracking KPIs
Active Monitoring
PM as a Management Tool
PM as Triage
Maintenance
Conclusion
WHY MSOs ARE AGGRESSIVELY OFFERING ETHERNET SERVICES
The proliferation of Ethernet has opened up several new markets to MSOs that
have been the domain of the traditional Telcos.
The trends have been clear for some time: the growth of IP and mobile real-time
traffic, low-cost Ethernet ports, the revenue shift from voice to data, and
the packet video explosion. Everything points to Ethernet as the transport
technology of today and into the future.
According to Frost & Sullivan [1], the retail US Carrier Ethernet services market
continued to keep up its 2009 momentum through 2010, growing at an
impressive annual growth rate of 40 percent. The worldwide Business Ethernet
services market will grow to US$40.2 billion by 2014, according to Vertical
Systems [2].
A Yankee Group [3] report indicates that the market for wholesale backhaul
services in North America will grow from US$2.45 billion in 2010 to US$3.9
billion in 2015, with the majority of this growth coming from Ethernet backhaul [4].
MSOs are not watching from the sidelines. In 2007, Cox Business became the
first MSO to reach the top tier of U.S. business Ethernet providers. Time Warner
Cable’s “Business Class” was recognized as the Metro Ethernet Forum’s (MEF)
2008 Ethernet Service Provider of the Year. In 2009, Comcast moved beyond
providing voice and data services over a traditional DOCSIS-based coax network
for small and medium-sized businesses (SMBs) to serving mobile backhaul on
Ethernet, and more recently to offering Metro Ethernet services to medium-sized
businesses. These examples illustrate MSOs that have successfully entered
this space and grown their business, but the list is long and continues to grow.
1 Frost & Sullivan: U.S. Retail Carrier Ethernet Services Market Update, 2011, August 2011.
2 Fierce Telecom: eBook Ethernet Exchanges Make the Interconnection, December 2009,
http://www.verticalsystems.com/download/Ethernet%20ebook.pdf
3 Yankee Group: Wholesale Mobile Backhaul: There’s Gold in Them There Hauls, YankeeGroup.com, June 2011.
4 Yankee Group: Wholesale Mobile Backhaul: There’s Gold in Them There Hauls, YankeeGroup.com, June 2011.
Figure 1. Infonetics Research, Mobile Backhaul Equipment and Services, October 2010
THE APPLICATIONS
There are several markets where Ethernet plays a critical role and where MSOs
can step in to provide solutions and generate revenue, including mobile
backhaul and business services.
Mobile Backhaul
Legacy mobile backhaul is one of the biggest contributors to the high cost of
delivering real-time content. Time division multiplexing (TDM) circuits, with their
fixed bit rates, do not scale easily in response to variations in demand and are
six times as expensive to install and maintain as the alternative, Ethernet.
However, legacy TDM backhaul service has several features not found natively in
Ethernet, such as path-level monitoring and fault detection, simple provisioning,
fast protection switching, and robust timing and synchronization. For mobile
operators to be able to exploit the affordability of Ethernet in the mobile
backhaul network, Ethernet must support these capabilities. Therefore, MSOs that
sell Ethernet mobile backhaul must be able to deliver on these expectations to
be competitive.
According to an Ovum [5] report, all backhaul connections will be carried over
Ethernet by 2015. Today, only 10 percent of backhaul connections are Ethernet.
More than one hundred network operators are actively deploying a single IP-Ethernet
backhaul to carry all types of traffic, including voice and data, according to
Infonetics Research [6]. This exciting market has attracted many service providers,
and competition is increasing.
Business Services
Delivering business services using Carrier Ethernet solutions is generally broken
into three basic architectures designed to meet an enterprise’s network requirements:
• Point-to-Point Service (E-LINE)
• Multipoint Service (E-LAN)
• Point-to-Multipoint broadcast (E-TREE)
Figure 2. Metro Ethernet Services, (Source Metro Ethernet Forum)
5 Ovum: Mobile Backhaul Forecast (Global), 2010.
6 Infonetics Research: Mobile Backhaul Equipment and Services: Biannual Worldwide and Regional Market Share, Size, and Forecasts, 2nd Edition, September 29, 2011.
Corporate customers have specific requirements for service activation times,
availability, and performance. Providers deliver this through various methods,
including granular bandwidth allocation and tiers of service with performance
guarantees, such as:
• Premium service for real-time IP telephony or IP video applications
• Silver service for bursty mission-critical data applications requiring low
loss and delay, such as storage networks
• Bronze service for bursty data applications requiring bandwidth
assurances, such as video
• Standard service for best-effort applications such as email and
web browsing
Telco service providers have owned this market in the past. An MSO
providing Metro Ethernet services must be able to deliver similarly competitive
service levels.
THE CHALLENGE
Facing entrenched competitors such as CLECs and ILECs is not an easy task.
They invented the market. When it comes to delivering reliability, resilience, and
availability, they wrote the book. In a competitive sales situation, customers
often go with an incumbent carrier because nobody has ever been fired for
going with what seems like a sure thing, whereas trusting a provider new to the
market to deliver carrier-grade service over Ethernet can be seen as a
risky move.
As a result, simply marketing an Ethernet-based service as an alternative to a
traditional carrier is not enough. The competitive differentiator is the ability to
deliver services that comply with a well-defined service level agreement (SLA).
Delivering on an SLA with penalties and credits for violations is normal business
for a carrier, but it is new territory for most MSOs. However, it is a minimum
requirement for entering the medium and large business services market.
Figure 3. Structure of a business service SLA
SERVICE ACTIVATION
• Critical turn up dates with penalties
• One or more offices
• Quality of Service validated
PERFORMANCE MONITORING
• 24x7 “always on” quality
• Availability, Loss, Latency
• Performance guarantees
TROUBLE MANAGEMENT
• 24x7 call center
• Mean Time to Repair
• Multiple end points
With mobile carriers turning up dozens of cell sites per week, quick turnaround
on turn up is essential. Delivering connections validated against the required
quality of service (QoS) performance at multiple locations, quickly and
efficiently, comes first; managing the production network to those performance
requirements then further differentiates the MSO’s offering.
To compete efficiently, MSOs must make fundamental changes. The first step is
to develop competitive services that are backed by SLAs. But, to deliver these
services, the SLAs must be translated into operational requirements.
ETHERNET SLAs
When moving from TDM to Ethernet SLAs, the key performance indicators (KPIs)
change from bit-level metrics to packet-level metrics. In addition, the metrics of
interest vary by market.
Vertical markets
According to the Metro Ethernet Forum, popularity of Layer 2 Carrier Ethernet
services tends to vary across industry market sectors, with finance, healthcare,
and retail among the biggest users of Ethernet. For finance, the focus is on
ensuring high availability and near-zero latency so that information can move
between branches, exchanges and data centers with millisecond accuracy.
Within healthcare, the focus is on medical imaging and electronic health
record applications, which drive significant bandwidth requirements and high
availability. Retailers are focused on driving real-time, actionable information,
which drives the need to support higher capacity, connection resiliency, and
low latency.
[Bar chart: Popularity of Carrier Ethernet in Industry Market Sectors (Metro Ethernet Forum). Vertical axis: Percent of CE Services, 0 to 40. Sectors: Finance, Real Estate, Medical, Education, Data Center, Legal, Retail, Media.]
Figure 4. Ethernet Services Markets (Source Metro Ethernet Forum)
A Comparison of Typical SLAs
In the current market, brand is no longer seen as a guarantee of performance
and value. All aspects of a service, from activation intervals to latency and loss
guarantees, are evaluated and compared when choosing a vendor. Deadlines are
tight, time to revenue is critical, and performance is a competitive differentiator.
Table 1 shows SLAs currently offered by four anonymous carriers. Many of the
details are fairly standard across all carriers, such as 24 x 7 support and credits
for certain violations set at 1/30 of monthly recurring cost (MRC) or two percent
of MRC per hour. However, other parameters have a wide range of values. For
example, the greatest latency guarantee (25 ms) is two-and-a-half times as long
as the least (10 ms). Service activation intervals range from seven days to over
six times as long, at 45 days.
Table 1. Four SLAs compared
CRITERIA | SLA 1 | SLA 2 | SLA 3 | SLA 4
Service Activation Interval | Negotiated | 45 days on-net, negotiated off-net | 7 days | 14 work days
Activation Penalty | 1/30 of MRC per day | 5% of MRC per day | Negotiated | 5% of MRC per day
24 x 7 call center | YES | YES | YES | YES
Mean Time to Repair (MTTR) | 4 hours | 4 hours | 4 hours | 2 hours on-net, 4 hours off-net
MTTR Penalty | 1/30 of MRC per hour | Every hour over is 5% of MRC | 1/30 of MRC per hour | 2% of MRC per hour over
Bandwidth (CIR) Credit | Negotiated | 1/30 of MRC per day <CIR | 1/3 MRC per day | Negotiated
Bandwidth | Ordered CIR | Ordered CIR | Ordered CIR | Ordered CIR
BER | N/A | N/A | N/A | 1 x 10^-9
Loss | 0.5% Avg. | Negotiated | N/A | 0.2% Avg.
Loss Penalty | 1/30 of MRC per 1% | Negotiated | N/A | 1/30 of MRC per %
Latency | 25 ms | Negotiated | 15 ms | 10 ms
Latency Penalty | 1/30 of MRC per 1 ms | Negotiated | N/A | 1/30 of MRC per 1 ms
Availability | 99.95% | Negotiated | 99.99% | 95%
Availability Penalty | 1/30 MRC per hour | <4 hour = 5%, up to 50% | 1 hr = 2%, 5% after first hour | 2% of MRC per hour
Table 2. Turn-up focused SLA requirements
CRITERIA | SLA 1 | SLA 2 | SLA 3 | SLA 4
Service Activation Interval | Negotiated | 45 days on-net, negotiated off-net | 7 days | 14 work days
Activation Penalty | 1/30 of MRC per day | 5% of MRC per day | Negotiated | 5% of MRC per day
Bandwidth | Ordered CIR | Ordered CIR | Ordered CIR | Ordered CIR
Loss | 0.5% Avg. | Negotiated | N/A | 0.2% Avg.
Latency | 25 ms | Negotiated | 15 ms | 10 ms
An SLA for a legacy TDM service specifies KPIs such as Errored Seconds,
Severely Errored Seconds, and Unavailable Seconds based on CRC-detected
(bit) errors. However, Ethernet does not have the built-in timing of a
synchronous connection.
Although some current SLAs do include a TDM set of performance
measurements, such as SLA 4 in Table 1, which establishes a bit error rate
guarantee for Layer 1 transport, most SLAs based on Carrier Ethernet focus
on availability along with packet loss and latency measurements.
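As a rough illustration of what packet-level KPIs look like in practice, the sketch below derives a frame loss percentage and an average latency and compares them with the SLA 1 targets from Table 1 (0.5% average loss, 25 ms latency). The function names and sample numbers are hypothetical; real values would come from test probes or network elements.

```python
from statistics import mean


def frame_loss_percent(frames_sent: int, frames_received: int) -> float:
    """Frame loss expressed as a percentage of frames sent."""
    if frames_sent <= 0:
        raise ValueError("frames_sent must be positive")
    return 100.0 * (frames_sent - frames_received) / frames_sent


def meets_targets(loss_pct: float, latency_ms: float,
                  max_loss_pct: float, max_latency_ms: float) -> bool:
    """True when both packet-level KPIs are within the SLA targets."""
    return loss_pct <= max_loss_pct and latency_ms <= max_latency_ms


# Illustrative measurements only; targets are the SLA 1 values in Table 1.
loss = frame_loss_percent(frames_sent=1_000_000, frames_received=997_000)
avg_latency = mean([18.2, 21.5, 19.9, 24.1])
print(f"loss={loss:.2f}% latency={avg_latency:.1f} ms "
      f"ok={meets_targets(loss, avg_latency, 0.5, 25.0)}")
```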
Breaking Down the Ethernet SLA: Defining Operation Milestones
An SLA is a compilation of requirements across three main categories: turn up,
performance management, and maintenance.
Turn Up
When will the service be ready to use? This seems like a straightforward
question, but it can pose significant logistical challenges. For a single business
service, such as a point-to-point Ethernet E-Line service, turn up is the promised
date of operation.
Typically, timing is critical in that traffic must be handed over between two
services when one service, such as a leased line, is turned off and another,
such as an E-Line, is turned up. The specific application may allow an off-hours
cutover during a planned maintenance window, or it may require a zero-downtime
live cutover on a production network.
Other applications, such as mobile backhaul services, may require a seamless
cutover of dozens of sites at a specific time on specific dates. A single Ethernet
services contract could have hundreds of sites. Hitting critical dates may be
associated with liquidated damages for site failures. Validating multiple sites is
essential, as the SLA metrics will be both time and performance sensitive, and
can specify either a date or a time interval for turn up.
Table 3. Performance Management focused SLA requirements
CRITERIA | SLA 1 | SLA 2 | SLA 3 | SLA 4
Loss | 0.5% Avg. | Negotiated | N/A | 0.2% Avg.
Loss Penalty | 1/30 of MRC per 1% | Negotiated | N/A | 1/30 of MRC per %
Latency | 25 ms | Negotiated | 15 ms | 10 ms
Latency Penalty | 1/30 of MRC per 1 ms | Negotiated | N/A | 1/30 of MRC per 1 ms
Availability | 99.95% | Negotiated | 99.99% | 95%
Availability Penalty | 1/30 MRC per hour | <4 hour = 5%, up to 50% | 1 hr = 2%, 5% after first hour | 2% of MRC per hour
Table 4. Trouble Maintenance focused SLA requirements
CRITERIA | SLA 1 | SLA 2 | SLA 3 | SLA 4
24 x 7 call center | YES | YES | YES | YES
Mean Time to Repair (MTTR) | 4 hours | 4 hours | 4 hours | 2 hours on-net, 4 hours off-net
MTTR Penalty | 1/3 of MRC per hour | Every hour over is 5% of MRC | 1/30 of MRC per hour | 2% of MRC per hour over
Bandwidth (CIR) Credit | Negotiated | 1/30 of MRC per day <CIR | 1/3 MRC per day | Negotiated
Trouble Maintenance
Once a service is up and running, there is a 100% chance that at some time
the service or the customer will have a problem. Mean time to repair or resolve
(MTTR) is the grandfather of every SLA KPI. When something goes wrong, how
long does it take to fix it? The more critical the service or application, the shorter
the MTTR requirement from the enterprise.
Performance Management and Key Performance Indicators
Performance management (PM) covers a broad range of KPIs. All SLAs have a
provision that mandates 24x7 monitoring combined with a 24x7 support line
to log service outages and troubles. This requirement is in stark contrast to
residential service, and even to the level of support required by small business
services.
The ability to offer always-on performance monitoring opens the door to a
premium level of service for financial and international businesses, which rely
on quality network performance, including always-on availability with low loss,
latency, and jitter. Again, MSOs must make major changes to meet
this requirement.
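To make an availability guarantee concrete for operations, it helps to convert the percentage into a downtime budget. The sketch below does this for the 99.95% and 99.99% figures from Table 1 and for five nines; the 30-day measurement period and the function name are assumptions, since the sample SLAs do not state how the period is defined.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Minutes of outage permitted in the period for a given availability target.

    Assumes a fixed-length measurement period (30 days by default).
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - availability_pct / 100.0)


for target in (99.95, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.2f} minutes per 30 days")
```

Under these assumptions, 99.95% leaves about 21.6 minutes per month, 99.99% about 4.3 minutes, and five nines well under one minute, which is why continuous monitoring rather than customer-reported trouble tickets is needed to manage to these targets.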
Table 5. Penalties and Credits focused SLA requirements
CRITERIA | SLA 1 | SLA 2 | SLA 3 | SLA 4
Activation Penalty | 1/30 of MRC per day | 5% of MRC per day | Negotiated | 5% of MRC per day
Loss Penalty | 1/30 of MRC per 1% | Negotiated | N/A | 1/30 of MRC per %
Latency Penalty | 1/30 of MRC per 1 ms | Negotiated | N/A | 1/30 of MRC per 1 ms
Availability Penalty | 1/30 MRC per hour | <4 hour = 5%, up to 50% | 1 hr = 2%, 5% after first hour | 2% of MRC per hour
Bandwidth Penalty | Negotiated | 1/30 of MRC per day <CIR | 1/30 of MRC per day | Negotiated
Moving into the Big Leagues with an SLA
An MSO focused on residential and SMB services will have to make changes to
meet the requirements of medium-to-large business services, which require:
• Time-sensitive, rigid, penalty-based 24 x 7 service management,
monitoring, and field dispatch
• Coordination of technician dispatch, historical trending of service issues,
and reporting to the customer for multiple service endpoints
• Increased visibility into the network in both wholesale and retail
service markets
• The ability to segment service to assign responsibility for repairs
• The ability to span multiple systems across MSO and Telco providers
Penalties and Credits
SLAs offer assurances to the customer of a defined level of performance and
compensation if the service level is not met. Penalties and credits are the
teeth of the agreement, poised to bite the provider when the service delivered
falls below the guarantees. An SLA can define blanket penalties for service
disruption or penalties for specific SLA violations. Additionally, penalties are
used as competitive differentiators. Customers look to the credits and penalties
as a measure of the confidence the service provider has in its product. The
higher or more stringent the penalty or credit, the greater the likelihood that the
provider will have very few service disruptions.
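The financial exposure behind these penalties can be estimated with simple arithmetic. The following sketch computes an MTTR credit using the per-hour rates listed in Table 1 (1/30 of MRC per hour for SLA 1, 2% of MRC per hour over for SLA 4). The function name, the rounding of partial hours up to a full hour, and the cap at one month's MRC are assumptions made for illustration; an actual contract would define these terms.

```python
import math


def mttr_credit(mrc: float, repair_hours: float, mttr_limit_hours: float,
                fraction_per_hour: float, cap_fraction: float = 1.0) -> float:
    """Credit owed when a repair runs past the MTTR commitment.

    Assumptions (not from any sample SLA): partial hours over the limit
    are rounded up, and the total credit is capped at cap_fraction * MRC.
    """
    hours_over = max(0.0, repair_hours - mttr_limit_hours)
    billable_hours = math.ceil(hours_over)
    return min(mrc * fraction_per_hour * billable_hours, mrc * cap_fraction)


# A 6.5-hour repair against a 4-hour MTTR on a hypothetical $3,000 MRC circuit.
print(mttr_credit(3000.0, 6.5, 4.0, 1 / 30))   # SLA 1 style rate
print(mttr_credit(3000.0, 6.5, 4.0, 0.02))     # SLA 4 style rate
```

With these inputs the repair is 2.5 hours over the limit, billed as three hours, so the two rates yield credits of roughly $300 and $180 respectively.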
Table 6. Example CoS-based Metro Ethernet SLA from the MEF
CLASS | CHARACTERISTICS | CoS ID | BANDWIDTH PROFILE PER EVC PER CoS ID | SERVICE PERFORMANCE
Premium | Real-time IP telephony or IP video applications | 6, 7 | CIR > 0, EIR = 0 | Delay < 5 ms, Jitter < 1 ms, Loss < 0.001%
Silver | Bursty mission-critical data applications requiring low loss and delay | 4, 5 | CIR > 0, EIR <= UNI Speed | Delay < 5 ms, Jitter N/S, Loss < 0.01%
Bronze | Bursty data applications requiring bandwidth assurances | 3, 4 | CIR > 0, EIR <= UNI Speed | Delay < 15 ms, Jitter N/S, Loss < 0.1%
Standard | Best Effort Service | 0, 1, 2 | CIR > 0, EIR <= UNI Speed | Delay < 30 ms, Jitter N/S, Loss < 0.5%
Traffic Diversity and Class of Service Requirements
Class of service (CoS), the assignment of service performance levels within an
SLA, places an even greater demand on operations. Application- and
service-specific quality of service (QoS) requirements dictate the CoS levels a
provider must offer.
Each of the sample SLAs in Table 1 has a single set of key performance
indicators: loss and latency. As a result, these SLAs only provide guarantees on a single
class of service. Traffic diversity introduces the need for multiple classes of
service. SLAs must support graduated classes of service. Providers who can
respond with these levels of guarantees will remain competitive in the
always-on, always-connected world.
Standards bodies such as the International Telecommunication Union (ITU)
and the Metro Ethernet Forum (MEF) are at the forefront of defining KPI values
for multiple classes of service for known applications. Vanilla, one-size-fits-all
performance metrics no longer meet the needs of subscribers.
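One way to operationalize graduated classes of service is to hold the per-class targets in a lookup table and check every measurement against the class it belongs to. The sketch below encodes the delay, jitter, and loss limits from Table 6; the data structure and function names are hypothetical, and jitter is left unchecked where the table lists it as N/S (not specified).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CosTargets:
    max_delay_ms: float
    max_jitter_ms: Optional[float]  # None where Table 6 lists jitter as N/S
    max_loss_pct: float


# Service performance limits copied from Table 6.
TABLE_6 = {
    "Premium": CosTargets(5.0, 1.0, 0.001),
    "Silver": CosTargets(5.0, None, 0.01),
    "Bronze": CosTargets(15.0, None, 0.1),
    "Standard": CosTargets(30.0, None, 0.5),
}


def violations(cos: str, delay_ms: float, jitter_ms: float, loss_pct: float) -> List[str]:
    """Return the names of the KPIs that miss the target for the given class."""
    targets = TABLE_6[cos]
    failed = []
    if delay_ms >= targets.max_delay_ms:
        failed.append("delay")
    if targets.max_jitter_ms is not None and jitter_ms >= targets.max_jitter_ms:
        failed.append("jitter")
    if loss_pct >= targets.max_loss_pct:
        failed.append("loss")
    return failed


print(violations("Premium", delay_ms=4.0, jitter_ms=2.0, loss_pct=0.0005))
print(violations("Silver", delay_ms=4.0, jitter_ms=2.0, loss_pct=0.005))
```

The same measurement (2 ms of jitter) violates the Premium class but not the Silver class, which is the practical consequence of moving from a single KPI set to per-class guarantees.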
TRANSLATING SLAs INTO OPERATIONAL REQUIREMENTS
For an MSO to deliver service to an SLA, the guarantees of the SLA must be
translated into specific operational practices. By organizing SLA requirements
into the three main categories of turn up, PM, and maintenance, MSOs can
develop best practices and achieve efficiency and service quality.
Turn Up
An SLA specifies turn up times, but turn up is more than just activating a
link or service. Turn up testing validates that the service meets the quality of
service specified in the SLA from the beginning. No longer is best-effort service
sufficient. Validation based on whether the “green light” is on, or whether
packets are sent and received, is not acceptable. Furthermore, documentation
(e.g., a birth certificate) is required to show that the installed service meets the SLA.
For example, SLA 2 in Table 2 has an installation interval declaration (the time
from ordering a circuit to its availability) of 45 days for on-network locations,
leaving the installation interval negotiable for off-network locations.
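An installation interval translates directly into a due date and, if missed, a daily penalty. The sketch below applies the SLA 2 terms from Table 2 (45 days on-net, 5% of MRC per day late). It assumes calendar days and an uncapped penalty, and the function name, dates, and MRC value are hypothetical; SLAs stated in work days, such as SLA 4, would need a business-day calendar instead.

```python
from datetime import date, timedelta


def activation_penalty(order_date: date, ready_date: date, interval_days: int,
                       mrc: float, pct_per_day: float) -> float:
    """Penalty for activating after the committed interval.

    Assumes calendar days and no cap on the total penalty.
    """
    due_date = order_date + timedelta(days=interval_days)
    days_late = max(0, (ready_date - due_date).days)
    return mrc * pct_per_day / 100.0 * days_late


# Ordered 3 October 2011, so the 45-day commitment falls on 17 November 2011.
penalty = activation_penalty(date(2011, 10, 3), date(2011, 11, 20),
                             interval_days=45, mrc=2000.0, pct_per_day=5.0)
print(penalty)  # three days late at 5% of a $2,000 MRC per day
```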
Services
There are generally two focus areas in the Ethernet service portfolio for a
wireline service provider:
1. Wholesale service for delivering Ethernet connections as part of an
end-to-end service for another service provider
2. Retail service for delivering business-class service to individual
enterprises
Examples of wholesale service include mobile backhaul connections for wireless
service providers from cell towers to the core packet network, or one service
provider buying connections from another provider to office locations of an
enterprise where they cannot provide the total service themselves.
Most of the SLA examples in Table 1 focus on retail service. The provider has
complete ownership of the network and generally turns up a small number of
office locations (1–10). In the case of a wholesale service, one order may include
a large number of connections (hundreds) that must be turned up.
Some SLAs state an exact turn up timetable per location, while others specify a
timetable per service. For example, a private line connection will typically state
an exact date, while a mobile backhaul service might be lumped into groups of
sites prioritized by the customer.
Figure 5. Today’s typical approach for service turn-up is field technician based
The Current Turn Up Process
Today’s service providers use field technicians at both ends of the service and in
the network offices to turn up and validate the service, as shown in Figure 5. Key
to this strategy are:
• Coordinated field dispatch to offices or cell towers to confirm the service
is functional
• Validation of quality of service against the SLA
The Achilles’ heel of this approach is that the service can scale only as fast as
field technicians can be hired, retained, and properly
coordinated. As service providers grow, they begin to realize that
this approach does not scale: they cannot hire enough technicians, cannot
coordinate them, and cannot absorb the delays caused by a service validation
failure. Can the install be completed on time to avoid the penalty?
Service Validation
While construction, equipment allocation and installation have the lion’s share
of impact on turn up, once the construction is completed a separate dispatch,
usually requiring multiple resources, is required to validate the new service. This
step adds time and expense and requires documentation, for example, the birth
certificate establishing that the service meets the SLA at the time of turn up.
Field handheld tools are not capable of producing this documentation.
Streamlining the turn up process to reduce the impact on field service teams has
traditionally been an operational challenge. However, some providers are finding
a way to reduce the cost and time required for turn up through a model that
scales while providing SLA compliance through a remote probe-based solution.
In 2010, the MEF named Verizon the Global Service Provider of the Year for its
innovation in developing and delivering robust wholesale service offerings that
extend the reach of Ethernet services, including mobile backhaul and business
Ethernet services. The MEF also awarded AT&T the Best Business Ethernet Service
Award for providing the most innovative business Ethernet service
for enterprise.
Optimizing Turn Up
Cost-effective and efficient tactics are a must to maintain a competitive
advantage and reduce the impact of accelerated turn up schedules on field
technicians. Remote testing at the time of installation minimizes
post-turn-up dispatches.
In mobile backhaul, contracts are contingent on service turn up intervals. It is
not unheard of for a wireless carrier to turn up hundreds of sites a month, and
many rely on a wireline service provider to furnish the tower backhaul. These
aggressive turn up schedules are passed along to the provider, for whom the
ability to meet turn up intervals can be a competitive factor.
Typically, during turn up two field technicians are dispatched, one to the tower
and the other to the provider edge, and both are equipped with handheld
testers to run an RFC 2544 test on each class of service at each frame size.
This approach, shown in Figure 5, requires a coordinated effort by two
technicians who must be available at the same time. Many service providers are
moving toward an alternative approach, which includes:
• Installing remote-controlled service assurance test probes in the network
to enable testing remotely from the network operations center (NOC)
without a dispatch to the network
• Installing service assurance network-terminating equipment such as an
ALU 7210, Cisco 2941, or other network interface device, which can be
controlled from the NOC and provides a loopback function
This combination, shown in Figure 6, allows connectivity testing
and validation of the QoS against the SLA requirements without the coordinated
dispatches to all end points of the service shown in Figure 5. This
automated approach saves four to five hours of drive time, set up time, and
test time. By centralizing and automating, an MSO can significantly reduce the
effort and time it takes to run the test, capture results, and store them for SLA
reporting requirements.
Figure 6. Optimizing service activation via remote controlled Service Assurance devices
[Figure 6 diagram labels: NTE with Loopback (eight devices); Service Assurance Probes]
Up to 80 percent of the issues with service turn up are related to network
configuration problems, such as MTU size, CIR/EIR set incorrectly, specifying
the wrong Ethernet profiles to support single or multiple classes of service,
and similar issues . During each turn up it is critical to individually validate the
configuration of each of the classes of service . But, testing each CoS individually
is not enough . They must also be validated while running simultaneously
to determine the impact of services on each other . The ITU has recently
recommended a test strategy that does this called Y .1564 . ITU-T Y .1564 is a
two-step test:
1. A service configuration test, which validates the network configuration
of each defined service (rate limit, traffic shaping and QoS). It tests such
details as MTU, bandwidth (CIR), and CoS prioritization.
2. A service performance test, which validates the QoS of each defined
service and proves SLA conformance. It reports such metrics as frame
delay, frame delay variation, frame loss ratio, and availability.
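The two-step structure can be sketched in Python as a pair of pass/fail checks over measurements that a test probe has already collected. This is an illustrative sketch only, not an implementation of ITU-T Y.1564: the class names, field names, and any threshold values are assumptions made for the example, the configuration step is reduced to a CIR and MTU check, and availability is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class CosProfile:
    """Acceptance criteria for one class of service (hypothetical fields)."""
    name: str
    cir_mbps: float        # committed information rate
    mtu_bytes: int
    max_delay_ms: float    # frame delay limit
    max_fdv_ms: float      # frame delay variation limit
    max_flr: float         # frame loss ratio limit, e.g. 0.001 = 0.1 percent


@dataclass
class CosResult:
    """Measurements reported by a test probe for one class of service."""
    throughput_mbps: float
    largest_frame_passed: int
    delay_ms: float
    fdv_ms: float
    flr: float


def config_test(p: CosProfile, r: CosResult) -> bool:
    """Step 1: the service carries its CIR and passes MTU-sized frames."""
    return (r.throughput_mbps >= p.cir_mbps
            and r.largest_frame_passed >= p.mtu_bytes)


def performance_test(p: CosProfile, r: CosResult) -> bool:
    """Step 2: the QoS metrics stay within the SLA limits."""
    return (r.delay_ms <= p.max_delay_ms
            and r.fdv_ms <= p.max_fdv_ms
            and r.flr <= p.max_flr)


def turn_up_passes(profiles, individual, simultaneous) -> bool:
    """Each CoS must pass step 1 alone and step 2 with all CoS running."""
    return all(
        config_test(p, individual[p.name])
        and performance_test(p, simultaneous[p.name])
        for p in profiles
    )
```

Here `individual` and `simultaneous` are dictionaries keyed by CoS name, holding the results measured with one stream at a time and with all streams in parallel, respectively. Keeping the two result sets separate mirrors the point made above: a CoS that passes on its own can still fail once the other services are running.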
Figure 7 below shows an example of an RFC 2544 test run on a single class of
service, and the ITU-T Y.1564 service performance test with four individual
classes of service streams run in parallel, reporting the integrated QoS.
Figure 7. Current RFC 2544 test and the new ITU Y.1564 service performance test examples
[Panel titles: RFC 2544 Test; ITU Y.1564 New Standard]
Unlike the typical residential service that doesn’t provide SLA guarantees, large
business customers require proof of conformance documentation to verify
that they are receiving the service for which they contracted. Supplying this
proof requires a system that maintains a history of service issues and calls and
supports the ability to document the issues and their resolution. This ability
is especially important for PM and maintenance, but it begins with a birth
certificate record showing the date, time, and performance of the service at
turn up.
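A birth certificate record of this kind could be as simple as a timestamped structure holding the per-CoS results measured at turn up. The sketch below is one possible shape in Python; the field names, the circuit identifier format, and the choice of JSON for archiving are assumptions for illustration, not a description of any particular product.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class BirthCertificate:
    circuit_id: str        # hypothetical identifier format
    turned_up_at: str      # ISO 8601 date and time of the turn-up test
    results_by_cos: dict   # per-CoS metrics measured at turn up


def make_birth_certificate(circuit_id, results_by_cos, when=None):
    """Build the record, stamping it with the current UTC time by default."""
    when = when or datetime.now(timezone.utc)
    return BirthCertificate(circuit_id, when.isoformat(), results_by_cos)


def to_json(cert: BirthCertificate) -> str:
    """Serialize the record so it can be archived for SLA reporting."""
    return json.dumps(asdict(cert), sort_keys=True)
```

Later PM and maintenance results can then be compared against this baseline when a customer questions whether the service still performs as delivered.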
Figure 8. Optimizing service activation via remote controlled Service Assurance devices
[Diagram labels: Service Assurance Probes; NTE with Loopback (x8)]
Ramping up these Ethernet connections for mobile backhaul or business
services correctly the first time is a challenge that requires the MSO to find
a scalable and sustainable solution. Some organizations have responded by
throwing people at the problem. But how many more people can be hired? How can
management effectively coordinate them for dispatches to dozens of locations
for thousands of circuits? How can they be educated and retained? In reality,
it becomes a business problem: how can the service, and thus revenue, grow
faster than the staff needed to support it? The answer is remote controlled
devices that enable service activation without the dispatch and without
waiting.
PERFORMANCE MANAGEMENT
As shown in Table 3, there are specific requirements for network availability,
packet loss, and latency, and resulting penalties if the provider does not meet
them. How does a service provider ensure that the quality being provided to
their enterprise customers or service provider business partners is correct? What
happens if the customer or business partner insists that the quality does not
meet the SLA?
Unlike TDM services, in which the network elements gather statistics on quality
and availability, an Ethernet service gets only minimal visibility into service
quality from its network elements. Some devices may track raw frame counts
or perhaps monitor a heartbeat between endpoints, but that is visibility at
only the most basic level. There is no native tracking of QoS for one level of
service, much less several. Because Ethernet devices do not track this
information, additional tools are required to fill the gap.
Tracking KPIs
Delivering on an SLA requires confirming, 24 hours a day and 7 days a week,
that the specific, contracted KPI targets are being met. Performance
matters and is quickly becoming the differentiator for competitive Ethernet
services. The offered SLA can set an MSO apart, but it must be backed up, which
is a fundamental change from residential service and one that affects
operations. The focus has shifted from network quality of service (QoS) to
service quality, or Quality of Experience (QoE).
Network Availability
How would a service provider know, before the customer calls, that the network
has become unavailable? For Ethernet, an outage is the duration of time when
a service cannot send or receive packets at the various frame/packet sizes. In
the past, SLAs specified only the ability to send or receive packets, but today’s
applications have moved far beyond best effort service.
SLA KPIs are evolving as availability is applied to the multiple CoS levels
dictated by the customer applications, requiring even more visibility into
performance for each CoS. This visibility requires PM tools that can be
customized to track specific service requirements across the system.
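To make an availability target concrete, it helps to convert the percentage into a downtime budget and to compute the measured figure from recorded outages. The Python sketch below does both. The 30-day reporting period and the function names are assumptions for illustration; the actual period and the definition of an outage come from the contract.

```python
def allowed_downtime_seconds(availability_pct: float, period_days: int = 30) -> float:
    """Downtime budget implied by an availability target over the period."""
    period_s = period_days * 24 * 3600
    return period_s * (1 - availability_pct / 100)


def measured_availability_pct(outage_seconds, period_days: int = 30) -> float:
    """Availability achieved, given the outage durations seen in the period."""
    period_s = period_days * 24 * 3600
    return 100 * (1 - sum(outage_seconds) / period_s)


def availability_by_cos(outages_by_cos, period_days: int = 30) -> dict:
    """Apply the same calculation separately to each class of service."""
    return {cos: measured_availability_pct(outages, period_days)
            for cos, outages in outages_by_cos.items()}
```

Under these assumptions, a five nines (99.999 percent) target leaves a budget of roughly 26 seconds of outage in a 30-day period, which is why an outage has to be detected before the customer calls rather than after.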
Packet Loss
How does an MSO confirm that traffic transits the network without dropping
packets? Has a traffic misconfiguration reduced the CIR the service can
transport, or is some portion of the network now experiencing degradation,
causing frames to be damaged and thus dropped by the receiving switch? Some
applications, such as web applications, perform well in the presence of frame
loss, whereas others, such as video conferencing or live medical video, are
more sensitive to loss.
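Loss can be measured directly when the number of test frames sent and received is known. A minimal Python sketch of the frame loss ratio and a per-application check follows; the function names are hypothetical and any limit passed in is an example value, not a figure taken from an SLA.

```python
def frame_loss_ratio(tx_frames: int, rx_frames: int) -> float:
    """Fraction of transmitted test frames that never arrived."""
    if tx_frames <= 0:
        raise ValueError("no test frames were sent")
    lost = max(tx_frames - rx_frames, 0)  # ignore duplicates at the receiver
    return lost / tx_frames


def loss_within_limit(tx_frames: int, rx_frames: int, max_flr: float) -> bool:
    """True when the measured loss meets the limit for this application."""
    return frame_loss_ratio(tx_frames, rx_frames) <= max_flr
```

The same measurement can pass for a loss-tolerant web service and fail for a video service simply by applying a different `max_flr` to each class of service.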
Latency
How does an operator confirm that traffic transits the network without
exceeding the latency requirement? Is congestion in some portion of the
network causing packets to be queued and thus latency to increase? Financial
transactions are particularly affected by even a few milliseconds of latency,
which can have a huge monetary impact on customers.
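Frame delay and jitter (frame delay variation) can both be derived from the send and receive timestamps of test frames. The Python sketch below assumes that the timestamps share a common clock, for example because the frames are looped back to the probe that sent them, and it takes jitter as the difference between consecutive delays. Both the function names and that definition of jitter are simplifying assumptions for illustration.

```python
def frame_delays_ms(tx_times_ms, rx_times_ms):
    """Delay of each test frame: receive time minus transmit time."""
    return [rx - tx for tx, rx in zip(tx_times_ms, rx_times_ms)]


def delay_variation_ms(delays):
    """Jitter taken as the absolute difference between consecutive delays."""
    return [abs(b - a) for a, b in zip(delays, delays[1:])]


def latency_within_limit(delays, max_delay_ms) -> bool:
    """True when no measured frame exceeded the latency requirement."""
    return max(delays) <= max_delay_ms
```

A rising trend in these values while loss stays at zero is consistent with the queuing described above: frames still arrive, but later and less evenly.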
Active Monitoring
Passive monitoring, which tracks KPIs by observing customer traffic, places
the operator at a significant disadvantage when SLA violations occur: by the
time the operator discovers an issue using this technique, the customer is
already experiencing the degradation firsthand. Moreover, network problems
that surface when there is no customer traffic on the network remain
undetected. Passive monitoring does not provide an assessment of delay or
jitter performance, and loss is merely inferred from frame counters instead of
measured directly.
To assure SLA compliance, an MSO must measure KPIs 24 x 7 per class of
service through active monitoring, which generates test traffic designed
specifically to report on KPIs across the network.
Performance monitoring is not a field service technician activity. Handheld
testers are not capable of active monitoring because they provide only a
snapshot of the state of the network, not continuous monitoring with
historical archives. Beyond these functional limits, the multiple coordinated
dispatches required to test a service with a handheld tester increase OpEx and
lengthen MTTR. In addition, provider edge network elements are not optimized
to aggregate KPIs per class of service per EVC.
Figure 10. Example of performance monitoring crossing thresholds
PM as a Management Tool
Active PM allows an MSO to manage the network, proactively addressing
issues before they result in SLA violations. Setting thresholds and alerting
operations of declining performance before it impairs customer traffic can
result in significant savings by avoiding penalties and credits. A threshold
violation can trigger an alarm, requiring a technician to run further tests
or to alert the subscriber of the situation.
To perform the active monitoring required to avoid SLA violations and penalties,
the NOC must depend on a 24 x 7, always-on tool that can view the history of
the quality of the service end to end via real-time, active testing.
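The idea of alerting before a violation can be sketched as a two-level threshold: a warning level set below the contracted limit, and the limit itself. The Python example below is a sketch under stated assumptions. The names are hypothetical, the 80 percent warning fraction is an arbitrary example, and the logic applies only to KPIs where lower is better, such as delay, jitter, or loss.

```python
from enum import Enum


class Status(Enum):
    OK = "ok"
    WARNING = "warning"      # degrading, but still within the SLA
    VIOLATION = "violation"  # contracted limit exceeded


def classify(value: float, sla_limit: float, warn_fraction: float = 0.8) -> Status:
    """Classify one measurement of a lower-is-better KPI."""
    if value > sla_limit:
        return Status.VIOLATION
    if value >= warn_fraction * sla_limit:
        return Status.WARNING
    return Status.OK


def alarms(measurements: dict, limits: dict) -> dict:
    """Return only the KPIs that need attention, keyed by KPI name."""
    result = {}
    for kpi, value in measurements.items():
        status = classify(value, limits[kpi])
        if status is not Status.OK:
            result[kpi] = status
    return result
```

A `WARNING` is the cue for operations to run further tests while the service is still compliant. A `VIOLATION` means penalties or credits may already apply.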
Figure 9. Using Active Traffic Generation to measure QoS 24 x 7 via Service Assurance probes
[Diagram labels: Service Assurance Probes; NTE (x8); Low bandwidth traffic]
PM as Triage
Active PM can also save OpEx by allowing an MSO to triage performance
degradation before dispatching a field technician. For example, suppose a
customer has a problem sending traffic from office A to office B. Many service
providers would do a coordinated dispatch, sending a field technician to both
ends to run tests. With an active PM tool, a single technician at the NOC can
instead use segmentation to locate the problem, verify that it still exists, and
direct the dispatch to the needed office with the knowledge, parts and tools to
solve the problem in a single visit, all with historic data and a clear action plan.
Historical information can highlight repeats and chronic outages that may point
to a specific source for the outage. Performance monitoring pulls double duty,
both as a monitoring tool and as an active testing tool.
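Segmentation can be pictured as testing to each loopback point along the path in turn, starting nearest the probe, until a test fails. The Python sketch below assumes a single fault and an ordered list of loopback points. The `loopback_ok` callable stands in for whatever remote test the NOC tool actually runs, and all names are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple


def locate_fault(points: Sequence[str],
                 loopback_ok: Callable[[str], bool],
                 origin: str = "probe") -> Optional[Tuple[str, str]]:
    """Return the (near, far) segment containing the fault, or None."""
    last_good = origin
    for point in points:
        if not loopback_ok(point):
            # The path is clean up to last_good and broken by this point.
            return (last_good, point)
        last_good = point
    return None
```

For a path of `["hub", "aggregation", "office B"]` where tests to the hub pass and tests beyond it fail, the function returns `("hub", "aggregation")`, which tells the NOC where to direct the single dispatch.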
Maintenance
Each of the SLAs in Table 1 has a requirement for a 24 hour, 7 day a week
call center to support trouble calls and to repair the service within the MTTR
threshold. The residential model of working on service troubles between
8:00 a.m. and 5:00 p.m. is not viable, and certainly isn’t competitive. Even with
people on call 24 x 7, a technician must respond within the window to avoid the
penalty. The customer’s business depends on service availability, and an
outage could mean lost revenue. Meeting MTTR metrics increases the pressure
and the stakes. In addition to proactive monitoring, a process that assures SLA
compliance is required to address reactive maintenance. The MTTR clock is
running, even at 3:00 a.m. Where are the field technicians sent? How many are
needed? Figure 11 shows the typical business process for a major outage for a
business customer: send field technicians to find the problem.
Figure 11. Dispatch field technicians to find the fault
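Because the MTTR clock starts when the trouble is reported, it is useful to track both the running average and the time left on each open ticket. The Python sketch below shows the arithmetic. The ticket representation and function names are assumptions for illustration, and the repair threshold must be supplied from the SLA itself.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(tickets) -> timedelta:
    """Average repair time over (opened, restored) datetime pairs."""
    durations = [restored - opened for opened, restored in tickets]
    return sum(durations, timedelta()) / len(durations)


def time_remaining(opened: datetime, now: datetime,
                   mttr_limit: timedelta) -> timedelta:
    """Time left before an open ticket exceeds the repair threshold."""
    return mttr_limit - (now - opened)
```

A negative result from `time_remaining` means the threshold has already been missed, so every hour spent driving to the wrong location comes directly out of this budget.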
A service assurance solution allows the MSO to perform a rapid remote test from
the NOC, segment the link, and identify trouble before dispatching a technician.
With the right tools and processes in place, in many cases problems can be
resolved without ever dispatching a field technician. Hard failures, in particular,
are easy to discover and resolve. Intermittent problems, such as random
symptoms and complaints of slow service, are the source of the time drain. The
80/20 rule applies: most issues are addressed rapidly, leaving the technicians
available to address the 20 percent that require more time. Operationally, if a
technician must be dispatched to fix the problem, it is critical that the
technician is sent to the correct location the first time.
In the SLA environment, every customer call, regardless of how trivial it may
seem, must be addressed with documentable action. Achieving that goal
requires a solution that provides:
• A remote controlled test and diagnostic capability to rapidly identify
troubles on the enterprise service
• The ability to isolate troubles rapidly across the network, so that a
technician is dispatched to fix a problem instead of being dispatched to
find the problem
• An increase in the aggregate expertise level of technicians on duty
through automated scripts and connectors, allowing even junior
technicians to troubleshoot effectively
• An archive of historical records and documentation of network health, test
results, previous repair efforts, and PM results for analysis
Figure 12. Remote controlled service assurance probes working with the network elements replace field
technicians for trouble shooting
[Diagram labels: Service Assurance Probes; NTE with Loopback (x6); Field tech dispatched]
Figure 12 shows how remote controlled service assurance probes working
with the network elements in the network and at the enterprise can identify
and sectionalize the problem. As a result, the service provider needs to send
a field technician only once, to the right place, to fix the problem rather
than to find it.
Landing high-revenue customers is a high stakes business. With any network
or service, the emergence of problems is inevitable. The competitive nature of
maintenance favors the provider that can resolve the problem quickly to avoid or
minimize disruption to the subscriber.
Forward-looking MSOs will develop the ability to monitor managed services and
specific applications and provide tiered services. SLAs that provide QoS metrics
for each CoS are required. As more enterprises move their systems to cloud-
based applications to reduce their IT costs, more stress will be placed on the
service provider to deliver a higher quality network.
For MSOs to win in the evolving business service market that the Telco service
providers have traditionally dominated, it is critical for them to offer strongly
competitive service level agreements. MSOs must deploy a service
assurance strategy focused on remote probes for service turn-up, performance
monitoring, and troubleshooting, and optimize the use of field technicians to
reduce operational costs and deliver SLA compliance. Using this strategy, MSOs
can gain market share, retain customers, and exceed revenue expectations in
the Ethernet market.
CONCLUSION
Ethernet has opened up several new markets to MSOs that have traditionally
been the domain of Telco providers. The opportunities are big, the
stakes are high, and those who succeed will be the ones who learn to navigate
the competitive world of Ethernet SLAs.
This requires the MSO to optimize the process of managing service to KPI targets
by translating SLA requirements into operational actions as shown in Figure 13.
Figure 13. Win at the competitive world of Service Level Agreements
SERVICE ACTIVATION
• Probes in the network act like technicians
• NTE devices with assurance loopback
• Avoid dispatches, turn up right … the first time
PERFORMANCE MONITORING
• Active traffic to measure KPIs
• 24x7 “always on” quality
• Deep visibility to SLA-based KPIs
TROUBLE MANAGEMENT
• Remote test probes
• Identify and sectionalize
• Dispatch to fix, not find