NFV management and orchestration (MANO) systems are being developed to meet the agile and flexible management requirements of virtualized network services in 5G networks. To this end, ETSI ISG NFV has specified a standard NFV MANO system that MANO system vendors and open-source MANO projects use as a reference.
Why are MANO KPIs so important?
In the absence of MANO KPIs, it is difficult for mobile operators to decide which MANO system is better suited to their needs. Without KPIs through which the performance of a MANO system can be quantified, benchmarked, and compared, operators are left with no choice but to compare the feature sets claimed by each vendor. KPIs are therefore important for analyzing and comparing the performance of MANO systems and for identifying performance gaps.
The MANO layer is shown below:
MANO KPI Benchmarking Challenges:
In traditional networks, network management is based on the FCAPS model (Fault, Configuration, Accounting, Performance, Security), in which a Network Management System (NMS) monitors, measures, and enforces the KPIs of the networks and services it manages. Some of the key KPIs that an NMS monitors are:
- Availability – To ensure that the network elements and services are available.
- Utilization – To ensure that network resources are utilized to their maximum.
- Service Level Agreements (SLA) – To ensure that users and services do not exceed the resource utilization stipulated for them.
- Latency – To ensure that the services are delivered within the specified delay budget to maintain Quality of Service (QoS) and Quality of Experience (QoE).
- Jitter – To ensure that the packet inter-arrival time does not deviate from the mean delay beyond a specified limit.
- Errors/Warnings – To ensure the maintenance of end-to-end service integrity against errors.
Many other measurable parameters fall within these KPI categories, and an NMS monitors and enforces them by taking appropriate measures. These KPIs are well known and well defined, and they also cover the performance measurement of the NFVI compute and network resources as well as the VNFs. For instance, the performance of a virtualized router function can be benchmarked using the KPIs defined for traditional routers.
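As a rough illustration of how a few of these traditional KPIs could be computed from raw measurements, here is a minimal Python sketch. The function names are hypothetical, and treating jitter as the mean absolute deviation of per-packet delay from the mean delay is only one possible reading of the definition above, not a standardized formula.

```python
from statistics import mean
from typing import Sequence


def availability(uptime_s: float, total_s: float) -> float:
    """Fraction of the observation window during which the element was up."""
    if total_s <= 0:
        raise ValueError("total_s must be positive")
    return uptime_s / total_s


def jitter(delays_ms: Sequence[float]) -> float:
    """Mean absolute deviation of per-packet delay from the mean delay.

    Assumption: this is one simple way to express 'deviation from the
    mean delay'; other definitions of jitter exist.
    """
    if not delays_ms:
        raise ValueError("need at least one delay sample")
    avg = mean(delays_ms)
    return mean([abs(d - avg) for d in delays_ms])


def meets_delay_budget(delays_ms: Sequence[float], budget_ms: float) -> bool:
    """True if every sampled delay stays within the latency budget."""
    return max(delays_ms) <= budget_ms
```

For example, `jitter([10.0, 12.0, 14.0])` averages the deviations 2, 0, and 2 around the mean delay of 12 ms, and `meets_delay_budget` would flag a latency violation as soon as one sample exceeds the budget.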
The challenge, however, is to define KPIs for benchmarking the performance of the NFV MANO system itself. This becomes all the more important because the NFV MANO system, in addition to performing traditional FCAPS management, provides Life Cycle Management (LCM) of VNFs and Network Services (NSs). KPIs that quantify MANO performance matter because the virtualized services delivered by VNFs/NSs are highly agile, flexible, and dynamic: the reaction time of the NFV MANO system, from monitoring an event to deriving an appropriate LCM action and executing it, becomes critical given the stringent performance requirements of the different verticals sharing the same NFVI resources. A benchmarked MANO system thus enables customers to choose the MANO solution that best fits their operational needs.
NFV MANO Performance KPIs [Functional and Operational]
MANO system performance KPIs can be classified into two categories: Functional KPIs and Operational KPIs. Functional KPIs describe the non-run-time characteristics of a MANO system. These include:
- Resource footprint of the deployed MANO system.
- Variety of VIM platforms a MANO system can support.
- Number of VIMs a single MANO platform can manage efficiently.
- The maximum number of VNFs a MANO system can effectively/efficiently monitor and manage in an NFVI.
- Feature palette. For example, support for DevOps, VNF Package Management, VNF image management, integrated monitoring system, etc.
Operational KPIs characterize the run-time operations. They are mainly quantified by measuring the latency of a particular Life Cycle Management procedure/task and its effectiveness. The picture below depicts the different operations that can be performed at the MANO layer; each is summarized in the paragraphs that follow.
On-boarding Process Delay (OPD): This is the time it takes to boot up a virtualized network function image, i.e., a VM with all its resources. Once booted, the VM can be used to host and run a VNF service. OPD is comparable to the notion of service deployment time. A prerequisite for on-boarding a VM is the creation of a VNF software image in a format recognized by the MANO system, and the upload of the resulting package to a repository.
This package contains not only the VNF software but also the VNF descriptor file (VNFD), which specifies all the configuration information, network requirements, resource requirements, routing/security policies, IP ranges, performance requirements, interfaces, etc. A Network Service Descriptor (NSD) is also on-boarded.
An NSD is a template that describes the requirements of a Network Service in terms of function, operation, security, links, QoS, QoE, reliability, and connectivity. It also includes the VNF Forwarding Graph (VNFFG), which identifies the VNF types, the order in which they are connected, and the characteristics of the Virtual Links (VLs) interconnecting the constituent VNFs to create a Network Service (NS). The OPD depends on the service resource requirements specified inside the VNFD.
Deployment Process Delay (DPD): This is the time taken to deploy and instantiate a VNF within the booted VM and set up an operational Network Service. In this process, a service instance is instantiated by parsing the NSD and VNFD files. All the VNFs that are part of the NS are instantiated from their respective on-boarded images. The MANO system ensures that the required resources are provisioned for instantiating the VNFs and, in the case of a complex NS, for linking them via the relevant VLs; it then configures each VNF based on the configuration information provided in the respective NSDs and VNFDs. The speed at which a VNF or an NS is deployed is crucial when a VNF/NS has to be scaled to meet a sudden increase in load.
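One simple way to obtain OPD and DPD figures is to wrap the two phases in a wall-clock timer. The sketch below is only an illustration under stated assumptions: `onboard` and `deploy` are hypothetical callables standing in for whatever blocking calls a given MANO system exposes, and no real MANO API is implied.

```python
import time
from typing import Callable, Dict


def timed(operation: Callable[[], None]) -> float:
    """Run a blocking operation and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    operation()
    return time.perf_counter() - start


def measure_opd_dpd(
    onboard: Callable[[], None],
    deploy: Callable[[], None],
) -> Dict[str, float]:
    """Measure on-boarding and deployment delays, in that order.

    Assumption: both callables return only once the MANO system reports
    the phase as complete, so the elapsed time covers the whole phase.
    """
    opd = timed(onboard)
    dpd = timed(deploy)
    return {"OPD": opd, "DPD": dpd}
```

In a benchmark, the two callables would wrap the package upload and the NS instantiation request respectively, and the measurement would typically be repeated several times to average out noise.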
Run-time Orchestration Delay (ROD): The run-time orchestration operations consist of different management procedures, and the performance latency of each individual action can be quantified by measuring the time difference from the moment the action is executed to the time the action is completed.
For example, a MANO system that can complete a scale-out or a migration operation of a heavily loaded VNF with minimum service disruption can be deemed to have good performance. ROD is dependent on a monitoring system that continuously monitors active VNF/NS instances throughout their lifetime for any performance deviation or fault event. Thus a MANO system that performs low-latency run-time operations with minimum monitoring load can be considered as performant.
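The per-action latency described above can be derived from start and completion timestamps. The following sketch assumes such timestamps are available, for instance from MANO logs; the `ActionRecord` type and its field names are hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ActionRecord:
    action: str          # e.g. "scale_out" or "migrate" (illustrative labels)
    started_at: float    # time the action was executed, in seconds
    completed_at: float  # time the action was completed, in seconds


def rod_by_action(records: Iterable[ActionRecord]) -> Dict[str, float]:
    """Average run-time orchestration delay per action type, in seconds."""
    durations: Dict[str, List[float]] = defaultdict(list)
    for record in records:
        if record.completed_at < record.started_at:
            raise ValueError(f"{record.action}: completion precedes start")
        durations[record.action].append(record.completed_at - record.started_at)
    return {action: mean(values) for action, values in durations.items()}
```

Grouping by action type keeps scale-out and migration latencies separate, since the two procedures are not directly comparable.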
Quality-of-Decision (QoD): QoD is another metric that quantifies the performance of a MANO system in terms of its effectiveness in carrying out the run-time Life Cycle Management operations of VNF scaling and migration. QoD is measured in terms of the following criteria:
- Efficiency of a resource management decision: The resource efficiency is measured in terms of:
i) Whether long-term and short-term resource requirements of the managed VNF will be fulfilled in the selected compute node.
ii) How non-intrusive a management action has been for other VNFs that are already provisioned in the selected compute node. That is, to what extent will the managed VNF VM affect the performance of other VNFs in the selected compute node in terms of resource availability.
- Number of times a management action has to be executed before the most-suitable compute node is identified to migrate/scale the managed VNF.
- The timeliness of the computation and execution of MANO LCM actions.
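The QoD criteria above can be collected into a simple per-decision report. The sketch below is an assumption-laden illustration: it considers CPU as the only resource, uses the CPU left free on the node as a crude proxy for how non-intrusive the action was, and all type and field names are hypothetical. It deliberately reports the criteria side by side instead of combining them into a single score, since no weighting is defined here.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class PlacementOutcome:
    node_free_cpu: float    # vCPUs free on the selected node before the action
    vnf_cpu_demand: float   # vCPUs required by the managed VNF
    attempts: int           # executions needed before a suitable node was found
    decision_time_s: float  # time to compute and execute the LCM action


def qod_report(outcome: PlacementOutcome) -> Dict[str, Union[bool, int, float]]:
    """Summarize one scaling/migration decision against the QoD criteria."""
    residual = outcome.node_free_cpu - outcome.vnf_cpu_demand
    return {
        # Criterion i: are the VNF's requirements fulfilled on the node?
        "requirements_met": residual >= 0,
        # Criterion ii (proxy): CPU still free for co-located VNFs.
        "residual_cpu": max(residual, 0.0),
        # Fewer executions means the suitable node was found sooner.
        "attempts": outcome.attempts,
        # Timeliness of computing and executing the action.
        "decision_time_s": outcome.decision_time_s,
    }
```

Comparing such reports across MANO systems for the same workload would show, for example, whether one system needs more attempts or leaves less headroom for co-located VNFs.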
Reference: "On the Challenges and KPIs for Benchmarking Open-Source NFV MANO Systems: OSM vs ONAP" by Girma M. Yilma, Faqir Zarrar Yousaf, Vincenzo Sciancalepore, Xavier Costa-Perez