A bit of History
If we look back, it was not so long ago that the operation and maintenance (O&M) of big networks, such as those of a communications service provider (CSP), was done in a monolithic way. Initially, network problems were detected when it was already too late. The operator on duty guessed which systems were related to the problem and went through the logs of each element, one by one, until they found the origin of the problem. This made the operator’s job extremely difficult and drastically increased the MTTR (Mean Time to Repair).
Afterwards, EMS (Element Management Systems) and NMS (Network Management Systems) appeared. An EMS is specific to each manufacturer and manages a set of equipment of a given technology: it gathers alarms, inventory, performance data and configuration information, and provisions the network elements it operates. NMS, commonly called monitoring tools, give both O&M and Planning and Optimization teams a segmented view, since no single NMS performs all the functions of the FCAPS (Fault, Configuration, Accounting, Performance, Security) framework. As a result, we find a wide variety of tools, each one focused on a specific function.
With the arrival of technologies such as Mobile Access, Mobile Core, SDH, PON, FTTH, Packet Microwave, IP/MPLS, etc., the number of EMS grew. Given the great variety of NMS and EMS on the market, each one dedicated to or specialized in a certain technology or network layer, “umbrella” tools appeared. These tools integrate with the rest of the NMS and EMS and centralize the information in a single dashboard to facilitate the operator’s work and provide a service view. They make it possible to relate the different network layers to each other and to trace a final problem back to its origin (RCA or Root Cause Analysis), even when the elements involved belong to different technologies. This gave CSPs great visibility, allowing them to reduce the MTTR in the event of network failures and outages, improve the user experience, minimize the well-known “churn” and indirectly increase profits. In addition, some manufacturers of monitoring tools began to incorporate mechanisms that trigger automation on the equipment when an alarm appears, so that the problem is solved automatically and the MTTR is reduced.
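As an illustration of the cross-layer RCA idea, here is a minimal Python sketch. It assumes the umbrella tool knows which element depends on which (the `DEPENDS_ON` topology and all element names are hypothetical) and treats as root cause any alarmed element that has no alarmed element underneath it. Real tools use far richer correlation rules; this only shows the principle.

```python
# Hypothetical topology: each element maps to the elements it depends on.
DEPENDS_ON = {
    "vpn-service-A": ["pe-router-1"],   # service layer
    "enodeb-42": ["mw-link-7"],         # mobile access
    "pe-router-1": ["mw-link-7"],       # IP/MPLS
    "mw-link-7": [],                    # packet microwave transport
}


def find_root_causes(alarmed, depends_on):
    """Return the alarmed elements that have no alarmed dependency,
    directly or transitively. These are the root-cause candidates."""
    alarmed = set(alarmed)

    def has_alarmed_dependency(element, seen):
        for dep in depends_on.get(element, []):
            if dep in seen:
                continue  # already visited, also protects against cycles
            seen.add(dep)
            if dep in alarmed or has_alarmed_dependency(dep, seen):
                return True
        return False

    return sorted(e for e in alarmed if not has_alarmed_dependency(e, set()))


if __name__ == "__main__":
    active_alarms = ["vpn-service-A", "pe-router-1", "enodeb-42", "mw-link-7"]
    print(find_root_causes(active_alarms, DEPENDS_ON))
```

With the four alarms above, only the microwave link is reported, since every other alarmed element sits on top of it.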
The following diagram helps us understand how the different layers interact in the provision and quality of service:
We can see operations support systems (OSS) as the layer that provides a service view over the set of EMS and NMS elements. Above them all sits the business support systems (BSS) layer, from which the different orders are managed. Provisioning (Fulfillment) starts in the BSS systems and passes down through each of the layers until it reaches the network elements (NE). In the opposite direction, service quality (Assurance) is ensured by monitoring network resources and moving up through each of the layers until a service view is obtained. The picture shows, in parentheses, the layers defined by the International Telecommunication Union (ITU-T) in the TMN (Telecommunications Management Network) model: BML (Business Management Layer), SML (Service Management Layer), NML (Network Management Layer), EML (Element Management Layer) and NEL (Network Element Layer).
Going back to the evolution of networks, automation with container environments, software-defined networking (SDN) and network functions virtualization (NFV) are becoming increasingly important, both in architecture and in deployment automation. This is an important part of the next step: getting ahead of potential network congestion or problems and solving them automatically through orchestration.
The future
OSS in the Cloud?
According to a TM Forum study, it looks like the future of OSS will not, for the moment, be entirely in the cloud. Many CSPs want to keep network operations in their own data centers (on-premises) or in their private cloud because of concerns about the latency and security of applications running in public clouds. In other words, the information contained in OSS is so critical that companies believe they must keep control of the hardware on which the software runs.
Picture II. Survey on the migration of OSS to the cloud among CSPs
In the picture, you can see that 60% would not move their OSS systems to the cloud or would move only part of them, and 28% are uncertain, while the remaining respondents have a clear position.
The union between BSS and OSS
Although CSPs want to make their operations support systems (OSS) data- and automation-driven, these transformations often impact customer-facing BSS processes.
For example, in a data-driven environment, OSS systems that can identify service degradation must be connected to customer databases to identify those impacted by such degradation. This union, in many cases, is not being carried out at the level it should be.
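A minimal Python sketch of that OSS/BSS link follows. It assumes the OSS side emits a list of degraded service identifiers and that a BSS/CRM extract of subscriptions is available; the `Subscription` record, the sample data and all identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    """Hypothetical BSS/CRM record linking a customer to a service."""
    customer_id: str
    service_id: str


# Hypothetical extract from the customer database.
SUBSCRIPTIONS = [
    Subscription("cust-001", "l3vpn-madrid"),
    Subscription("cust-002", "l3vpn-madrid"),
    Subscription("cust-003", "ftth-sevilla"),
]


def impacted_customers(degraded_services, subscriptions):
    """Return the customers subscribed to any degraded service."""
    degraded = set(degraded_services)
    return sorted({s.customer_id for s in subscriptions if s.service_id in degraded})


if __name__ == "__main__":
    # Degradation detected on the OSS side (assumed input).
    print(impacted_customers(["l3vpn-madrid"], SUBSCRIPTIONS))
```

In practice this lookup would be a query against the CRM rather than an in-memory list, but the join key (the service identifier shared by OSS and BSS) is the part that has to exist for the integration to work.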
CSPs will have to deliver and assure, in real time, the services they provide to their large-account clients, who will in turn expect real-time visibility of those services. Similarly, a real-time snapshot of the network also allows the CSP sales team to price and design services for clients.
Ultimately, customer focus is a necessary strategy, but it cannot be achieved without better integration between the OSS and BSS systems.
Automation
Operators need flexible, open and low-cost solutions, and they also want those solutions to support automation, so that the so-called Closed-Loop Fulfillment, Closed-Loop Assurance and Closed-Loop Optimization processes can be achieved.
Initially, these processes could eliminate some operators’ jobs, but in the long run, and with the arrival of 5G, this support will be necessary to keep up with the pace, speed and volume of these networks.
Closed-Loop processes will be necessary to deploy 5G services for large-account clients. These deployments will rely on network slicing, in which end-to-end (E2E) services are deployed for customers in an orchestrated manner, provisioning the elements that make up the Mobile Access (RAN), Transport and Core networks through network functions virtualization (NFV).
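To make the closed-loop idea more concrete, here is a minimal Python sketch of a single assurance cycle: observe a KPI, decide whether it breaches a threshold, act through a remediation, and verify the result. The `FakeLink` class, the 80% threshold and the "add capacity" action are assumptions made for illustration; a real loop would call an orchestrator rather than a local object.

```python
from typing import Callable


def closed_loop_step(
    read_kpi: Callable[[], float],
    threshold: float,
    remediate: Callable[[], None],
) -> str:
    """Run one observe -> decide -> act -> verify cycle."""
    if read_kpi() <= threshold:   # observe and decide
        return "ok"
    remediate()                   # act
    # Verify: read the KPI again after the remediation.
    return "resolved" if read_kpi() <= threshold else "escalate"


class FakeLink:
    """Hypothetical stand-in for a network resource (utilisation in %)."""

    def __init__(self, utilisation: float) -> None:
        self.utilisation = utilisation

    def read_utilisation(self) -> float:
        return self.utilisation

    def add_capacity(self) -> None:
        # Pretend the orchestrator doubled the capacity, halving utilisation.
        self.utilisation /= 2


if __name__ == "__main__":
    link = FakeLink(utilisation=95.0)
    print(closed_loop_step(link.read_utilisation, 80.0, link.add_capacity))
```

The "escalate" outcome matters as much as the automated fix: when the remediation does not bring the KPI back under the threshold, the loop hands the problem over to a human operator instead of retrying blindly.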
AI applied to IT operations
In recent years we have seen CSPs invest in data analysis, and now they want to go a step further by adding artificial intelligence to improve the user experience and to increase agility, efficiency and reliability.
So far, network monitoring as a whole has focused on providing visibility. After decades of improvements in the field of artificial intelligence, the challenge now is to choose the right machine learning algorithms and apply them to OSS data in order to detect potential failures before they happen. To do this, these systems must be integrated with different sources of information (meteorological data, alarms, KPIs, logs, incidents registered in ticketing tools, etc.), and the algorithms must be trained with this data so that they can later detect anomalies.
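The "train on history, then detect anomalies" workflow can be sketched in Python as follows. This is a deliberately simple statistical baseline (a z-score against the mean and standard deviation of past KPI samples), used here as a stand-in for a trained machine learning model; the function names, the sample values and the threshold of 3 standard deviations are assumptions.

```python
from statistics import mean, stdev


def fit_baseline(history):
    """'Train' on historical KPI samples: keep their mean and standard deviation.
    Needs at least two samples, as required by statistics.stdev."""
    return mean(history), stdev(history)


def is_anomaly(value, baseline, z_threshold=3.0):
    """Flag a new sample that lies more than z_threshold standard
    deviations away from the historical mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu  # flat history: any deviation is unusual
    return abs(value - mu) / sigma > z_threshold


if __name__ == "__main__":
    # Hypothetical latency samples in milliseconds.
    latency_history = [10.0, 11.0, 9.0, 10.0, 10.0, 11.0, 9.0, 10.0]
    baseline = fit_baseline(latency_history)
    for sample in (10.5, 25.0):
        print(sample, is_anomaly(sample, baseline))
```

A production system would combine several sources (alarms, logs, tickets, weather) and use models that account for seasonality, but the split between a training step and a scoring step stays the same.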
The union of the three worlds: BSS/OSS, artificial intelligence and automation gives rise to the AIOps (Artificial Intelligence for IT Operations) framework.
What is AIOps?
AIOps is the application of artificial intelligence to IT operations, using analytics and machine learning on large volumes of data collected by tools and devices, for automatic detection and reaction in real time.
According to Gartner, AIOps consists of two main components, Big Data and AI, and requires OSS data to be correlated with BSS data (CRM or Trouble Ticketing). Putting all this together, the aim is to detect anomalies and act accordingly through automation. You can think of AIOps as Continuous Integration and Deployment (CI/CD) for core IT functions.
Picture III. Gartner's visualization of the AIOps platform
AIOps unites three different IT disciplines to achieve continuous improvement goals: service management, performance management, and automation.
What forces are leading AIOps?
AI is intended to improve, speed up and “scale” the work done by a person. The AIOps approach focuses on the following points when addressing the agility, scale, and complexity challenges of digital transformation:
- The difficulty IT operations teams have in managing their infrastructure manually.
- The amount of data that IT operations need to retain is increasing exponentially.
- The response time to infrastructure problems must keep getting shorter.
- More computing power is moving to the edges of the network (edge computing).
The layers that make up AIOps
AIOps consists of the layers shown in the following picture:
Picture IV. The layers that make up an AIOps platform
- Wide and diverse sources of IT data (events, metrics, logs, scheduled jobs, tickets, monitoring, etc.).
- Modern Big Data platforms that allow real-time processing of IT data.
- The application of rules and detection of anomalies on the stored data.
- Domain algorithms that take advantage of experience in the IT environment.
- Machine learning based on the system's own output and on new data fed into the system.
- Artificial intelligence that can adapt to new and unknown patterns in an environment.
- Automation that uses the results generated by machine learning or AI to automatically create and apply a response or improvement to identified problems and situations.
The adoption of artificial intelligence in AIOps is still emerging compared to machine learning. Right now, the most urgent use cases are best approached with simple automation or a combination of ML and automation. It remains to be seen how AI will evolve and what new use cases it will enable. In any case, a solid foundation of AIOps must be established in IT operations as they exist today before we can begin to model human behavior for its use.
Where are we?
Satec has been working in the world of OSS for many years, which is why we have acquired a great deal of experience, both in customer needs and in market-specific tools. All this, together with our application development capabilities, especially with data processing and management tools (Big Data, AI/ML), has allowed us to build a proposal with our own solutions in the area of new-generation OSS systems.
Our solutions can integrate any type of source, making use of a set of powerful, modern components focused on data management and storage, which together offer great flexibility and advanced functionality. On top of all this we add data analytics and machine learning capabilities to allow, for example, the detection of faults (with both supervised and unsupervised models) or the launch of tools that automatically solve the problems detected, resulting in the so-called Closed-Loop process.
Finally, those of us who have been working in the BSS/OSS area for several years have an opportunity to renew ourselves and increase our knowledge. Until now our role was to provide visibility, but with the arrival of AIOps we will be able to apply machine learning algorithms together with artificial intelligence and, in response, launch automations for both automatic deployment and troubleshooting. These concepts were out of our scope until now, but they are becoming very important. After all, who doesn’t dream of having their own Skynet?