Network Management Blog - by StableNet^®
Regular posts on all things around automated network & service management

Outside Contribution

Root Cause Analysis with StableNet^®: Why it makes a difference in selecting your network management tool

November 23^rd 2022, Würzburg

Foreword: Independent of the context of network management, Root Cause Analysis (RCA) is a systematic process for identifying the causes and origins of a failure. It helps you understand the central cause and identify the best solutions, so that you can remove the root problems and extend the life cycle of systems and processes. Its main purpose is to identify core faults or issues and to help develop scalable, innovative solutions. RCA embodies what is known as a systematic approach, in which the whole is seen as the sum of its various parts. The goal is system optimization. RCA is conducted using a set of principles, techniques, and methods, including the following:

Focus on addressing the root cause of the problem, not just fixing its effects;
Acknowledge that there can often be multiple root causes;
Remember that the main thing is how and why the situation happened, and not who is responsible for it;
Remain consistent. If you have found the root cause of the problem, look for evidence of causation that will support your conclusion;
Collect as much information as possible to make it easier to create a plan of action aimed at correcting this problem;
Think of and implement a strategy to avoid similar mistakes in the future.

As practice shows, analyzing the root cause of a problem requires a holistic and systematic approach. In addition, one must strive to collect all helpful information and form a context that will lead to the final solution to the problem. I hope the following external contribution sheds new light on a topic that is very much at the core of the StableNet^® value proposition. Happy reading, Dr. David Toumajian, Director of Marketing, Infosim GmbH & Co. KG

Reasons why you should do root cause analysis

Root cause analysis serves three goals. The first is to identify the root cause of a problem or event in a network. The second is to understand how to correct or compensate for any difficulty that arises. The third is to apply the knowledge gained from the analysis. This last goal is particularly significant because even the best root cause analysis is useless if no appropriate countermeasures are taken afterward.

For most companies, the third goal is the main one. RCA can be used to introduce changes to core processes and thus prevent the same problems from recurring. When the third goal is achieved, the company gains numerous advantages: processes run smoothly again, resources are saved, and profitability and company growth benefit.

If, for example, a network is not running correctly, the cause may be errors in its settings. You can then apply network troubleshooting, which identifies the problems and resolves them using different methods.

Correcting individual symptoms may seem like a fairly productive option. But if you do not find the actual reason for what is happening, you will most likely face the same problem again and again.

RCA and Network Management: Choosing the right platform

StableNet^® provides a 4-in-1 solution that will change the way you look at managing networks and services. It is a network fault management tool that quickly resolves network issues by identifying their root cause. It consists of an up-to-date monitoring system together with a highly effective, automatic RCA system, which identifies the cause of an issue and each affected device and area, allowing efficient troubleshooting of the network. StableNet^® allows the network team to quickly monitor, isolate, and resolve network outages. Moreover, it automatically determines the main cause of faults detected in physical networks. The alert shows the component that caused the issue, with the ability to navigate directly to the device that caused the problem or performance failure.

Features:

Automated Root Cause & Impact Analysis
Service Assurance
Dynamic Rule Generation

Compared to other Monitoring and Management Systems, StableNet^® combines four solutions (functional modules) in one platform with a single GUI to manage your entire diverse network:

failure management – failure control and event correlation (Fault Management)
performance management
configuration management
inventory and automatic topology discovery (Inventory Management)

With StableNet^®, engineers have one unified product for many applications. In other words, the “zoo” of existing monitoring systems shrinks and, as a result, so do the maintenance costs for the various management systems (service, training, etc.).

StableNet^® accelerates and simplifies the process of network management.

1. Failure control and event correlation

The functionality is implemented through the Root Cause Analysis mechanism (which does not require the administrator to write rules), the Alarm States Machine mechanism, built-in Syslog and SNMP Trap handlers, the Dynamic Rules Generation mechanism, and a set of adapters for existing equipment management systems.

The Root Cause Analysis mechanism analyzes the root cause of a failure based on the following procedures:

Automatic detection and localization of the place of failure;
Reducing the number of alarm events to a single key event;
Identification of the consequences of a failure – who and what was affected by the failure.
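
The three procedures above can be illustrated with a minimal sketch. It is not StableNet^® code: it assumes a simple tree-shaped dependency model in which every device has at most one upstream device, and the names `find_root_causes`, `upstream`, and `down` are hypothetical. For every device reported as down, the sketch walks toward the core and attributes the failure to the topmost failed device on that path, so that many alarms collapse into one key event with a list of affected devices.

```python
from collections import defaultdict


def find_root_causes(upstream, down):
    """Group failed devices under the topmost failed device they depend on.

    upstream: dict mapping each device to the device it depends on
              (None for the core). Assumed to be a tree, i.e. free of cycles.
    down:     set of devices currently reported as unreachable.
    Returns a dict {root_cause_device: [affected devices]}.
    """
    result = defaultdict(list)
    for device in sorted(down):
        # Walk toward the core and remember the highest failed ancestor.
        root = device
        parent = upstream.get(device)
        while parent is not None:
            if parent in down:
                root = parent
            parent = upstream.get(parent)
        if root != device:
            result[root].append(device)   # consequence of another failure
        else:
            result.setdefault(root, [])   # this device is itself a root cause
    return dict(result)


# Hypothetical topology: a core, two distribution switches, three access switches.
topology = {
    "core": None,
    "dist1": "core", "dist2": "core",
    "acc1": "dist1", "acc2": "dist1", "acc3": "dist2",
}
alarms = {"dist1", "acc1", "acc2", "acc3"}
root_causes = find_root_causes(topology, alarms)
print(root_causes)
```

In this example four alarms reduce to two key events: the failure of `dist1`, which also explains `acc1` and `acc2`, and the independent failure of `acc3`.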

The Alarm States Machine mechanism monitors the status of monitored objects, which allows you to reduce the number of alarm events through the following procedures:

Settings for the waiting threshold, that is, the time interval during which the system accumulates information about a fault before issuing an alarm message. This functionality allows you to ignore short-lived, incidental threshold violations;
Tracking fluctuations in measured indicators (flapping). For example, if an interface changes its state 8 times within a given period of time, an alarm message is generated; otherwise it is not;
De-duplication of recurring events.
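
As a rough illustration of these three procedures, the sketch below shows one possible way to implement a waiting threshold, flap detection, and de-duplication. It is an assumption-laden example rather than the product's implementation: the function and class names, the sample format, and the sliding-window approach are all hypothetical.

```python
from collections import deque


def alarm_time_after_hold(samples, threshold, hold_seconds):
    """Return the timestamp at which an alarm fires, or None.

    samples: list of (timestamp_seconds, value), sorted by time.
    The alarm fires only once the value has stayed above the threshold
    for at least hold_seconds, so short spikes are ignored.
    """
    breach_start = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp
            if timestamp - breach_start >= hold_seconds:
                return timestamp
        else:
            breach_start = None  # back to normal: restart the hold timer
    return None


class FlapDetector:
    """Raise an alarm when a state changes max_changes times within a window."""

    def __init__(self, max_changes, window_seconds):
        self.max_changes = max_changes
        self.window_seconds = window_seconds
        self.changes = deque()   # timestamps of recent state changes
        self.last_state = None

    def observe(self, timestamp, state):
        """Record one state sample; return True if flapping is detected."""
        if self.last_state is not None and state != self.last_state:
            self.changes.append(timestamp)
        self.last_state = state
        # Forget state changes that have left the sliding window.
        while self.changes and timestamp - self.changes[0] > self.window_seconds:
            self.changes.popleft()
        return len(self.changes) >= self.max_changes


def deduplicate(events):
    """Collapse repeated (source, message) events into one entry with a count."""
    counts = {}
    for source, message in events:
        key = (source, message)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

With a threshold of 90 and a 30-second hold, a single sample above 90 produces no alarm, while a run of samples above 90 lasting 30 seconds does. A `FlapDetector(8, 60)` mirrors the example in the text: the eighth state change inside a 60-second window triggers the alarm.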

Universal adapters – a set of standard custom adapters that provide connection to heterogeneous data sources, among them:

Syslog adapter
SNMP Trap adapter
Log file adapter
SSH/telnet adapter
CSV adapter
CORBA and others.
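
To give an idea of what such an adapter has to do before an event can be correlated, here is a small sketch that extracts facility and severity from the priority field of a classic BSD-style syslog line (priority = facility × 8 + severity, as described in RFC 3164). The function name and the returned dictionary layout are assumptions made for this example and say nothing about how the StableNet^® adapters are built.

```python
import re

# Severity names by code 0..7, following the syslog convention.
SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug"]

# A syslog line starts with "<PRI>", where PRI is a number of 1 to 3 digits.
PRI_PATTERN = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def parse_syslog_priority(line):
    """Return facility, severity, and message for a syslog line, or None."""
    match = PRI_PATTERN.match(line)
    if match is None:
        return None
    priority = int(match.group(1))
    if priority > 191:  # highest valid value: facility 23, severity 7
        return None
    return {
        "facility": priority // 8,
        "severity": SEVERITIES[priority % 8],
        "message": match.group(2),
    }


event = parse_syslog_priority(
    "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
)
print(event["facility"], event["severity"])
```

The sample line is the one used in RFC 3164: priority 34 decodes to facility 4 and severity 2 (critical).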

Dynamic Rules Generation Mechanism – This allows you to create rules for processing events from adapters on the fly and to extend the existing dependency model for root-cause analysis.

2. Performance Management

Monitoring the quality of communication channels;
SLA monitoring – using its own microprobes or using Cisco IP SLA, Huawei NQA;
Control of any numerical indicators from infrastructure elements and systems;
Plotting charts based on KPI and KQI indicators;
Proactive monitoring – building a trend line by the metric and determining the time after which the threshold will be violated;
NetFlow traffic analysis, CDR/XDR analysis;
Monitoring of user transactions by placing special robots that imitate user actions (they include a script builder that allows administrators to pre-record the script for robots to control a particular application).
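
The proactive monitoring item above boils down to fitting a trend line to a metric and extrapolating it to the threshold. A minimal sketch of that idea, using an ordinary least-squares line, could look like this. The function name and the sample format are hypothetical, and a straight-line fit is only one of several possible trend models.

```python
def time_to_threshold(samples, threshold):
    """Estimate when a rising metric will reach the threshold.

    samples: list of (time, value) pairs.
    Fits value = slope * time + intercept by least squares and returns the
    time at which the fitted line reaches the threshold. Returns None when
    there are too few samples or the trend is not rising.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp, no trend to fit
    slope = sum((t - mean_t) * (y - mean_y) for t, y in samples) / var_t
    if slope <= 0:
        return None  # flat or falling metric never reaches the threshold
    intercept = mean_y - slope * mean_t
    return (threshold - intercept) / slope


# Hypothetical disk usage in percent, sampled once per day.
usage = [(0, 50.0), (1, 52.0), (2, 54.0), (3, 56.0)]
day = time_to_threshold(usage, 90.0)
print(day)
```

Here usage grows by two percentage points per day from 50 percent, so the fitted line reaches the 90 percent threshold on day 20.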

3. Configuration Management

Centralized collection, comparison, and restoration of network device configurations;
Making changes to the equipment configuration, including scheduled changes;
Checking configurations for compliance with policies (corporate, security, equipment manufacturers).
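
Configuration comparison and policy checking can be sketched in a few lines with Python's standard `difflib` module. The configuration lines, policy rules, and function names below are purely illustrative assumptions; a real policy engine would be far more expressive than exact line matching.

```python
import difflib


def config_diff(backup, running):
    """Return a unified diff between two configuration texts as a list of lines."""
    return list(difflib.unified_diff(
        backup.splitlines(), running.splitlines(),
        fromfile="backup", tofile="running", lineterm="",
    ))


def policy_violations(config, required, forbidden):
    """Check a configuration against simple line-based policy rules."""
    lines = {line.strip() for line in config.splitlines()}
    violations = [f"missing: {rule}" for rule in required if rule not in lines]
    violations += [f"forbidden: {rule}" for rule in forbidden if rule in lines]
    return violations


# Illustrative configuration snippets, not taken from a real device.
backup = "hostname r1\nsnmp-server community public RO\n"
running = "hostname r1\nip ssh version 2\n"

diff = config_diff(backup, running)
problems = policy_violations(
    backup,
    required=["ip ssh version 2"],
    forbidden=["snmp-server community public RO"],
)
print("\n".join(diff))
print(problems)
```

The diff shows which lines changed between the stored and the running configuration, while the policy check reports that the backup lacks a required line and contains a forbidden one.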

4. Equipment Inventory

Topology visualizations: L2, BGP, MPLS/VPLS, PW, VPN/VRF, Spanning tree, OSPF, etc. – the system builds maps by polling equipment and identifying neighbors;
Automatic detection of the network, server, and other IT infrastructure equipment and their network connections;
Identification of obsolete equipment (EOS, EOL), as well as vulnerable software versions;
Tracking and accounting of IP, MAC addresses, and network connections (MAC and IP tracking);
Generation of reports on equipment with the ability to upload to external systems (Excel, PDF).
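
Building a map "by polling equipment and identifying neighbors" is essentially a graph traversal. The sketch below is a simplified assumption of how that can work: it starts from one seed device and repeatedly asks each discovered device for its neighbors. In practice the neighbor data would come from protocols such as LLDP or CDP; here it is stubbed with a dictionary, and all names are hypothetical.

```python
from collections import deque


def discover_topology(seed, get_neighbors):
    """Breadth-first discovery of devices and links starting from a seed.

    get_neighbors: callable returning the neighbors a device reports.
    Returns (devices, links), where each link is a sorted pair of names
    so that a link reported from both ends is stored only once.
    """
    devices = {seed}
    links = set()
    queue = deque([seed])
    while queue:
        device = queue.popleft()
        for neighbor in get_neighbors(device):
            links.add(tuple(sorted((device, neighbor))))
            if neighbor not in devices:
                devices.add(neighbor)
                queue.append(neighbor)
    return devices, links


# Stub for polled neighbor tables (hypothetical devices).
neighbor_table = {
    "core": ["dist1", "dist2"],
    "dist1": ["core", "acc1"],
    "dist2": ["core"],
    "acc1": ["dist1"],
}
devices, links = discover_topology("core", lambda d: neighbor_table.get(d, []))
print(sorted(devices))
print(sorted(links))
```

Starting from `core`, the traversal finds all four devices and the three links between them, even though each link is reported by both of its endpoints.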

StableNet^® Solution

StableNet^® is a powerful solution that changes the way you manage networks and services. Its innovative benefit lies in an extremely extensible approach to Root Cause Analysis via Dynamic Rule Generation (DRG). DRG runs during discovery and automatically starts searching for and updating the dependencies between network elements. Instead of processing hundreds of events and trying to perform Root Cause Analysis via static rules written for each network element, your network is closely tracked and supervised. Rules are created automatically from visual templates as events occur.

However, the main novelty of StableNet^® is that it can integrate data from several sources into a single database. In turn, the database serves as a source of information that can be organized using tags to help you understand network components.

StableNet^® helps you manage your entire multi-vendor, multi-technology network from a single graphical interface. It is based on a single codebase, a single GUI, and a single data source for a unified solution. It is a 4-in-1 solution that covers configuration, fault, inventory, and performance management. With DRG, the StableNet^® solution is significantly less resource intensive and provides richer insight into network events. The result is an extensible, streamlined, and automated RCA that can be configured to provide information about the most critical business processes and related services. You can add these advanced features and information to your Netcool Fault Management System or replace it completely and enjoy all the benefits of StableNet^®.

StableNet^® key benefits

Manage your entire network: StableNet^® manages and monitors the status of all network devices, such as routers, servers, and switches. You will receive instant notifications when any device doesn’t work as configured.

Alarm and events: Smart correlation to quickly identify, classify, investigate and troubleshoot network problems.

Find out about network problems faster: StableNet^® determines the root cause of network issues automatically, so you can troubleshoot quickly instead of wasting time searching for the source of the problem.

Centralized and scalable monitoring: StableNet^® helps you to control all key services, devices, or business processes of your organization.

Analyze business impacts: Keep network users up to date with the latest developments. Identify and measure the impact of an incident on critical business operations and take appropriate action early.

Increase the availability of applications and infrastructure: StableNet^® quickly troubleshoots and fixes errors to reduce critical network downtime.

Automated Root Cause Analysis: Identify and resolve issues such as crashes and performance declines faster, before the impact of the issue affects several users, by identifying the root cause of device or service failures.

Sign up for a live demo of our network fault management solution

Feel free to sign up for a live demo of StableNet^®. If you are looking for an efficient and secure monitoring & fault management platform that can also manage your multi-vendor and -technology network infrastructure through a single GUI, StableNet^® may be the right choice for you.
