Regular posts on all things StableNet® from a sales, techie, or marketing perspective
Features, trends and new product development
#1: Key Challenges for today’s Telco Network & Service Management:
Closing service gaps between Telco and Enterprise management requirements
June 7th 2021, Würzburg
In the past couple of years, StableNet® has gained increasing traction as a holistic solution for large Telco network operators. Having secured long-term partnerships with MTN and 1&1 Versatel, along with various regional Telco network operators, we have had the opportunity to gather critical information about what matters to operators today.
This is the first part of a three-part blog series looking into the specific challenges and opportunities that Telcos currently face in managing their ever-evolving network infrastructures. The story begins with where we are and how we got here. The second post continues with the impact on service operations. Finally, we will conclude with a look into the future, as AI, automation and machine learning change the way complex, multi-vendor and multi-technology networks and services are managed and optimized.
As we deploy more and ever-larger Telco and Enterprise automated, unified network & service management solutions (UNSM), we are confronted with new requirements and challenges within the evolving systems and network landscapes.
In this blog post, I’d like to break down the topic as follows:
- Common key trends we see within our Telco, MSP and Enterprise customers
- How these impact management solutions
- Key requirements and challenges that we see and how we cope with them
- The impact of digital transformation through automation
1. Common key trends that we see within our Telco, MSP and Enterprise customers
The trends that we see within our customer base primarily revolve around the challenges and opportunities associated with a constantly evolving network composition. The biggest change involves a shift from specific, single-function network elements to those that support multiple services.
- Multi-Service routers covering multiple network layers and services within a single, expandable system.
From the operator (outside) perspective, the practice of using separate, specific systems for each network layer (e.g. DWDM, GPON, L2, L3, IP/MPLS, L2-/L3-VPNs, Radio/Microwave links, Evolved Packet Core/EPC) to interconnect 2G/3G/4G-LTE/5G multi-service sites is giving way to “plug-in-and-configure-what-you-need” platforms.
- Digitalizing Telco IT is enabling operators to run their infrastructure in new ways and to offer new service bundles to their customers (e.g. Zero Touch solutions, SaaS offers, SD-WAN, etc.)
- Interest in the automation of standardized, repetitive processes for network element configuration, automated service restoration (“self-healing” jobs) processes, inventory tracking automation, zero touch / self-servicing offers for customers e.g. user plugged-in service access devices, etc.
- Virtualization of networks, IT infrastructures and IT systems, and the decoupling of IT applications from the system platforms they run on. Formerly vendor-dependent networking hardware deployments are moving towards SDN/NFV (Software-Defined Networking and Network Functions Virtualization). We see a trend of vendors arriving with their own proprietary SDN/NFV implementations and controllers.
- New ICT and SaaS offerings by Telcos and MSPs for Enterprise are blurring the former service boundaries between service providers and enterprise customers by extending the reach of managed services into enterprise customer IT areas and customer management information.
- New COVID-19-triggered service offers such as “online doctor” eHealth care services require consistent E2E network service quality and video call performance between the involved parties, as well as large-scale parallel communication.
- Telemetry being put in place for “bookable” signaling of services status information
- High-security requirements across a wider array of services solutions for Telco, Enterprise, Clouds, Applications and IIoT.
2. How these impact management solutions
All these requirements by Telco and Enterprise introduce new challenges that need to be addressed and solved by network and services management solutions. This includes:
- Multi-Service routers now require management solutions supporting
- multi-layer, multi-technology, cross-vendor services quality monitoring, capacity and trend monitoring
- Classical Root Cause Analysis for rapid identification of the service degrading network element and the impacted areas, and
- systems reliant on inter-dependent services (e.g. E2E Enterprise connectivity services that depend on regional network elements), which in turn require advanced Artificial Intelligence / Machine Learning (AI/ML) methods to quickly pinpoint the source of an issue and the impacted customers. All of this needs to be accomplished while keeping MTTR low and service quality levels up.
- Digitalizing Telco IT requires joint management and monitoring of network elements, system platforms, virtual and physical IT systems, firewalls and VPN servers, as well as suitable ways to monitor and report on SaaS, Multi-Cloud, and Industrial IoT solutions.
- Digital Transformation / Automation in respect to network and services management:
(replacing manual management processes with automated management processes)
- Automation – required to minimize required resources for repeated operation tasks
- Supporting complex task setups with an easy-to-use, system-aided interface for the user (not trivial to design and implement for the management system vendor, but essential to keep efforts and costs as low as possible).
- Usually applied for backup/restore tasks, automated configuration job tasks, mass configurations, Zero Touch device setups etc.
- Virtualization of networks, IT-Infrastructures, and IT-Systems requires the monitoring of physical and virtual platforms, as well as the performance of the running-on-top applications, and the interconnecting infrastructure.
- SDN/NFV deployments need to support multiple controller types and versions in parallel, as well as vendor-specific SDN/NFV implementations.
- New ICT and SaaS solutions require distributed management of deployment, systems provision (e.g. local SaaS management agents), as well as the operation and system updates across Telco/MSP and Enterprise boundaries.
- Services monitoring across ever-growing, multi-technology environments and diverging standards:
- New management requirements and standards that entail complementing or replacing SNMP with REST APIs
- Monitoring of Cloud Services via Cloud specific REST APIs (e.g. Amazon AWS, Microsoft Azure, Microsoft Office365, Google Cloud, etc.)
- EMS-specific element access management data
- Telemetry (multiple data protocols, e.g. Apache Kafka, IETF RFC 8321, In-band Network Telemetry (INT), IoT MQTT-based telemetry, etc.)
- Less commonly used and phased-out management protocols such as TL1, CORBA, and MTOSI
- E2E services require:
- secure cross-technology / cross-layer / cross-vendor Element Management System (EMS) data integration, multi-tenancy, and cross-country, multi-cloud, and multi-client/enterprise monitoring for complex infrastructures.
- Fault/Alarming/Root Cause Analysis, Performance, Capacity/Trend monitoring.
- E2E monitoring methods migrating from e.g. L2 OAM Y.1731 methods to L3 “IP-SLA” type methods like TWAMP.
- Security – High security requirements e.g.
- Ensuring MSP security within complex, distributed environments
- Managing administration and user access (authentication, authorization, and accounting) across a distributed Telco/Enterprise services environment
- Distinguishing and managing highly distributed high-payload traffic versus Distributed Denial of Service (DDoS) attacks
3. Key requirements and challenges that we see and how we cope with them
For our Telco customers, and to some extent also our Enterprise customers, monitoring and management requirements are changing. As the focus moves from dedicated fixed and mobile service access to more seamless service monitoring, solutions must accommodate multiple infrastructure technologies (e.g. Fiber, xDSL, 4G/LTE, 5G, SMART Cells).
Mobile services such as the Radio Access Networks (RAN) are configured and monitored by the vendor-specific EMS, while the overall services and infrastructure components such as the Mobile Packet Backbone, core networks and alarms from the EMS are monitored using StableNet® as an umbrella management solution for service assurance (service quality, fault, performance, capacity/trends).
SNMP in the Telco and Enterprise environments is slowly giving way to telemetry and vendor-specific EMS management solutions, including:
- multiple vendor-specific, pre-standard protocols for telemetry data (Kafka, YANG, SDN-controller-dependent telemetry data, IETF RFC 8321, In-band Network Telemetry (INT) used by the ONOS SDN controller, based on JSON/XML data structure definitions)
- In-band telemetry able to piggyback on payload data
- Telemetry agents for OpenStack, OpenConfig / YANG / RFC 8639/8640/8641
- NETCONF event notifications (RFC 5277) / RESTCONF
- Telemetry exchanges (e.g. MQTT), communications (e.g. gRPC), and management (e.g. gNMI)
There is a pronounced migration away from earlier “telemetry” protocols such as Syslog, NetFlow / sFlow and VPC Flow Logs towards newer telemetry methods. One example is the movement away from Telco OAM to network telemetry. We also see a strong desire to integrate E2E measurement and monitoring, shifting from the L2-focused Y.1731 to L3 IP methods such as the Two-Way Active Measurement Protocol (TWAMP), which covers both NE and E2E performance measurement.
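To make the TWAMP-style measurement more tangible, here is a minimal sketch of how a round-trip delay is commonly derived from the four timestamps of one active two-way test packet. The class, function and field names are illustrative assumptions and not part of StableNet® or of any TWAMP library, and the sample timestamps are made up.

```python
from dataclasses import dataclass


@dataclass
class TwoWaySample:
    """Four timestamps (in seconds) of one test packet; names are hypothetical."""
    t1_sender_tx: float     # sender transmits the test packet
    t2_reflector_rx: float  # reflector receives the test packet
    t3_reflector_tx: float  # reflector transmits the reply
    t4_sender_rx: float     # sender receives the reply


def round_trip_delay(sample: TwoWaySample) -> float:
    """Two-way network delay, excluding the reflector's processing time."""
    total_elapsed = sample.t4_sender_rx - sample.t1_sender_tx
    reflector_processing = sample.t3_reflector_tx - sample.t2_reflector_rx
    return total_elapsed - reflector_processing


# Made-up example: 26 ms elapsed at the sender, 1 ms spent inside the reflector.
sample = TwoWaySample(0.000, 0.012, 0.013, 0.026)
print(f"round-trip delay: {round_trip_delay(sample) * 1000:.1f} ms")
```

Because each difference is taken on a single device's own clock, this round-trip figure does not depend on the sender and reflector clocks being synchronized; one-way delay figures would.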
Vendor telemetry implementations (there is no set standard for network telemetry yet, hence the attempt at an IETF framework for it):
- Waystream (streaming telemetry on datapoints, NE performance and status, anomalies detection; InfluxDB line-protocol over HTTP, JSON over TCP)
- Amazon Web Services VPC Flow Logs (to monitor hosted sites)
- Cisco ACI data center fabric
- Cisco Telemetry Broker
We see challenges in moving to telemetry, as the processing load shifts from an array of distributed elements (e.g. network elements, IoT sensors, etc.) to a few central network and services management systems. With NEs potentially able to send large volumes of telemetry events per second to central sites, central management systems can become a bottleneck if the overall solution deployment is not carefully designed.
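A rough back-of-the-envelope calculation shows how quickly central systems can become the bottleneck. The sketch below is only an assumption-laden sizing aid: the element count, per-element event rate, collector capacity and headroom are hypothetical values chosen for illustration, not measurements or StableNet® sizing figures.

```python
import math


def aggregate_rate(num_elements: int, events_per_element_per_s: float) -> float:
    """Total telemetry events per second arriving at the central site."""
    return num_elements * events_per_element_per_s


def collectors_needed(total_rate: float,
                      capacity_per_collector: float,
                      headroom: float = 0.3) -> int:
    """Number of collectors required when each one keeps `headroom` spare capacity."""
    usable_capacity = capacity_per_collector * (1.0 - headroom)
    return math.ceil(total_rate / usable_capacity)


# Hypothetical deployment: 20,000 elements, each streaming 50 events per second,
# and collectors assumed to handle 200,000 events per second each.
total = aggregate_rate(20_000, 50)
print(f"aggregate rate: {total:,.0f} events/s")
print(f"collectors needed: {collectors_needed(total, 200_000)}")
```

Even modest per-element rates multiply into a substantial central load, which is why aggregation or pre-processing close to the elements is worth considering during solution design.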
4. The impact of digital transformation through Automation
Digital transformation is providing an environment in which many formerly manual and tedious daily Telco and IT operations processes can be automated. This empowers operations and IT staff to focus on real operations challenges and to support more services with the same headcount.
Zero touch approaches allow user and customer self-service deployments. Some examples include NE or IoT device configurations when attached to a network, auto provisioning of SaaS devices, and reporting and billing.
Operational requirements are driving network and services management away from manual, user-driven event analysis towards AI/ML-automated data analysis. AI/ML is requested for the automated correlation of central system failures, along with the associated impact on related systems and customers. Beyond classical RCA, AI can help to pinpoint sources of service degradations, outages and even trends by correlating regional service issues to the underlying cause. This can substantially reduce MTTR and downtime costs while ensuring high service quality.
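As a baseline for what such correlation has to achieve, the following minimal sketch shows classical, dependency-based root cause analysis: an alarming element counts as a root cause if none of the elements it depends on are alarming too, and the impact is everything that transitively depends on it. This is a simplified illustration and not StableNet®'s actual RCA or any AI/ML method; the topology, element names and function names are all hypothetical.

```python
from __future__ import annotations


def root_causes(alarming: set[str], depends_on: dict[str, set[str]]) -> set[str]:
    """Alarming elements whose own dependencies are not alarming."""
    return {
        element for element in alarming
        if not (depends_on.get(element, set()) & alarming)
    }


def impacted_by(root: str, depends_on: dict[str, set[str]]) -> set[str]:
    """All elements and services that depend on `root`, directly or indirectly."""
    impacted: set[str] = set()
    to_visit = [root]
    while to_visit:
        current = to_visit.pop()
        for element, dependencies in depends_on.items():
            if current in dependencies and element not in impacted:
                impacted.add(element)
                to_visit.append(element)
    return impacted


# Hypothetical topology: each entry lists what the element or service depends on.
topology = {
    "region-agg-1": {"core-router-1"},
    "region-agg-2": {"core-router-1"},
    "customer-a-vpn": {"region-agg-1"},
    "customer-b-vpn": {"region-agg-2"},
}
alarms = {"region-agg-1", "customer-a-vpn"}

for cause in sorted(root_causes(alarms, topology)):
    print(cause, "impacts", sorted(impacted_by(cause, topology)))
```

In this toy example the customer VPN alarm is suppressed as a symptom, and only the regional aggregation element is reported together with the services it affects. Real deployments have incomplete or changing dependency data, which is where statistical and ML-based correlation is expected to help.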
More efficient, automated services are likely to provide better (expected) services quality, which in turn attracts more customers, reduces churn, and can significantly increase revenue streams. A change in customer revenue by +/- 5% can easily justify costs associated with the underlying, long term network and service management solution.
Coping with change is a daily challenge for network operators. New technology and services offers should not be adding to the complexity of the operated infrastructure (hence our focus on keeping the “Zoo of Management Systems” at a minimum). Management automation will help to keep operations efforts and costs down while simultaneously enabling experienced operations staff to focus on specific challenges rather than redundant, standard tasks. Providing consistent services quality (or even a bit above expectations) will secure and increase the revenue streams of Telcos and the overall value of Enterprise IT.
Strategic Alliance Manager at Infosim® GmbH & Co. KG
Peter Mößbauer is a Strategic Alliance Manager at Infosim®. With over 30 years of experience in the telecommunications industry, he has honed his specialization in IT-Services Management, Industrial IoT Solutions and Telco business operations.