Why scalability matters
In the vast landscape of network monitoring and management products, it can be challenging to compare products accurately and get a sense of which solution might be the right fit for you. To bring some order to the process, most organizations begin by grouping products by certain features so that they can make more of an “apples to apples” comparison.
The next step in this process is to determine the best way to group solutions. Initially, it may seem obvious to use the feature set as the main criterion; after all, it makes logical sense to ask questions about capabilities (“Can your product do XYZ?”) and then group based on the results. However, a less obvious but equally important question should be asked about scalability. Why? Let us examine the current state of most networks we encounter.
In our experience, today’s networks are less differentiated by technology than ever before. Virtually everyone we speak to, from smaller single-location companies to large multinational corporations, uses many of the same basic network building blocks:
- Virtualized servers
- Public/private clouds
- Storage networks
- Core/distribution/access network topologies
- WLAN availability everywhere
- Multimedia traffic that requires priority
- Multiple vendors
Due to the ubiquity of these network elements, most network managers and others involved in the monitoring of and response to network problems have almost identical feature requirements. Therefore, we think a better question to ask is “How does your product/platform scale to my needs?” This is because the main differentiator in networks today is not their features, but rather their size.
As an example, let us examine two customers. Customer A is a regional bank; Customer B is a large multinational financial organization. Both A and B probably use similar vendor technologies, switching/routing cores, and remote office/branch locations connected via MPLS or similar. They both retain customer data and face strict regulatory requirements related to security. They may even use some of the same back-office software tools. They both need to provide a customer portal for access to online banking tools as well as iPhone/Android apps. They both need to staff a call center to answer customer questions. The similarities do not end there, but I think you get the point.
Now, let us examine how they are different in terms of scale. For this example, I am using publicly available data for two real banks that I will not name.
| Customer A | Customer B |
| --- | --- |
| in one country | in over 120 countries |
Now, it is pretty easy to see that although both of these companies operate in the same industry and very likely utilize similar technologies, the difference in their scale is massive. You can bet that if both of these companies were looking for an enterprise management solution, Customer A could probably choose from almost any vendor that met their needs and their price point, whereas Customer B would have to cut down their list to only those solutions that could be proven to operate on such a large scale.
Let us take a look at how most network solutions are designed and deployed to see what makes one product more scalable than another.
One of the biggest factors differentiating scalability among solutions is their database support. In an effort to keep their products priced as low as possible, many network management vendors rely on free or open-source databases such as Microsoft SQL Server Express, MySQL/MariaDB, and PostgreSQL. While many of these databases are genuinely powerful and support advanced scalability techniques like partitioning, at the very high end they are no match in speed and scalability for databases like Oracle RAC. Therefore, it is crucially important to investigate the supported database architecture when comparing product scalability.
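A quick back-of-envelope calculation shows why the database tier becomes the bottleneck at scale. The device counts, metric counts, and polling interval below are purely illustrative assumptions, not data about any real customer:

```python
# Illustrative estimate of monitoring data volume (all numbers assumed)
# showing why database scalability matters at the high end.

def daily_rows(devices, metrics_per_device, poll_interval_s):
    """Rows written to the performance database per day."""
    polls_per_day = 86_400 // poll_interval_s
    return devices * metrics_per_device * polls_per_day

# A regional bank with ~2,000 managed devices (assumption)
small = daily_rows(devices=2_000, metrics_per_device=20, poll_interval_s=300)

# A multinational with ~200,000 managed devices (assumption)
large = daily_rows(devices=200_000, metrics_per_device=20, poll_interval_s=300)

print(f"Small network: {small:,} rows/day")  # 11,520,000
print(f"Large network: {large:,} rows/day")  # 1,152,000,000
```

Roughly a billion rows per day is exactly the regime where partitioning alone stops being enough and the underlying database engine starts to matter.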
A second critical factor is a product’s ability to make use of additional compute resources: can the product scale (and scale efficiently) when you add more “horsepower” in the form of additional CPU cores, memory, and faster SSD storage? For example, many applications were never designed to exploit additional CPU cores through multithreading, or they give the administrator no control over how many threads a particular process may use. Only systems architected from the start to scale in this way will be able to support the ever-larger network loads required today.
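The idea can be sketched in a few lines. This is a minimal conceptual example, not any vendor’s actual design: the poller fans work out across a thread pool whose size is the administrator-tunable knob described above.

```python
# Minimal sketch of scaling a polling workload across a configurable
# number of worker threads (conceptual only; poll_device is a stub).
from concurrent.futures import ThreadPoolExecutor

def poll_device(device):
    # Placeholder for a real SNMP/ICMP poll; returns a canned status.
    return device, "up"

def poll_all(devices, worker_threads):
    # worker_threads is the administrator-controlled degree of parallelism.
    with ThreadPoolExecutor(max_workers=worker_threads) as pool:
        return dict(pool.map(poll_device, devices))

results = poll_all([f"sw-{i}" for i in range(100)], worker_threads=16)
print(len(results))  # 100
```

A product built this way can absorb more load simply by raising the thread count on a bigger box; one that hard-codes a single polling thread cannot, no matter how many cores you give it.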
Another scaling concept is the ability to scale horizontally by adding more functional components to the overall architecture. In the network management space, these components go by many names (Agent, Poller, Collector, Probe, etc.). They are typically the components that both actively speak to devices (e.g., SNMP, ICMP, scripts, API calls) and passively collect data sent from the network (e.g., syslogs, traps, NetFlow). No matter what this component is called, they all have one thing in common: eventually they will “tap out” at a certain number of managed devices/elements. Three key scalability factors for the overall solution are:
- How many devices/elements can a single Agent support?
- How many total Agents can the solution support?
- Can I designate Agents to only do one thing? For example, Agent A only collects NetFlow and Agent B only collects Syslog/Traps…
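The three questions above can be made concrete with a small sketch. Everything here is hypothetical: the Agent names, roles, and per-Agent capacities are assumptions chosen to illustrate role-dedicated Agents and the point where they “tap out.”

```python
# Hypothetical model of horizontally scaled, role-dedicated Agents.
# Each Agent handles one data type and a fixed number of devices.
AGENTS = [
    {"name": "agent-a", "role": "netflow", "capacity": 2, "devices": []},
    {"name": "agent-b", "role": "syslog",  "capacity": 3, "devices": []},
    {"name": "agent-c", "role": "syslog",  "capacity": 3, "devices": []},
]

def assign(device, role):
    """Place a device on a dedicated Agent for the role, if one has headroom."""
    for agent in AGENTS:
        if agent["role"] == role and len(agent["devices"]) < agent["capacity"]:
            agent["devices"].append(device)
            return agent["name"]
    return None  # every Agent for this role has "tapped out"

print(assign("core-rtr-1", "netflow"))  # agent-a
print(assign("fw-1", "syslog"))         # agent-b
```

When `assign` starts returning `None` for a role, the only remedy is adding another Agent of that role, which is why the per-Agent and total-Agent limits in the questions above bound the whole solution’s scalability.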
By fully vetting the scalability of these underlying functional components, customers can better understand the scalability of the overall solution.
Lastly, a major component of scalability is how well the solution has been architected for High Availability (HA). In very large networks, it is essential that all critical infrastructure has inherent and robust HA capability. This should be (and usually is) a “showstopper” requirement for any solution. For network management solutions, HA needs to be built into all of the core components of the system. This must include:
- The Database
- Any application middleware
- The active/passive Agent pieces
Other considerations related to HA include the ability for individual components to continue to operate and store/cache data even if connectivity to other components is lost, and the ability to self-heal once primary component functionality is restored.
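The store/cache-and-self-heal behavior just described can be illustrated with a rough sketch. This is a conceptual model under assumed names, not any real product’s API: the Agent buffers measurements locally during an outage and flushes the backlog, in order, once the central server is reachable again.

```python
# Conceptual store-and-forward Agent: cache data locally while the
# central server is unreachable, then self-heal by flushing the backlog.
from collections import deque

class BufferingAgent:
    def __init__(self):
        self.backlog = deque()        # local cache used during outages
        self.server_reachable = True
        self.delivered = []           # stands in for the central server

    def send(self, record):
        if self.server_reachable:
            self.delivered.append(record)  # normal path
        else:
            self.backlog.append(record)    # connectivity lost: cache locally

    def reconnect(self):
        # "Self-heal": replay cached records in their original order.
        self.server_reachable = True
        while self.backlog:
            self.delivered.append(self.backlog.popleft())

agent = BufferingAgent()
agent.send("m1")
agent.server_reachable = False
agent.send("m2"); agent.send("m3")
agent.reconnect()
print(agent.delivered)  # ['m1', 'm2', 'm3']
```

The key property to verify in a real product is exactly what this toy shows: no measurements are lost during the outage, and ordering is preserved after recovery.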
When all of the above factors are taken into consideration, it becomes much easier to identify which solutions can be relied upon to scale to your organization’s needs. Only after you are confident in a vendor’s ability to meet this most critical need should you move on to feature and price discussions.
Director of Sales, Infosim