How to solve a problem like BlackBerry’s

Opinion 2011-11-01 10:20

BlackBerry’s recent network outage, owing to a switch failure at its main datacentre, had a huge impact on confidence in RIM’s ability to provide a resilient, reliable network to its customers.

 

With tens of millions of users affected by the outage (BlackBerry service crash affects BBM messaging for millions) and a cost in the region of $100m, the company has suffered severe losses in both revenue and reputation. As any organisation will know, outages on such a scale can bring a company to its knees. So what measures can be put in place to ensure this doesn’t happen?

Evolution of the network – managing expansion

Over the past 20 years, companies of all sizes have implemented IT networks to help run and manage their business. During that time, business applications, IT equipment and even the networks themselves have multiplied, resulting in a highly complex IT environment. This complexity is a major hurdle for those in charge of network management.

What’s more, the global trend for industry consolidation and mergers and acquisitions has thrown companies together, presenting ICT managers and directors with the massive problem of combining two or more totally disparate networks. With networks of such scale, it is of utmost importance for IT departments to have measures in place to manage the network, regardless of size or complexity. Those measures must ensure that all possible issues are handled correctly, that resiliency is the number one priority and that a single failure does not turn into an outage.

The crucial role of the network

Enterprises rely on networks for accessing business-critical information on a 24/7 basis. Productivity is adversely affected when a major outage or loss of access occurs. A high-performance, high-availability network infrastructure is therefore no longer a luxury, and maintaining resource accessibility is of paramount importance. Unfortunately, in many businesses the existence of multiple, independent networks is increasing management complexity and costs. Such complexity can completely obstruct access to vital documents and services.

As a potential solution to this, many businesses install redundant components as a way to increase network reliability.  In theory, this means that if one network were to fail, a back-up network would launch to eliminate downtime and maintain access to all services. Unfortunately, this approach has many drawbacks that need to be carefully considered.

Network resiliency

With a back-up network, the redundant equipment often remains inactive and untested for prolonged periods of time. Redundant network components such as switches are usually in hot standby status for most of their lifecycle, draining electricity and producing heat without making any contribution to the network.

An additional problem here is that a resource kept in standby mode may well hide a malfunction that will only appear at a critical moment, negatively affecting network operations and resulting in complete network failure. It is likely that this was the cause of the recent BlackBerry service outage.
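To make the hidden-fault risk concrete, here is a minimal Python sketch of an active-standby pair. It is purely illustrative: the `Switch` class, the `faulty` flag and the `standby_drill` probe are hypothetical names invented for this example, not a description of RIM's systems or of any vendor's product.

```python
from dataclasses import dataclass


@dataclass
class Switch:
    """Toy model of a switch (hypothetical, for illustration only)."""
    name: str
    faulty: bool = False  # latent fault, invisible until traffic is forwarded

    def forward(self, packet: str) -> str:
        if self.faulty:
            raise RuntimeError(f"{self.name} failed while forwarding")
        return f"{packet} via {self.name}"


def send(packet: str, primary: Switch, standby: Switch) -> str:
    """Active-standby: the standby is only exercised once the primary fails."""
    try:
        return primary.forward(packet)
    except RuntimeError:
        # The standby's hidden fault, if any, surfaces here, at the worst time.
        return standby.forward(packet)


def standby_drill(standby: Switch) -> bool:
    """Push a probe through the standby so a latent fault is found early."""
    try:
        standby.forward("probe")
        return True
    except RuntimeError:
        return False


primary = Switch("primary")
standby = Switch("standby", faulty=True)  # broken, but nobody has noticed

# While the primary is healthy, traffic flows and the fault stays hidden.
first = send("mail-1", primary, standby)

# An explicit drill is the only way the standby's fault is seen in advance.
drill_ok = standby_drill(standby)
```

In this toy model, if the primary later fails, `send` raises an error because the untested standby cannot take over, which is the failure mode described above.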

The solution? Active-active infrastructure

With an active-active network infrastructure, all physical links and devices are continually operational and actively contribute to network performance. No components are left in standby mode, so any malfunctions or faults are identified and highlighted immediately. Workload is distributed evenly across both infrastructures and, in the event of a failure or outage, all activity automatically transfers to the remaining active network.
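The behaviour described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the `Path` class, the round-robin `dispatch` function and the branch names are hypothetical, and a real network would balance traffic in hardware or via routing protocols, not in application code.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Path:
    """One branch of the network (hypothetical model, not a vendor API)."""
    name: str
    healthy: bool = True
    handled: int = 0  # number of flows this branch has carried so far


def dispatch(flows, paths):
    """Spread flows round-robin over every healthy branch.

    All branches carry live traffic, so a broken branch is noticed at once
    instead of sitting unnoticed in standby.
    """
    active = [p for p in paths if p.healthy]
    if not active:
        raise RuntimeError("no healthy branch left: total outage")
    assignment = {}
    for flow, path in zip(flows, cycle(active)):
        path.handled += 1
        assignment[flow] = path.name
    return assignment


paths = [Path("branch-a"), Path("branch-b")]

# Normal operation: the ten flows are shared evenly between both branches.
normal = dispatch(range(10), paths)

# branch-a fails: every flow moves to the surviving branch.
paths[0].healthy = False
degraded = dispatch(range(10), paths)
```

The design point is that there is no separate failover code path: the same dispatch logic runs before and after the failure, only over a shorter list of branches.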

Active-active network infrastructures not only contribute to the resiliency of your networking infrastructure but can also bring cost savings through reduced energy consumption. An active-active network offering the same level of performance as the redundant network structure splits the traffic between the two branches and requires approximately two-thirds of the power, or around 70 percent at a conservative estimate. Add to that the savings from the reduction in air conditioning (around 30 percent) and, despite the initial increase in power usage, the overall saving through reduced cooling power will actually be around 50 percent.

Conclusion – re-evaluate your network infrastructure for optimum performance

In an ever-expanding IT environment, businesses need to ensure that they have full control of their network infrastructure.  Network managers need to have an overarching view of what is happening within their IT infrastructure at all times and be safe in the knowledge that if an error occurs, backup procedures are in place to ensure that no downtime occurs. 

IT departments need to evaluate their current network infrastructure, including redundant and back-up systems, and look to integrate them into the overall architecture. Mass network outage is simply not an option.

 

Melvyn Wray is SVP of product marketing EMEA at Allied Telesis
