Risk Management and Fault-Tolerant Networks


- DRAFT -
Peter Shipley
This month's article covers the basics of risk management with regard to your Intranet and Internet presence.
Risk management is the systematic process of managing an organization's risk exposures in an effort to achieve a more reliable, fault-tolerant environment. Simply put, it is a method for comparing potential gain against potential loss and formulating a plan that is financially optimal.

The importance of risk management is commonly ignored when it comes to network connectivity. While it is common for office phone and voice mail systems to be fault tolerant, it is unfortunately not common for Intranet networks to be configured in a similarly fault-tolerant manner.

A risk cannot be managed unless it can be evaluated. Only through such an evaluation can possible points of failure be located, identified and controlled.

The evaluation process can range from simple to very complex. The first step is to account for all the relevant risk variables and assign 'values' to them.

Such risk variables include:

These variables should also be evaluated together with the potential costs of downtime in case of network failure. These costs can be estimated from lost Internet sales or services and from the hidden costs of lost employee productivity. The economics of a single day of failure can be devastating when you consider that you are still paying the salaries of a full work force and that it may take up to a week to recover the lost time and opportunity. Studies have shown that when an employee's concentration is interrupted, even for a minute, on average 20 minutes of productivity is lost.
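As a rough illustration of the arithmetic, the sketch below adds up lost sales and paid-but-idle salaries for an outage, and separately applies the 20-minute interruption rule of thumb quoted above. The function names (`outage_cost`, `interruption_cost`) and every figure in the example (head count, wage, sales rate) are hypothetical and are not data from a real site.

```python
def outage_cost(hours_down, sales_per_hour, employees, hourly_wage):
    """Estimate the direct cost of an outage: lost sales plus salaries
    paid to staff who cannot work while the network is down."""
    lost_sales = hours_down * sales_per_hour
    idle_salaries = hours_down * employees * hourly_wage
    return lost_sales + idle_salaries


def interruption_cost(interruptions, employees, hourly_wage, minutes_lost=20):
    """Cost of brief interruptions, assuming each one costs every
    employee about `minutes_lost` minutes of productivity."""
    hours_lost = interruptions * employees * minutes_lost / 60
    return hours_lost * hourly_wage


# Hypothetical example: an 8-hour outage at a 50-person office,
# and three short network blips on another day.
day = outage_cost(hours_down=8, sales_per_hour=500, employees=50, hourly_wage=30)
blips = interruption_cost(interruptions=3, employees=50, hourly_wage=30)
print(f"One day down: ${day:,.0f}; three short blips: ${blips:,.0f}")
```

Even with modest made-up numbers, the salary term tends to dominate, which is why the hidden productivity cost deserves a line of its own in the estimate.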

In my experience it is all too common for companies to patch the obvious risks and not become aware of the true problems until it is too late. One common mistake is placing desktop systems and building servers on UPS backup power while leaving the closet hubs and routers unprotected.

For companies that rely on their Internet connectivity for their cash flow, a spare router configured for hot swap may be well worth the investment. Also consider the lost productivity of remote employees and telecommuters.

DNS service is also a commonly overlooked problem. The InterNIC only requires two name servers: one primary and a single (local) secondary. A proper DNS configuration should have at least three secondary servers located on different continents.
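The recommendation above can be turned into a simple check against a hand-maintained inventory of name servers. This is only a sketch: it performs no DNS lookups, and the function name `check_dns_diversity`, the tuple layout and the example hostnames are all assumptions made for illustration.

```python
def check_dns_diversity(servers, min_secondaries=3):
    """Return a list of warnings for a zone's name server inventory.

    `servers` is a list of (hostname, role, continent) tuples, where
    role is "primary" or "secondary". The inventory is maintained by
    hand; nothing here queries the DNS.
    """
    warnings = []
    secondaries = [s for s in servers if s[1] == "secondary"]
    if len(secondaries) < min_secondaries:
        warnings.append(
            f"only {len(secondaries)} secondary server(s), "
            f"want at least {min_secondaries}"
        )
    continents = {continent for _, _, continent in secondaries}
    if len(continents) < min_secondaries:
        warnings.append(
            f"secondaries cover only {len(continents)} continent(s), "
            f"want at least {min_secondaries}"
        )
    return warnings


# Hypothetical inventory that meets only the registry minimum.
minimal = [
    ("ns1.example.com", "primary", "North America"),
    ("ns2.example.com", "secondary", "North America"),
]
for warning in check_dns_diversity(minimal):
    print("WARNING:", warning)
```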

Data backup is a very common problem. While most sites perform backups on a regular basis, most do not back up desktop systems, where many users habitually store data (as opposed to on the centralized file server). The same applies to data loss on notebook computers, a common place for recent proposals to be stored.

Of course there is always the case, common in smaller companies, where the cost of a second Cisco router is far greater than the cost of a day of downtime, but without a risk assessment such a decision is "risky".
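One way to make this decision less of a guess is to compare the expected yearly loss from a failure against the price of the safeguard. The sketch below is an assumption about how such a comparison might be set up, not a prescribed method; the function names, the failure rate, the three-year lifetime and the dollar amounts are all hypothetical.

```python
def expected_annual_loss(failures_per_year, hours_per_failure, cost_per_hour):
    """Expected yearly loss from one type of failure."""
    return failures_per_year * hours_per_failure * cost_per_hour


def worth_buying(safeguard_cost, loss_without, loss_with, years=3):
    """True if the loss avoided over the safeguard's assumed lifetime
    exceeds what the safeguard costs."""
    savings = (loss_without - loss_with) * years
    return savings > safeguard_cost


# Hypothetical small office: one router failure every two years,
# 8 hours to obtain a replacement versus 1 hour with a spare on hand.
small_without = expected_annual_loss(0.5, 8, 200)
small_with = expected_annual_loss(0.5, 1, 200)
print("small office:", worth_buying(5000, small_without, small_with))

# The same failure at a site that loses ten times as much per hour.
large_without = expected_annual_loss(0.5, 8, 2000)
large_with = expected_annual_loss(0.5, 1, 2000)
print("large site:", worth_buying(5000, large_without, large_with))
```

With these made-up inputs the spare router does not pay for itself at the small office but does at the larger site, which is exactly the kind of conclusion that should come from numbers and not from habit.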

Another point of failure is the backup system itself: unless it is regularly tested, it can be of little use. It is very common for companies to discover that months of backup tapes are corrupt. Even UPSs that have never had to provide power will fail after a few years unless their batteries are maintained.
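A regular test can be as simple as restoring a backup to a scratch directory and comparing it with the live data. The sketch below assumes the backup has already been restored by whatever tool the site uses; the function names `file_digest` and `verify_restore` are hypothetical, and the choice of SHA-256 checksums is an illustration and not a requirement.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files are not
    loaded into memory all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir, restored_dir):
    """Compare every file under source_dir with its restored copy.

    Returns the relative paths that are missing from the restore or
    whose contents differ from the original.
    """
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    problems = []
    for original in sorted(source_dir.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(source_dir)
        copy = restored_dir / relative
        if not copy.is_file() or file_digest(original) != file_digest(copy):
            problems.append(str(relative))
    return problems
```

Files that were legitimately changed after the backup was taken will also be reported as different, so the check is most useful when run right after the backup or against data that rarely changes.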

Thus I would like to ask readers, in closing, to look around and judge for themselves the possible risks they are taking. Can you afford to lose the data stored on your laptop? If a central part of your LAN were to fail, what is the expected recovery time and what costs would be incurred?


About the author: Peter Shipley lives in Berkeley, Calif., and has 14 years of experience in network security. He specializes in system security auditing and risk assessment, Unix system security, and TCP/IP network design and implementation.