Business Recovery Over Wide Area Networks: Are You Ready?
by Randolph A. Fisher - CBCP
ABSTRACT
The ability of a corporation to communicate internally and externally is essential to its economic viability. Data has become an even more valuable corporate asset in recent years, and the need to ensure business continuity and recoverability from minor or major disasters is no longer a luxury but an absolute necessity. In fact, many industry regulatory bodies mandate proof of recoverability. Court rulings in California have allowed attachment of personal assets of lower level corporate managers for lack of due diligence in recovery planning.
This paper will briefly address the following:
INTRODUCTION
Data networking needs have been expanding tremendously over recent years due to new applications and higher bandwidth requirements. Businesses require the ability to move data efficiently and reliably with near zero downtime for mission critical functions. Data networks no longer use a single technology but comprise a hybrid of technologies that are being driven to their limits by more users and ever increasing traffic.
BUSINESS IMPACTS
A disruption of key primary or support operations can result in a crisis situation for any business. It need not be a catastrophic failure like the Oklahoma City or New York World Trade Center bombings. It could be an outage to a single data circuit that carries payroll information, or it may be a high speed circuit that is needed to design a car or that helps in a medical emergency. It is usually the small things that cause the greatest problem: the $3 dust cap in a $30 million rocket.
Studies have shown that 80% of companies without well conceived and tested plans go out of business within two years after a major disaster. 1
Reasons for Having a Plan
Many would argue that they have insurance, so why worry about having a business continuity and recovery plan? The chief reason for having a recovery plan is the safety and security of your employees. Next, there must be clear management succession and emergency powers delineated to allow timely decisions that help minimize losses, assess situations, and act toward resuming critical functions.
Government mandates were issued as early as 1989 regarding contingency planning for U.S. financial institutions and government agencies. The December 1997 DiMartini and McNally article provides a comprehensive overview of the regulations in force, and your internal auditors should be familiar with it. 2
Recent court rulings in California have allowed stockholders to sue for damages and attach personal assets of executives and lower level managers for not having done ‘due diligence’ in business continuity and recovery planning. Many industries have imposed guidelines and regulations mandating that a plan be developed and tested; banking, brokerage, insurance, and medical are a few of these. Finally, many business continuity insurance carriers will not provide premium discounts or even renew policies without demonstration of a plan.
“Business continuity is a culture, a mindset, a philosophy, and a way of life. These principles must guide its inclusion into your corporate goals.”
Incidence of Testing
Why aren’t businesses getting the message? According to data loss statistics from Info Security News Magazine 3:
The number of businesses that claim to have contingency plans has grown over the past few years to 68%, primarily because of painful lessons learned from recent man-made and natural disasters. However, only 45% of these actually test their plans. 4
“If you don’t test your plan … you don’t have a plan.” Having a large binder on the shelf does not mean you have a plan. Having the only copy of the plan in a burning building doesn’t help anyone.
Tradeoffs
Is development and testing of a business recovery plan costly? Undoubtedly, keeping operating costs low is every employee’s concern. There is a balance between the cost of reliability and the cost of downtime. Whether you conduct a detailed Business Impact Analysis (BIA) or do a back of the envelope estimate, you must determine the impact of network downtime on your business. That will provide some guidance on what you should be willing to spend.
You will need to consider these points:
Economic Impacts
In today’s global economy, being out of touch with your customers could drive you out of business. The average cost of downtime within the U.S. is approximately $1,400 per minute, or $84,000 per hour. 5
Millions of dollars per hour can be lost and some businesses never recover from a severe outage. Airline reservation centers or car rental companies would lose thousands of customers by not being able to receive calls; potential customers could easily find alternate providers. Brokerage operations could lose millions of dollars if their wide area communications network fails even for a short period of time.
Lost revenue is not the only consequence of business disruption; there are many other effects as well. An inability to meet your customers’ needs could ultimately damage your corporation’s image. Investor confidence and market share can be affected if you cannot be reached. A failure at your primary data center, local network, or wide area network cuts your internal and external customers off from critical resources.
GENERAL PLANNING CONSIDERATIONS
When analyzing the areas of potential risk to your telecommunications network, be sure to include the following: power sources, computer hardware and software, cabling, access circuits, and wide area circuits.
Electric & Gas
Do you have overhead or underground facility feeds into your building? Do you have the electric utility and fire department numbers easily accessible? Contingency Planning Research reports that 45% of all computer data loss in the U.S. is due to power outages or surges. 7 In fact, telecommunications and computer systems are subject to an average of 120 power problems per month. This translates to an average of 4 power events per day. 8 Do you have an uninterruptible power source? Did you run it under full load? Do you store adequate fuel for long duration outages? Do you employ surge suppression devices on key equipment? Did you know that most electric utilities do not guarantee transient free service?
Hardware and Software
A recent study shows that 41% of data center disasters are from hardware failure or power loss. 9 Do you have redundant hardware and software? How often do you back up your system? Are your suppliers local? Are you Year 2000 compliant?
Cable Feeds
Most building structures have only one entry point for electric, gas, and communications. Having multiple feeds provides greater redundancy. This is certainly worth including in any new construction or renovations to older buildings.
GENERAL NETWORK PLANNING CONSIDERATIONS
Even a typical network reliability of 99% still translates into almost 90 hours of downtime per year, costing the average U.S. business approximately $7.3 million in lost revenue. 5
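The arithmetic behind these figures can be checked with a short calculation. The following Python sketch is only an illustration: it assumes a 365-day year of round-the-clock operation and a flat cost of $1,400 per minute (the U.S. average quoted earlier), and the function names are made up for this example.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: round-the-clock operation, non-leap year


def annual_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for an availability between 0 and 1."""
    return (1.0 - availability) * HOURS_PER_YEAR


def annual_downtime_cost(availability: float, cost_per_minute: float) -> float:
    """Expected yearly cost of downtime at a flat cost per minute (an assumption)."""
    return annual_downtime_hours(availability) * 60 * cost_per_minute


if __name__ == "__main__":
    # Compare a few availability levels at the quoted $1,400 per minute.
    for availability in (0.99, 0.999, 0.9999):
        hours = annual_downtime_hours(availability)
        cost = annual_downtime_cost(availability, cost_per_minute=1400)
        print(f"{availability:.2%} available: {hours:6.1f} h/yr down, ${cost:,.0f}")
```

At 99% availability this works out to about 87.6 hours and roughly $7.36 million per year, in line with the figures above. Substituting your own cost per minute from a Business Impact Analysis gives a rough ceiling on what additional reliability is worth to you.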
Before choosing your network carrier you should consider the following: capacity, availability, quality, technology employed, maintenance & ownership, restoration capabilities and testing philosophy.
Besides providing a quality connection, a carrier’s network must provide adequate capacity and route diversity to enable you to use a spare facility or redirect your traffic from a failed site to an active site. Next, what is the carrier’s record for availability of circuits at different speeds and for different services? What technologies do they employ in their restoration schemes? The type of network architecture they employ is important. A spur arrangement has limited recoverability. Ring architecture is better since it provides self-healing and other automatic restoration capabilities. You should also ask how many carrier offices can provide the specific services that you subscribe to, and how close they are to your buildings. Are multiple offices nearby? Also, what alternate routes and facilities exist for rerouting traffic during a network disaster or cable outage?
Does your carrier own all of its facilities, or does it purchase and resell from another vendor? Is the network monitored seven days a week, twenty-four hours a day? Does the carrier have technicians certified in disaster recovery?
How often does your vendor conduct disaster recovery exercises on their network to maintain competency and efficiency in recovery situations? Be sure to ask these questions and more.
Local Access Considerations & Alternatives
The design of the local access portion of your network is critical. It is this last mile that can cause the most problems. You need to make sure that your primary data site is not isolated by a network failure. All of the options that are available to protect this section of your Wide Area Network should be reviewed. You must identify how the circuits come into the building and the route of the access cable to your location from the carrier’s central office. Cable diversity should be built into your access network to avoid a single cable cut taking down all of your service. Many access providers define access diversity as cables separated by twenty-five feet or more. Ask how your provider defines diversity.
Hot Sites
One alternative for a host failure or local access outage could be moving to an alternate site. If a major disaster happens, moving the entire data center to a “hotsite” or another company location could be necessary. A hotsite is a completely equipped data center available to a customer in a disaster situation. It provides security, equipment, and living facilities while the main location is unavailable. Moving to a hotsite requires detailed planning and a subscription arranged beforehand with a hotsite vendor and the network provider.
With proper planning, you can protect your network by splitting traffic over multiple circuits. In the event that one circuit fails, traffic can be redirected by using an alternate circuit. Local channels can also be protected by routing them through different local exchange carrier offices. With the advent of new technologies, fiber rings provide diverse routing and self-healing benefits, along with new services and greater bandwidth capabilities. Microwave radio networks and cellular phones are also alternatives that bypass the local provider’s network.
Dial-Around Technologies
The 1998 Suitor article indicates that up to 160 hours of outage per year can be experienced by not using a recovery technology like dial backup. 10 Many companies have found dial-around technologies to be an effective recovery vehicle. In simple terms, they enable a user to dial outside their network in an automated way when key equipment senses a loss of signal. Services being employed are Switched 56 kbps, Integrated Services Digital Network Basic Rate Interface (ISDN BRI), or the good old reliable public switched network. Many financial, manufacturing, and health care organizations have employed this as one of their backup tools and have found the 15% - 18% incremental cost of these redundancy features to be very economical.
Wide Area Network Considerations
There are also a number of different options to review when selecting an interexchange carrier (IXC). Inappropriate assumptions can be made about providing reliability for the wide area portion of the network. Many people believe that a single vendor could not provide all their diversity requirements and that this can only be accomplished by splitting vendors. Nothing could be further from the truth. “Route diversity is far more important than vendor diversity.” Many fail to recognize the complications of vendor splitting. The most common involves railroad rights-of-way: it is not uncommon to find most of the major interexchange carriers in close proximity as they cross bridges. When you split vendors, the burden of service restoral across multiple networks rests with you, the end user.
Diversity definitions and capabilities vary widely by interexchange carrier. One carrier defines diversity as circuits being at least one hundred feet apart, with a vertical separation of at least two feet at cable crossovers and the cable enclosed in heavy gauge steel pipe. The key point here, as with access diversity, is to determine how your interexchange carrier defines diversity.
Besides cable proximity, you must also consider cable routing to a common central office building. Therefore, when designing your network, closely scrutinize the carrier’s definition of diversity and restoral options.
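One way to make this scrutiny concrete is to list the physical elements each circuit passes through and look for overlaps. The Python sketch below is a minimal illustration under stated assumptions: the circuit inventory, carrier names, and element names are all hypothetical, and in practice the route details would have to come from your carriers.

```python
from itertools import combinations

# Hypothetical circuit inventory: each circuit lists the carrier that sells it
# and the physical elements (central offices, conduits, bridge crossings) it
# passes through. Every name here is made up for illustration.
CIRCUITS = {
    "primary":   {"carrier": "Carrier A", "path": ["CO-Main", "Conduit-12", "Bridge-7", "POP-East"]},
    "backup":    {"carrier": "Carrier B", "path": ["CO-North", "Conduit-31", "Bridge-7", "POP-East"]},
    "alternate": {"carrier": "Carrier A", "path": ["CO-North", "Conduit-44", "Tunnel-2", "POP-West"]},
}


def shared_elements(circuits):
    """Return {(name1, name2): sorted list of shared elements} for every circuit pair."""
    result = {}
    for (n1, c1), (n2, c2) in combinations(circuits.items(), 2):
        result[(n1, n2)] = sorted(set(c1["path"]) & set(c2["path"]))
    return result


if __name__ == "__main__":
    for (n1, n2), shared in shared_elements(CIRCUITS).items():
        same_carrier = CIRCUITS[n1]["carrier"] == CIRCUITS[n2]["carrier"]
        status = "DIVERSE" if not shared else "SHARED: " + ", ".join(shared)
        print(f"{n1} / {n2} (same carrier: {same_carrier}) -> {status}")
```

In this made-up inventory, the two circuits bought from different carriers still share a bridge crossing, while two circuits from the same carrier share nothing, which is the sense in which route diversity matters more than vendor diversity.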
Network protection services provide options that increase the reliability of your network at additional cost. Protecting the interexchange portion of your network is achieved by a number of different options. Network switches can automatically change to a backup circuit if required. Enhanced routing options can provide diversity and special routing. Finally, network management tools can provide flexibility for both anticipated and unanticipated events.
Wide Area Network Management Tools
Network management tools are not just for large customers. Small businesses (< $5 million of annual revenue) can also benefit from these tools. Less than 2 percent downtime per month for 50 percent of a company’s employees can result in an annual cost of nearly $200k. In this case, the business found that internetworking products that reduce downtime would pay for themselves in six to nine months. 5
When planning and implementing your disaster recovery plan, you may want to consider network monitoring and management tools. These tools should provide the status of your circuits and should include information on availability, performance, chronology of problems, and alarm activations. Some interexchange carriers have introduced electronic ticket management and electronic customer care features for quicker reporting of service problems. These allow users to enter new tickets, monitor the status of open tickets, and even proactively monitor overall maintenance schedules. Reports are also available that provide information on trouble tickets, alarms, and circuit performance.
Network Management Tools on Dedicated Private Line Services
Network management tools used with dedicated private line services have been available for over eight years and offer more network control, dynamic bandwidth allocation, bandwidth on demand, and flexible restoration capabilities. This allows you to optimize your network, providing maximum uptime, use, and value. Additionally, these tools are used for peaking applications, disaster recovery test activations, video teleconferencing, and time of day applications, all of which can be accommodated in a matter of minutes. Depending upon your carrier, these tools provide visual and audible alarms regarding outages and visually sectionalize the fault. In addition, some carriers enable you to automatically reconfigure your network in the event of a failure.
“The key point here is to know what you are subscribing to. There is a wide range of network management capabilities offered by the interexchange carriers.”
Frame Relay Recovery Options
Customers with frame relay networks are also concerned about failures. Options are available to protect frame relay networks from access circuit, site, or geographic failures. Access protection features allow Permanent Virtual Circuits (PVCs) connected to a failed access circuit to be moved to an alternate access circuit within minutes of a customer-declared failure. Backup Permanent Virtual Circuits (BPVCs) allow traffic to be routed from a primary location to a secondary location. Growable Permanent Virtual Circuits (GPVCs) not only redirect traffic to a secondary location but also increase the bandwidth to meet your application requirements. Hybrid arrangements of private line and frame relay features allow users to reroute frame relay traffic to other locations such as a hotsite.
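The access protection idea can be pictured as simple bookkeeping: a record of which access circuit carries each PVC, plus a pre-arranged alternate for each access circuit. The Python sketch below models only that bookkeeping on the customer side. It is not a carrier interface, and the class, method, and circuit names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryPlan:
    """Hypothetical record of which access circuit carries each PVC."""
    # PVC name -> access circuit currently carrying it
    assignments: dict = field(default_factory=dict)
    # access circuit -> alternate access circuit arranged beforehand
    alternates: dict = field(default_factory=dict)

    def declare_failure(self, failed_circuit: str) -> list:
        """Move every PVC on the failed circuit to its alternate.

        Returns the names of the PVCs that were moved. Raises KeyError when
        no alternate was arranged beforehand for the failed circuit.
        """
        alternate = self.alternates[failed_circuit]
        moved = [pvc for pvc, ckt in self.assignments.items() if ckt == failed_circuit]
        for pvc in moved:
            self.assignments[pvc] = alternate
        return moved


if __name__ == "__main__":
    plan = RecoveryPlan(
        assignments={"PVC-payroll": "access-1", "PVC-orders": "access-1", "PVC-email": "access-2"},
        alternates={"access-1": "access-2"},
    )
    print(plan.declare_failure("access-1"))  # PVCs moved to access-2
    print(plan.assignments)
```

A failure declared on a circuit with no arranged alternate raises an error in this sketch, which mirrors the practical point that recovery arrangements have to be set up with the network provider before they are needed.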
These options cover a number of different scenarios. Your specific needs and requirements will determine which options are best for your network. Experience has indicated that the most frequently occurring problem with these frame relay recovery options is not the service itself but rather the failure to keep customer recovery scenarios current. Many users forget to review the appropriateness of their scenarios or to make corrections beforehand with their network provider. Quarterly review and revisions are strongly suggested.
Virtual Offices and Internet Considerations
Telecommuting and the use of the Internet have exploded all around us. This presents new disaster recovery situations and vulnerabilities. If your Internet connection fails or your site is destroyed, what types of services are available? Who can provide those services? Numerous companies are providing backup restoral services for customers. In the event of a disaster, users can retrieve their backup data over the Internet and restore it at a hotsite or another company location. If access to the Internet is unavailable, backup could occur using a private dial-up network. Hotsite vendors and Internet Service Providers are now offering these services.
Security on the Internet is a major concern. Hackers and viruses can bring a business to its knees. To keep out unwanted users, firewalls and encryption must be used. These options need to be designed and implemented properly to ensure protection. Passwords must be guarded and changed. Users with dial-up arrangements to your databases must protect their equipment and passwords. Software to detect and remove viruses is available. Proprietary documents and electronic commerce must be secured. New products and services are being introduced almost daily to secure transactions and provide security on the network.
TESTING
For a network disaster recovery plan to be successful, it must be thoroughly reviewed on a regular basis to ensure that the technologies employed remain appropriate for your applications. The plan should be viewed as a dynamic document and kept current with ever changing computing and network requirements. Products and services furnished by your many suppliers must also be included in your analysis.
Your plan will be most effective if it is tested regularly and updated with learning points gathered from each previous test or disaster activation. The plan is only as strong as the weakest facet or component. Testing procedures must include not only the options for redundancy and protection, but must be coordinated with all network and equipment suppliers. Will you get a replacement component in a timely fashion if you have a hardware or software component fail?
Training of personnel is often overlooked or naively assumed. Nothing is automatic; all personnel should be fully informed of their roles and responsibilities. “Tests that surface no errors are not aggressive enough; there are always surprises large or small in every test.” The reason for running simulations and tests is to identify areas for improvement.
Your plan should be viewed as a dynamic document. It should be constantly revised to include the latest requirements and the lessons learned from previous tests. I strongly recommend quarterly tests. These need not be a full activation of every plan component. In fact, you can test specific aspects of your plan by creating different scenarios that simulate the most probable short term and long term incidents.
SUMMARY
The reason for designing a reliable and secure network is to ensure your business is available to your internal and external customers when and where it is needed. The cost of reliability must be weighed against the cost of downtime. This will help you decide which types of features and options you require. The disaster plan is not complete until your entire network, including each key component, is reviewed and tested, and the best tools are put into service for the fastest restoral.
REFERENCES & SUGGESTED READING
1. J. Hickman and W. Crandall, “Before Disaster Hits: a Multifaceted Approach to Crisis Management”, Business Horizons, Volume 40, Number 2, pg. 75, March-April 1997.
2. W. DiMartini and P. McNally, “Regulating Disaster Recovery”, Internal Auditor, Volume LIV, pp. 42-52, December 1997.
3. K. Sibley, “Data Recovery: How Safe is Your Business?”, Computing Canada, Volume 23, Number 21, pg. 16, October 14, 1997.
4. M. Cerrulo, Disaster Recovery Journal, Spring 1997.
5. J. Morency, “Nonstop Networking: Intranet Necessities”, Data Communications, pp. 33-34, December 1997.
6. Contingency Planning Research Inc., Computer World, August 4, 1997.
7. B. Wyckoff, “Network Managers Must Look Beyond the UPS”, Electronic Engineering Times, Number 993, pg. 93, February 16, 1998.
8. T. Simonson and L. Carr, “Focus On Power for the Call Center”, Telemarketing & Call Center Solutions, Volume 16, Number 3, pp. 22-26, September 1997.
9. D. Pendery, “Computer Catastrophes Can Take Varied Forms”, InfoWorld, Volume 20, Number 20, pg. 109, May 18, 1998.
10. K. Suitor, “Mother Nature Can Play Havoc with Networks, So be Prepared”, Computing Canada, Volume 24, Number 22, pg. 26, June 8, 1998.
Copyright © 2003 WAN Communications Corp.