Business continuity planning (BCP) encompasses planning and preparation to ensure that an organization can continue to operate in case of serious disasters, incidents, or cyber-attacks and is able to recover to an operational state within a reasonably short period (preferably under 24 hours).
Key Points to Planning
- Design a resilient network with redundant systems.
- Develop and test your recovery plan on a regular basis.
- Train your staff to be well aware of contingency plans and alternate ways to communicate in the event your primary systems are unusable.
- Locate your backup systems away from your primary datacenter and, if possible, make them inaccessible from your primary network.
- Set service level agreements that define when to switch over to your backup systems. For a brief outage, switching over might not be worthwhile, but it would be when your facility and/or datacenter is severely damaged, disabled, or destroyed.
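The switch-over rule in the last point can be sketched as a simple threshold check. This is only an illustration, not a recommended policy: the 4-hour threshold, the `Outage` fields, and the function name `should_fail_over` are assumptions made for this example, and a real agreement would set its own values.

```python
from dataclasses import dataclass

# Hypothetical threshold: outages expected to last longer than this justify a switch-over.
FAILOVER_THRESHOLD_HOURS = 4.0


@dataclass
class Outage:
    expected_duration_hours: float    # best estimate of the time needed to restore the primary site
    facility_destroyed: bool = False  # True if the facility or datacenter is severely damaged, disabled, or destroyed


def should_fail_over(outage: Outage, threshold_hours: float = FAILOVER_THRESHOLD_HOURS) -> bool:
    """Return True when the agreed rule says to bring the backup systems live."""
    if outage.facility_destroyed:
        return True
    return outage.expected_duration_hours > threshold_hours


# A brief outage stays on the primary; a long outage or a destroyed facility switches over.
print(should_fail_over(Outage(expected_duration_hours=0.5)))
print(should_fail_over(Outage(expected_duration_hours=12.0)))
print(should_fail_over(Outage(expected_duration_hours=1.0, facility_destroyed=True)))
```

Writing the rule down in this form, even informally, gives staff a decision they can apply without debate during an incident.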
The Three Main Elements
Business continuity includes three key elements: Resilience, Recovery, and Contingency.
- Resilience: Critical business functions and the supporting infrastructure must be designed in such a way that they are not easily disrupted and allow for excess capacity and redundancy.
- Recovery: Arrangements have to be made to restore or recover critical and less critical systems that fail for some reason.
- Contingency: The organization establishes a readiness plan and the capability to cope with whatever major incidents and disasters occur, including those that could not have been foreseen. Contingency preparations might require a last-resort response.
Since most companies don’t have unlimited resources, it is important to plan the most cost-effective disaster recovery solution possible while still meeting minimum application and data requirements (what applications and files are essential) and getting the business back up and running within a reasonable amount of time.
For IT purposes, there are three general types of backup solutions:
- Hot: your backup site is up 99.999%* of the time, mirrors your production environment in real time, and can be used with little or no work from your IT department. This is also the most expensive option, since enterprise-level software licensing and robust hardware come into play.
- Warm: your backup site is up and parts of it might mirror your production environment in real time, while other parts might take a few hours or a day or two to bring online. Many organizations see this as a good balance between cost and need.
- Cold: the site is set up only AFTER it is needed. Systems are restored from backup media and brought online as they become ready. This is the least expensive option but also takes the longest to recover. It works best for archived data such as old employee or client files.
* This is also referred to as "five nines" (the five 9s).
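To put the five 9s figure in perspective, the short calculation below converts an availability percentage into the downtime it allows per year. It is a rough sketch that assumes a 365-day year; the function name `allowed_downtime_minutes` is made up for this example. At 99.999%, the allowance works out to roughly 5.26 minutes per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Maximum downtime per year, in minutes, for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Compare two, three, four, and five nines.
for pct in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(pct)
    print(f"{pct}% uptime allows about {minutes:.2f} minutes of downtime per year")
```

Each additional 9 cuts the permitted downtime by a factor of ten, which is one reason a hot site costs so much more than a warm or cold one.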
Outside the IT area, make plans for the paper (aka “hard copy”) information that may need to be preserved in your physical office space.
Testing is Critical
Be sure to test all recovery plans to ensure that the planned solutions will provide the level of restoration that you need in the time you need it. Plans may fail to meet expectations due to insufficient or inaccurate recovery requirements, solution design flaws or solution implementation errors. Testing may include:
- Training your Emergency Response Team: Appoint a special task force consisting of members of management, IT, HR, and Facilities/Office Administration, then run them through a mock crisis.
- Switching some tasks from primary to secondary data centers.
- Application testing
- Business process testing
At a minimum, testing should be conducted on a biannual schedule.
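One simple way to record whether a test delivered restoration in the time you need is to compare each measured recovery time with its target. The sketch below is illustrative only: the system names, the target values, and the `failed_targets` helper are hypothetical and would be replaced by your own recovery requirements.

```python
from typing import Dict, List


def failed_targets(targets_hours: Dict[str, float], measured_hours: Dict[str, float]) -> List[str]:
    """Return the systems that exceeded their recovery-time target or were not tested at all."""
    failures = []
    for system, target in targets_hours.items():
        measured = measured_hours.get(system)
        # A system with no measurement is treated as a failure, since its recovery is unproven.
        if measured is None or measured > target:
            failures.append(system)
    return failures


# Hypothetical targets and results from one test run, in hours.
targets = {"email": 4.0, "payroll": 24.0, "file_archive": 72.0}
measured = {"email": 3.5, "payroll": 30.0}

print(failed_targets(targets, measured))
```

Treating untested systems as failures keeps gaps in the test plan visible instead of letting them pass silently.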
Tabletop exercises typically involve a small number of people and they concentrate on a specific aspect of the plan. They can easily accommodate complete teams from a specific area of business.
Another form involves a single representative from each of several departments or teams. Typically, participants work through a simple scenario (for example, a fire discovered outside working hours) and then discuss specific aspects of the plan.
The exercise consumes only a few hours and is often split into two or three sessions, each concentrating on a different theme.
A medium exercise is conducted within a “Virtual World” and brings together several departments, teams or disciplines. It typically concentrates on multiple BCP scenarios, prompting interdepartmental interaction.
The scope of a medium exercise can range from a few teams from one organization co-located in one building to multiple teams operating across dispersed locations. The environment needs to be as realistic as practicable and team sizes should reflect a realistic situation. Realism may extend to simulated news broadcasts and websites.
A medium exercise typically lasts a few hours, though it can extend over several days. It typically involves a “Scenario Cell” that adds pre-scripted “surprises” throughout the exercise.
A complex exercise aims to have as few boundaries as possible. It incorporates all the aspects of a medium exercise. The exercise remains within a virtual world, but maximum realism is essential. This might include no-notice activation, actual evacuation and actual invocation of a disaster recovery site.
While start and stop times are pre-agreed, the actual duration might be unknown if events are allowed to run their course.