I was in a discussion with some industry peers recently and it went a little like this: what do you do for disaster recovery planning in an isolated environment like New Zealand? Do you utilise the cloud or rely on redundancy through replication? What about remote working or standby locations?
The conclusion was that there is a lot to think about and that serious planning is required. The task falls to the enterprise CIO or IT Risk Officer, depending on the size of the organisation. Enterprises that outsource will rely on their IT partner to supply both the solution and its execution; however, many will define the requirement internally and then outsource the execution of the DR plan.
This topic is not the sexiest or most interesting in the IT world; surely the new iPad or notebook is more fun. However, for a business to survive an event or disaster, it has to be addressed. If it is not addressed, an enterprise will not survive an IT disaster in over 60% of cases.
CIOs often miss the vital step of working out the enterprise’s disaster recovery requirements. This calculation lets the business define what counts as an unacceptable outage. Without it, CIOs run ahead, assume what the business would accept, and build accordingly. The result is a solution that significantly overspends or under-delivers because it was not designed for the enterprise’s actual requirements.
By way of example, an enterprise with revenues of $200,000 per day on a high transaction count requires significantly more investment to protect those revenues than an enterprise with revenues of $15,000 per day from a few transactions. The business must then set its RTO and RPO. The Recovery Time Objective (RTO) is a wordy way of stating how long the business can be down and not trading: X hours or days, but in today’s environment probably not weeks.
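To make the comparison concrete, here is a rough sketch of the revenue exposed during an outage for the two enterprises above. It assumes revenue accrues evenly across a 24-hour day, which is a simplification, and the function name and the 12-hour outage are hypothetical values chosen for illustration rather than figures from the discussion.

```python
def revenue_at_risk(daily_revenue: float, outage_hours: float) -> float:
    """Revenue lost while not trading.

    Assumption: revenue is spread evenly over a 24-hour day.
    """
    if daily_revenue < 0 or outage_hours < 0:
        raise ValueError("daily_revenue and outage_hours must be non-negative")
    return daily_revenue * outage_hours / 24.0


# The two daily revenue figures from the example, with a hypothetical 12-hour outage.
for daily in (200_000, 15_000):
    exposed = revenue_at_risk(daily, 12)
    print(f"${daily:,} per day -> ${exposed:,.0f} at risk")
```

Under these assumptions the same 12-hour outage exposes about $100,000 for the first enterprise and $7,500 for the second, which is why the acceptable RTO, and the investment needed to meet it, differs so much between them.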
The other critical figure that must be extracted from the enterprise is the Recovery Point Objective (RPO). The RPO defines how much data, measured in time, the business can accept losing. By way of example, if the enterprise’s entire information system and infrastructure is lost, how far back is it acceptable to restore to: 15 minutes, three hours or four days? The answer allows the CIO or engineer to specify the system that backs up or replicates the IT systems.
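As a minimal sketch of how an RPO constrains the backup design, the check below compares a backup interval against the three example RPO values. It assumes periodic backups where the worst case is a failure just before the next backup runs, and it ignores the time a backup takes to complete. The function names and the nightly schedule are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    # Assumption: a failure just before the next scheduled backup loses
    # close to one full interval of data.
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True if the worst-case loss for this schedule is within the RPO."""
    return worst_case_data_loss(backup_interval) <= rpo


# The example RPO values from the text, checked against a hypothetical nightly backup.
nightly = timedelta(hours=24)
for rpo in (timedelta(minutes=15), timedelta(hours=3), timedelta(days=4)):
    verdict = "meets" if meets_rpo(nightly, rpo) else "does not meet"
    print(f"Nightly backup {verdict} an RPO of {rpo}")
```

In this sketch a nightly backup is only adequate for the four-day RPO; the 15-minute and three-hour targets would call for more frequent backups or replication.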
It is possible to replicate servers and their data across the WAN or to the cloud, to perform incremental backups, to image servers, and to perform real-time replication at the physical and virtual layers. The point is that the technology to design and implement an effective disaster recovery solution is with us; it is an understanding of the business requirements that determines which specific solution is required.
Over my career I have been involved in designing such disaster recovery or business continuity plans, and plans derived from the business are always more accurate than assumptions made by CIOs. I have seen a virus bring the Corporation of London’s 3000-user network to a standstill for over a week, and a dentistry practice lose all its patient records with minimal recovery, losing many patients and the associated revenues as a result. In contrast, I have also been lucky enough to design and implement systems that saved extraordinary time and effort through clean, well-executed recoveries. The key to these successes was always testing the plan: inspect what you expect!