By Craig McLellan,
Founder | ThinkOn
The terms “Business Continuity” and “Disaster Recovery” are not new. Business operators and IT leaders have been using them for years. So why the renewed interest in a field that, for many of the largest organizations, was solved long ago?
I believe there are two reasons.
First, the media’s fixation on disasters has convinced many people that a disaster is inevitable and that, because their business relies on technology, it could fail at the first sign of trouble.
The second factor likely has something to do with the growing number of solutions on the market. For the first time in the history of the technology industry, there seem to be options for companies of every size, from entrepreneur to enterprise.
If you contribute to the technology industry, you have a responsibility to teach your customers or users that failure is not something they can avoid. We are dealing with technology, after all.
The best thing we can collectively do to improve our customers’ or end users’ experience is to build solutions on the assumption that failure will occur, and usually at a very inopportune time. Whatever anyone may think or tell you, technology fails, or at the very least requires proactive maintenance.
There is no shortage of material explaining how to deploy technology to accomplish something. Unfortunately, much of it is written as if the technology were the solution and the reader’s job were to find the problem. People need to know that there are simple, straightforward ways to improve how information technology is deployed. IT leaders also need to reduce the impact of outages on their own lives: as alive and well as the hero culture is inside most IT departments, it isn’t healthy to be forced to be a hero.
As soon as everyone understands that outages are inevitable, the design process can focus on dealing with failure instead of wasting time identifying scenarios that may or may not occur. By replacing a long list of specific scenarios with a single generic one (something has failed), we can focus on making sure the application can cope with failure, whatever its cause.
An analogy worth considering is the escalator. I’m confident that everyone reading this has encountered an escalator that was cordoned off and not available for use. People aren’t forced to wait; they are simply redirected to an elevator or the stairs, and they continue on their journey. The person who designed the building knew the escalator would require service, so they provisioned options that let users continue, albeit in a different and probably more frustrating way. So why do technology leaders fail to make the same considerations? We regularly allow users to encounter simple failures that could have been designed for. Imagine a user receiving a message advising them that an application will be unavailable for the next four hours, instead of a “web site not found” error.
If your focus is deploying applications and infrastructure, your goal as an architect is to reduce the likelihood of failure, while understanding that end users will still judge you on their last interaction with the application you built. Therefore, access to every application should be handled so that the user knows the state of the application and a suggested course of action. The suggested course of action may be “wait” or “please come back later,” but it will always be clearer than an error message an end user cannot interpret. I want to be very clear here: I’m not suggesting you abdicate any responsibility for building the absolute best solution you can for your users; just make sure there are no loose ends that can ruin the perception of a great solution.
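To make the idea concrete, here is a minimal Python sketch of an application “front door” that reports the application’s state and a suggested course of action instead of a raw error. It assumes the state and the expected return time are known somewhere (for example, from a maintenance schedule); the names `AppState`, `StatusResponse` and `build_status_response` are hypothetical. Answering with HTTP 503 (Service Unavailable) plus a `Retry-After` value in seconds is a standard HTTP convention; the message wording is illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class AppState(Enum):
    """Hypothetical states an application's front door might report."""
    AVAILABLE = "available"
    MAINTENANCE = "maintenance"  # planned work, end time usually known
    OUTAGE = "outage"            # unplanned failure


@dataclass
class StatusResponse:
    http_status: int
    message: str                       # plain-language text shown to the user
    retry_after: Optional[int] = None  # seconds; would be sent as a Retry-After header


def build_status_response(state: AppState,
                          expected_back: Optional[datetime] = None,
                          now: Optional[datetime] = None) -> StatusResponse:
    """Tell the user what state the application is in and what to do next.

    `expected_back` and `now` must both be timezone-aware datetimes.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if state is AppState.AVAILABLE:
        return StatusResponse(200, "The application is available.")
    reason = "planned maintenance" if state is AppState.MAINTENANCE else "an outage"
    if expected_back is not None and expected_back > now:
        seconds = int((expected_back - now).total_seconds())
        hours = -(-seconds // 3600)  # round up to whole hours for the message
        return StatusResponse(
            503,
            f"The application is unavailable due to {reason}. "
            f"Please come back in about {hours} hour(s).",
            retry_after=seconds,
        )
    # No future return time is known: still say what is happening and what to do.
    return StatusResponse(
        503,
        f"The application is unavailable due to {reason}. "
        "We are working on it; please try again later.",
    )


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
    resp = build_status_response(AppState.MAINTENANCE,
                                 expected_back=start + timedelta(hours=4),
                                 now=start)
    print(resp.http_status, resp.retry_after)  # 503 14400
    print(resp.message)
```

The key design choice is that every non-available branch, including the one where no return time is known, produces a readable message and an action, so the user never sees a bare “web site not found.”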
Once failure is accepted as a possible state, it’s time to determine the best approach to recovery. That approach needs to be flexible enough to cover a broad range of scenarios, so that regardless of the cause of an outage, your users have clear expectations about when they can get back to work.
This is precisely where a simple strategy is the best approach for all but the most complex applications. The good news is that you probably already have most of the technology you need; you just need to change slightly how you use it.
Cloud Recovery Use Case
There isn’t a better use case for cloud computing than building a simple, broad recovery strategy for your applications. Existing data protection applications can be augmented to vault all of your backup data electronically into an on-demand environment, where any application can be reconstructed in hours or even minutes, and it shouldn’t cost more than what you pay today to send tape backups off-site daily. Great complementary enhancements already exist for the data backup technology you own, allowing you to improve the recovery posture of your key technology infrastructure with minimal investment. Your operations people don’t need to climb a steep learning curve, and you can cover most of your incremental costs with the savings from no longer paying someone to store tapes off-site.
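As a rough illustration of what “vaulting” backup data involves, the Python sketch below copies every file from a backup directory into a vault and verifies each copy with a SHA-256 checksum before recording it in a manifest. It rests on loud assumptions: a local `vault_dir` stands in for cloud storage, and `sha256_of` and `vault_backups` are hypothetical names. In practice your data protection product or your cloud provider’s own replication tooling would perform the transfer; the point here is only that each vaulted copy should be verified, not just sent.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large backups don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def vault_backups(backup_dir: Path, vault_dir: Path) -> dict:
    """Copy every backup file into the (stand-in) vault and confirm each copy.

    Returns a mapping of relative path -> SHA-256 checksum of the vaulted copy.
    Raises RuntimeError if a vaulted copy does not match its source.
    """
    manifest = {}
    # Materialize the file list first so new files in the vault are never re-scanned.
    for src in sorted(p for p in backup_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(backup_dir)
        dst = vault_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy data and file metadata
        src_sum, dst_sum = sha256_of(src), sha256_of(dst)
        if src_sum != dst_sum:
            raise RuntimeError(f"Checksum mismatch for {rel}")
        manifest[rel.as_posix()] = dst_sum
    return manifest


if __name__ == "__main__":
    # Demo with throwaway directories standing in for a backup volume and a vault.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "backups" / "db").mkdir(parents=True)
        (root / "backups" / "db" / "full.bak").write_bytes(b"example backup data")
        manifest = vault_backups(root / "backups", root / "vault")
        for name, checksum in manifest.items():
            print(name, checksum)
```

Verifying the copy, not just the transfer, matters because a recovery strategy is only as good as the backups you can actually restore from.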
Imagine being able to recover your entire technology footprint in as little as minutes by making simple changes to the technology you already own. Now you can focus on enhancing the critical applications that allow you to differentiate and compete.
About Craig McLellan
Over the past 22 years, Craig has logged over 10,000 hours (and countless Airmiles) building high-availability applications for Fortune 500 clients of Hosting.com, SunGard, and other large U.S. cloud providers. Craig lives in Toronto, Canada, where, along with raising a family, he coaches baseball and plays hockey.
Note: This post was originally published in September 2015 and has been updated for accuracy and comprehensiveness.