The world is getting weird.
Increasingly epic weather events, multiplying global geopolitical uncertainties, and stubborn economic stressors make for an exceptionally unpredictable business environment. To navigate this unfortunate but unavoidable “new normal”, enterprises need a strong but flexible foundation in both business continuity and operational resilience.
Aren’t these basically the same thing? Nope. It’s a common misconception that operational resilience is the more or less automatic outcome of having solid business continuity planning in place. The two are intertwined but separate: Business continuity is planning how to respond, and then recover, in the event of a given disaster (CSP outage, DDoS attack, insert your own worst nightmare scenario here…). Operational resilience means proactively building in capabilities so that, if said disaster does happen, it doesn’t disrupt you in the first place.
And both are, in their own ways, essential for an organization’s survival and success.
Spotting the differences
Business continuity and operational resilience share the same ultimate goal: survival. However, they reach it from different directions. How can you tell them apart?
Objective: Operational resilience aims for adaptability amidst the unknown; business continuity focuses on preserving known functions during known challenges.
Approach: Operational resilience is proactive, embedding resilience in day-to-day functions so that systems and operations can adapt to any form of disruption. Business continuity is reactive, kicking in with pre-planned response measures when specific disaster scenarios unfold.
Scope and breadth: Operational resilience is a holistic strategy for responding to any kind of disruption that presents itself. Business continuity is specific, targeting known threats and pre-planning an appropriate response for each scenario.
Think about it in terms of playing a video game. You’re in the final boss battle (talk about mission critical!) and suddenly the game crashes. Business continuity is the equivalent of being able to go back to the last save point and pick up close to where you left off. Operational resilience is where the game experiences the same glitch but, since the software was architected for zero downtime, your gameplay was never interrupted — you never noticed there was any kind of issue.
Examples of business continuity and operational resilience in tech
Though different, they are also symbiotic: An organization adept in operational resilience is better positioned to create a durable business continuity plan. Conversely, a company with a strong business continuity plan has a foundation that naturally fosters operational resilience. Like peanut butter and jelly, they are better together.
Let’s look at how this plays out in the real world.
Distributed Denial-of-Service (DDoS) Attacks: A sudden massive DDoS attack can paralyze even the most elegantly architected system.
For business continuity, you have a backup system ready to keep operations going, such as switching to backup servers or serving cached, static versions of web pages until the attack is stopped.
For operational resilience, however, your systems have been architected to recognize threats and act to prevent damage before it occurs. Your application stack includes services that, in this scenario, automatically handle traffic rerouting, load balancing, and adaptive rate limiting. Your database automatically rebalances data and requests onto the nodes that remain available.
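To make "adaptive rate limiting" a little more concrete, here is a minimal sketch of one way it could work: a token bucket whose refill rate shrinks as a load signal rises. The class name, its parameters, and the 0.0 to 1.0 load signal are all assumptions for illustration, not any particular product's API. A real deployment would lean on a managed edge or load-balancing service rather than hand-rolled code.

```python
class AdaptiveRateLimiter:
    """Token bucket whose refill rate shrinks as reported load rises (illustrative only)."""

    def __init__(self, base_rate: float, capacity: float, min_rate: float = 1.0) -> None:
        self.base_rate = base_rate  # tokens added per second under normal load
        self.capacity = capacity    # maximum burst size
        self.min_rate = min_rate    # floor so some traffic always gets through
        self.rate = base_rate
        self.tokens = capacity
        self.last_refill = 0.0

    def adapt(self, load: float) -> None:
        """Scale the refill rate down as load (clamped to 0.0..1.0) goes up."""
        load = min(max(load, 0.0), 1.0)
        self.rate = max(self.min_rate, self.base_rate * (1.0 - load))

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (in seconds) may proceed."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


limiter = AdaptiveRateLimiter(base_rate=10, capacity=20)
burst = sum(limiter.allow(now=0.0) for _ in range(25))        # a burst drains the bucket
limiter.adapt(load=0.5)                                       # load signal halves the refill rate
after_adapt = sum(limiter.allow(now=1.0) for _ in range(10))  # one second later
print(burst, after_adapt)  # prints "20 5"
```

Passing the clock in as `now` keeps the sketch deterministic. In practice the load signal might come from request latency or error rates, which is a design choice this example leaves open.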
Unplanned Outages: Cloud providers experience outages regularly. Occasionally these are significant enough to bring entire regions to a standstill.
For business continuity, your app can be engineered to replicate data across multiple regions, with automatic failover, ensuring data remains consistent and available. This doesn’t help, though, if you depend on provider-specific cloud services as part of your application architecture. If, say, S3 or EKS goes down, your ability to operate goes with it.
Operational resilience to survive even major region outages comes from choosing cloud-agnostic services for your application stack to create cloud portability and enable multi-cloud or hybrid deployments. This way, even if one CSP has a complete worldwide outage, your application stays up and running on the parallel platform with no data loss.
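As a rough illustration of the failover idea, the sketch below tries a prioritized list of deployments and falls through to the next one when a platform is unreachable. Everything here is hypothetical: the function name, the `cloud-a` and `cloud-b` labels, and the stand-in handlers that simulate an outage. It is a toy model of the routing decision, not a production traffic manager.

```python
from typing import Callable, Sequence


class AllBackendsFailed(Exception):
    """Raised when every configured backend is unreachable."""


def call_with_failover(
    backends: Sequence[tuple[str, Callable[[str], str]]],
    request: str,
) -> tuple[str, str]:
    """Try each (name, handler) pair in priority order; return the first success."""
    errors = []
    for name, handler in backends:
        try:
            return name, handler(request)
        except ConnectionError as exc:
            # Record the failure and fall through to the next platform.
            errors.append(f"{name}: {exc}")
    raise AllBackendsFailed("; ".join(errors))


def cloud_a(request: str) -> str:
    # Stand-in for a deployment whose provider is having an outage.
    raise ConnectionError("region unavailable")


def cloud_b(request: str) -> str:
    # Stand-in for the same application running on a second platform.
    return f"ok:{request}"


served_by, response = call_with_failover(
    [("cloud-a", cloud_a), ("cloud-b", cloud_b)], "GET /orders"
)
print(served_by, response)  # prints "cloud-b ok:GET /orders"
```

The sketch only works because both handlers accept the same request and return the same kind of response, which is the point of choosing cloud-agnostic services in the first place.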
One important difference
A major point where operational resilience and business continuity diverge: when was the last time some government entity enquired about your company’s business continuity strategy? This doesn’t happen, because business continuity planning is something that organizations pursue on their own, for their own survival. Outside entities don’t get involved.
Things are becoming very different when it comes to operational resilience. Countries around the world are beginning to create legislation and regulatory standards to require operational resilience, beginning with critical sectors like financial services. One of the most significant is the European Union’s Digital Operational Resilience Act (DORA), adopted in 2022 and applicable from January 2025, which seeks to ensure that all financial market participants have effective strategies and capabilities in place to manage operational resilience. DORA’s requirements also reach the ICT third-party providers that financial entities rely on, such as cloud service providers, regardless of whether those providers are based within or outside the EU.
Why operational resilience matters
We’ll say it again: the world is getting weird. It’s impossible to predict the unhappy surprises that climate change or armed conflict between countries or economic fluctuation may bestow at any moment. Add to that the growing possibility that government regulations may pop up to directly affect your business and your bottom line.
When all you can do is expect the unexpected, you have to be ready for…well, everything really. Operational resilience is how you get ready.
This means hardwiring operational resilience into an application architecture by making every piece of your application platform-agnostic. A cloud-agnostic application architecture can allow for easier scalability and flexibility. As your application grows, different services or platforms can be added or replaced without the need for major code changes. Being cloud-agnostic also supports the interoperability of applications across different cloud service providers: besides improving resilience and availability, this makes it easier to satisfy operational resilience regulations as they arise.
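One common way to keep application code platform-agnostic is to have it depend on a small interface rather than on a specific provider's SDK. The sketch below shows that idea with two interchangeable storage backends. The names `BlobStore`, `InMemoryStore`, `DirectoryStore`, and `archive_report` are invented for this example, and the in-memory and local-directory backends stand in for whatever real storage services you might use.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class BlobStore(Protocol):
    """The only storage surface the application is allowed to depend on."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Hypothetical backend that keeps blobs in a dictionary."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class DirectoryStore:
    """Hypothetical backend that writes blobs under a local directory."""

    def __init__(self, root: Path) -> None:
        self._root = root

    def put(self, key: str, data: bytes) -> None:
        path = self._root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()


def archive_report(store: BlobStore, report_id: str, body: str) -> str:
    """Application logic: it never learns which backend it is talking to."""
    key = f"reports/{report_id}.txt"
    store.put(key, body.encode("utf-8"))
    return key


results = []
with tempfile.TemporaryDirectory() as tmp:
    for store in (InMemoryStore(), DirectoryStore(Path(tmp))):
        key = archive_report(store, "q1", "all systems nominal")
        results.append(store.get(key))
```

Swapping one backend for another touches only the line that constructs the store, which is the "no major code changes" property described above, at least for the parts of the application that stick to the interface.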
After all, who needs business continuity when you have four nines of uptime? Just kidding, sorry, you do still need business continuity planning. But consider the guarantees and the service level agreement that come with best-in-class managed services. When things fail (as they always do; as Cockroach founder Spencer Kimball says, “Sh*t happens, and at scale sh*t is always happening”), managed services have a team dedicated to fixing it immediately. You don’t need to staff a team of your own to respond, and ideally you won’t even notice anything happened. Ultimately, the simplest way to build for your own operational resilience is to choose an architecture made up of cloud-agnostic, highly available services that have already solved this problem for you: essentially, Operational-Resilience-as-a-Service.
With that in place, business continuity planning just got a whole lot easier.