[This panel discussion took place on 23-Aug-23 at the annual “Best Practices Meet” organised by the Data Security Council of India (DSCI)]
Reliability is the outcome, Resiliency is the way you accomplish that outcome, and (Disaster) Recovery is one of the many practices you need to employ to build resiliency.
A simple analogy to understand the three important Rs of Site Reliability Engineering (SRE):
✳️ “Reliability” is the ability of a service (or anything) to function as expected, and to do so consistently: You have a 100-watt light bulb. Every time you switch it on, it turns on, gives the brightness expected from 100 watts, and continues to do so for as long as the switch is ON
✳️ “Recovery” is the ability to restore a service after a failure: If a power failure happens, you have a mechanism to switch over to an alternative source of power, such as a diesel generator (DG) or a UPS, so that the bulb keeps giving the expected brightness
✳️ “Resiliency” is much broader than recovery. It is the ability to withstand or adapt to adverse events, either by responding quickly or by recovering fully after a failure, while remaining functional from the customer's perspective. In the case of the light bulb, to get resiliency you’ll do things like using a filament and cables of the right quality and strength, an MCB to protect against short circuits, a stabiliser for power/voltage fluctuations, and a DG or UPS for a power failure
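For readers who think in code, the light-bulb analogy can be sketched roughly as below. This is only an illustration, not something presented at the panel: the names `PowerSource`, `stabilise` and `power_bulb`, and the voltage figures (230 V nominal, a 200 to 250 V safe band), are all assumptions made up for the example. Failover to the next source stands in for recovery, while the stabiliser plus failover together stand in for resiliency.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PowerSource:
    """Hypothetical power source: grid, UPS, diesel generator, etc."""
    name: str
    available: bool = True
    voltage: float = 230.0  # assumed nominal voltage, for illustration only


def stabilise(voltage: float, low: float = 200.0, high: float = 250.0) -> float:
    """Stabiliser: clamp a fluctuating voltage into an assumed safe band."""
    return max(low, min(high, voltage))


def power_bulb(sources: List[PowerSource]) -> Tuple[bool, Optional[str], float]:
    """Try each source in priority order and return (lit, source_name, voltage).

    Falling through to the next source models recovery (failover).
    Clamping the voltage models one of the extra protections that
    make the setup resilient rather than merely recoverable.
    """
    for source in sources:
        if source.available:
            return True, source.name, stabilise(source.voltage)
    # No source left: the bulb is dark and the service is down.
    return False, None, 0.0


grid = PowerSource("grid")
ups = PowerSource("UPS")

print(power_bulb([grid, ups]))   # normal operation on the grid

grid.available = False           # power failure
print(power_bulb([grid, ups]))   # recovery: the UPS takes over

grid.available = True
grid.voltage = 280.0             # voltage surge on the grid
print(power_bulb([grid, ups]))   # stabiliser keeps the bulb within the safe band
```

In this sketch, reliability is what the user observes (the bulb is lit in every one of the three scenarios), while recovery and the stabiliser are two of the mechanisms that deliver it.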
Beyond the points above, we discussed many other insights, including cyber and digital resiliency, in a panel discussion on the topic “Resiliency’s for Digital Enterprise. How it would be different from BCP/DR?”