DR Plan Lifecycle
Disaster Recovery Plan Lifecycle
1. 📊 Business Impact Analysis
2. 🏗️ DR Strategy Design
3. 📝 DR Plan Documentation
4. 🔧 DR Infrastructure Provisioning
5. 🧪 DR Testing
6. 🔄 Plan Review & Update
Disaster Declaration & Failover Process
Declaration Criteria
A disaster is declared when any of the following applies:
- Primary data centre is inaccessible for > 2 hours with no ETA for restoration
- Critical service outage (Tier 1/2) exceeds the defined RTO threshold
- Physical disaster (fire, flood, power failure) renders the primary site inoperable
- Ransomware attack encrypts critical systems with no viable recovery from primary backups
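The criteria read as independent triggers: meeting any one of them justifies a declaration. The following is a minimal Python sketch of that check. It assumes a hypothetical `OutageStatus` record populated by your monitoring or major-incident tooling. The field names are illustrative and do not come from any specific ITSM product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutageStatus:
    # Hypothetical fields mirroring the four declaration criteria.
    primary_dc_down_hours: float = 0.0       # hours the primary data centre has been inaccessible
    restoration_eta_known: bool = True       # is there an ETA for restoring the primary site?
    service_tier: Optional[int] = None       # tier of the most critical affected service (1 = highest)
    outage_hours: float = 0.0                # how long that service has been down
    rto_hours: Optional[float] = None        # defined RTO for that service
    primary_site_inoperable: bool = False    # fire, flood, power failure, etc.
    ransomware_unrecoverable: bool = False   # critical systems encrypted, primary backups not viable


def declaration_reasons(s: OutageStatus) -> list[str]:
    """Return the criteria that are met; a non-empty result supports declaring a disaster."""
    reasons = []
    if s.primary_dc_down_hours > 2 and not s.restoration_eta_known:
        reasons.append("primary data centre inaccessible > 2h with no ETA")
    if (s.service_tier in (1, 2) and s.rto_hours is not None
            and s.outage_hours > s.rto_hours):
        reasons.append("Tier 1/2 outage exceeds RTO")
    if s.primary_site_inoperable:
        reasons.append("physical disaster: primary site inoperable")
    if s.ransomware_unrecoverable:
        reasons.append("ransomware with no viable recovery from primary backups")
    return reasons
```

Returning the matched reasons rather than a bare boolean gives the ITSCM Manager an audit trail explaining why the declaration was made.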
Decision Tree
- Service outage detected → Is it a standard incident?
  - Yes → Incident Management process
  - No → Does it affect Tier 1/2 services?
    - No → Continue monitoring
    - Yes → Is the estimated recovery time > RTO?
      - No → Incident Management process
      - Yes → ITSCM Manager notified → Crisis Manager activated → ECAB emergency change authorised → Failover decision: Partial or Full?
        - Partial: Fail over only the affected services
        - Full: Activate the complete DR site

Failover Execution Steps
- Assess: Confirm which services, users, and sites are affected
- Notify: Executive team, business stakeholders, ITSM team
- Activate: Execute the DR runbook for each affected service
- Verify: Validate that each service meets its RTO/RPO criteria after failover
- Communicate: User-facing status update (email, status page, SMS)
- Monitor: Heightened monitoring on DR environment
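The declaration decision tree in this section can be expressed as a small routing function, which helps when wiring it into alerting or major-incident tooling. This is a sketch under stated assumptions. The boolean inputs are hypothetical. The partial-versus-full choice is reduced to a single `full_site_affected` flag, although in practice the Crisis Manager makes that call.

```python
from enum import Enum


class Route(Enum):
    INCIDENT_MANAGEMENT = "Incident Management process"
    CONTINUE_MONITORING = "Continue monitoring"
    PARTIAL_FAILOVER = "Partial failover: affected services only"
    FULL_FAILOVER = "Full failover: activate complete DR site"


def route_outage(standard_incident: bool, affects_tier_1_or_2: bool,
                 est_recovery_hours: float, rto_hours: float,
                 full_site_affected: bool) -> Route:
    """Walk the declaration decision tree and return the resulting path.

    The escalation steps between 'recovery time > RTO' and the failover
    decision (ITSCM Manager notified, Crisis Manager activated, ECAB
    emergency change authorised) are human approvals and are not modelled.
    """
    if standard_incident:
        return Route.INCIDENT_MANAGEMENT
    if not affects_tier_1_or_2:
        return Route.CONTINUE_MONITORING
    if est_recovery_hours <= rto_hours:
        return Route.INCIDENT_MANAGEMENT
    # Disaster path: the assumed scope flag drives partial vs. full failover.
    return Route.FULL_FAILOVER if full_site_affected else Route.PARTIAL_FAILOVER
```

An estimated recovery time exactly equal to the RTO stays with Incident Management, because the tree only escalates when recovery time exceeds the RTO.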
DR Test Types
| Test Type | Description | Frequency | Disruption |
|---|---|---|---|
| Tabletop Exercise | Scenario walkthrough with key stakeholders | Quarterly | None |
| Component Test | Test failover of a single system (e.g. database) | Semi-annual | Minimal |
| Simulation Test | Full scenario simulation without actual failover | Annual | Low |
| Full Failover Test | Complete failover to DR site; verify all services | Annual | Planned window |
| Unannounced Test | Surprise test of response capability | Ad hoc | Medium |
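The frequencies above can drive a simple overdue-test check. The following is a sketch that assumes "Bi-annual" means every six months. The ad hoc unannounced test is left out because it has no fixed schedule. `TEST_INTERVAL_MONTHS` and `overdue_tests` are hypothetical names.

```python
from datetime import date

# Assumed intervals in months; "Bi-annual" is interpreted as every six months.
# The ad hoc Unannounced Test has no fixed schedule and is omitted.
TEST_INTERVAL_MONTHS = {
    "Tabletop Exercise": 3,
    "Component Test": 6,
    "Simulation Test": 12,
    "Full Failover Test": 12,
}


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to 28 so the result is
    always valid (slightly conservative for month-end dates)."""
    total = d.month - 1 + months
    return date(d.year + total // 12, total % 12 + 1, min(d.day, 28))


def overdue_tests(last_run: dict[str, date], today: date) -> list[str]:
    """Return scheduled test types whose next due date has passed, or that never ran."""
    overdue = []
    for test_type, months in TEST_INTERVAL_MONTHS.items():
        last = last_run.get(test_type)
        if last is None or add_months(last, months) < today:
            overdue.append(test_type)
    return overdue
```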
DR Test Report Structure
| Section | Example Content |
|---|---|
| Test type and date | Full failover test, 2026-03-15 |
| Services tested | ERP, Email, ITSM Portal |
| RTO target vs. actual | Target 4h / Actual 3h 45min ✅ |
| RPO target vs. actual | Target 1h / Actual 35min ✅ |
| Gaps identified | DNS propagation took 45 min (target: 15 min) |
| Action items | Automate DNS failover (owner: Network, due: 2026-04-30) |
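The "RTO target vs. actual" and "RPO target vs. actual" rows reduce to a per-service comparison. The following is a minimal sketch that assumes a hypothetical `ServiceResult` record per tested service. It lists every miss so that the miss can be copied into "Gaps identified".

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ServiceResult:
    """Measured outcome of one service in a DR test (hypothetical record)."""
    service: str
    rto_target: timedelta
    rto_actual: timedelta   # time taken to restore the service
    rpo_target: timedelta
    rpo_actual: timedelta   # data-loss window observed after recovery


def gaps(results: list[ServiceResult]) -> list[str]:
    """List every RTO or RPO miss, with the size of the overrun."""
    found = []
    for r in results:
        if r.rto_actual > r.rto_target:
            found.append(f"{r.service}: RTO missed by {r.rto_actual - r.rto_target}")
        if r.rpo_actual > r.rpo_target:
            found.append(f"{r.service}: RPO missed by {r.rpo_actual - r.rpo_target}")
    return found
```

With the sample figures (4h target / 3h 45min actual, 1h target / 35min actual), `gaps` returns an empty list. A step-level gap such as the slow DNS propagation still has to be recorded separately, because it did not cause an RTO miss.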
Cloud DR Strategies
AWS Disaster Recovery
| Strategy | RTO | RPO | Cost |
|---|---|---|---|
| Backup & Restore | Hours | Hours | $ |
| Pilot Light | 30–60 min | Minutes | $$ |
| Warm Standby | Minutes | Seconds | $$$ |
| Multi-Site Active-Active | Near-zero | Near-zero | $$$$ |
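A common use of this table is to pick the cheapest strategy that still meets a service's RTO/RPO. The table gives only qualitative ranges, so the numeric worst-case bounds below are illustrative assumptions to tune locally. They are not AWS figures.

```python
# Assumed worst-case RTO/RPO in minutes per strategy; the table above is
# qualitative, so treat these numbers as local planning assumptions.
STRATEGIES = [
    # (name, worst-case RTO, worst-case RPO, relative cost 1-4)
    ("Backup & Restore", 24 * 60, 24 * 60, 1),
    ("Pilot Light", 60, 15, 2),
    ("Warm Standby", 15, 1, 3),
    ("Multi-Site Active-Active", 1, 0, 4),
]


def cheapest_strategy(rto_minutes: float, rpo_minutes: float) -> str:
    """Return the lowest-cost strategy whose assumed worst case meets both targets."""
    for name, rto, rpo, _cost in sorted(STRATEGIES, key=lambda s: s[3]):
        if rto <= rto_minutes and rpo <= rpo_minutes:
            return name
    raise ValueError("no strategy meets the requested RTO/RPO")
```

Under these assumed bounds, a 4h RTO / 1h RPO target selects Pilot Light.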
Recommended tools: AWS Elastic Disaster Recovery (DRS), S3 Cross-Region Replication, Route 53 Health Checks, RDS Multi-AZ.
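Route 53 health checks pair with failover routing. Route 53 answers with the PRIMARY record while that record's health check passes, and switches to the SECONDARY record when the check fails. The following sketch builds the `ChangeBatch` payload for such a record pair. The record name, IPs and health-check ID are placeholders. `failover_change_batch` is a hypothetical helper, not part of any SDK.

```python
from typing import Optional


def failover_change_batch(record_name: str, primary_ip: str, dr_ip: str,
                          primary_health_check_id: str, ttl: int = 60) -> dict:
    """Build a Route 53 ChangeBatch for a PRIMARY/SECONDARY failover A-record pair.

    A low TTL shortens how long resolvers keep serving the old answer after failover.
    """
    def record(set_id: str, role: str, ip: str,
               health_check_id: Optional[str]) -> dict:
        rrset = {
            "Name": record_name,
            "Type": "A",
            "SetIdentifier": set_id,
            "Failover": role,  # "PRIMARY" or "SECONDARY"
            "TTL": ttl,
            "ResourceRecords": [{"Value": ip}],
        }
        if health_check_id:
            rrset["HealthCheckId"] = health_check_id
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Comment": "DR failover records",
        "Changes": [
            record("primary", "PRIMARY", primary_ip, primary_health_check_id),
            # No health check on the DR record: Route 53 then treats it as healthy.
            record("dr-site", "SECONDARY", dr_ip, None),
        ],
    }
```

To apply the payload, pass the result to `boto3.client("route53").change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)` with your own hosted zone ID. Automating this step is one way to address a DNS-propagation gap like the one in the sample test report.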
Azure Disaster Recovery
- Azure Site Recovery (ASR): Continuous replication of VMs to secondary region
- Azure Backup: Geo-redundant vault for data backup
- Traffic Manager: Automatic DNS-based failover to secondary region
- Availability Zones: Near-zero RTO for zone-redundant deployments
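Traffic Manager's Priority routing method provides the automatic failover described above. It directs clients to the highest-priority healthy endpoint and falls back down the list when probes fail. The following local simulation of that selection rule shows the logic only. It does not call the Azure API, and the endpoint names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    name: str
    priority: int   # lower number = preferred, as in Priority routing
    healthy: bool   # result of the endpoint's health probe


def resolve(endpoints: list[Endpoint]) -> Optional[str]:
    """Return the preferred healthy endpoint, or None if all are down.

    Traffic Manager's own behaviour when every endpoint is degraded is not
    modelled here.
    """
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda e: e.priority).name
```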
Multi-Cloud DR Considerations
- Ensure the application layer is cloud-agnostic (e.g. containers, Kubernetes)
- Test cross-cloud networking and latency before declaring the strategy viable
- Govern with a single DR orchestration tool (e.g. Zerto, Veeam, or CloudEndure, now succeeded by AWS Elastic Disaster Recovery)
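A first cross-cloud latency check can be as simple as timing TCP handshakes from one cloud to a service endpoint in the other. The following rough sketch uses only the Python standard library. TCP connect time approximates network round-trip time but is not the same as application response time. The p95 budget is a hypothetical threshold to agree with application owners.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host: str, port: int, samples: int = 20,
                           timeout: float = 2.0) -> list[float]:
    """Time repeated TCP handshakes to host:port; a failed connect raises OSError."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass
        results.append((time.perf_counter() - start) * 1000)
    return results


def within_budget(latencies_ms: list[float], p95_budget_ms: float) -> bool:
    """True if the 95th-percentile latency is within budget (needs >= 2 samples)."""
    p95 = statistics.quantiles(latencies_ms, n=20)[-1]  # last of 19 cut points = p95
    return p95 <= p95_budget_ms
```

Running the probe from each cloud towards the other, at the times of day when failover traffic would flow, gives more representative numbers than a single run.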
KPIs
| Metric | Target |
|---|---|
| DR plan coverage (% of Tier 1/2 services) | 100% |
| DR test frequency (annual) | ≥ 1 full test per year |
| DR test success rate | > 95% of services meet RTO/RPO |
| RTO compliance (during actual DR event) | 100% |
| DR plan last reviewed | < 12 months ago |
| Action items from last test closed | > 90% |
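Most of these KPIs can be evaluated automatically from the DR asset register and test reports. The following sketch assumes a hypothetical `DrSnapshot` of those inputs. It approximates "< 12 months" as fewer than 365 days. It omits RTO compliance during an actual DR event, which can only be measured when a real event occurs.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DrSnapshot:
    """Hypothetical inputs, e.g. exported from the DR asset register and test reports."""
    tier12_services: int
    tier12_with_plan: int
    full_tests_last_12m: int
    services_tested: int
    services_meeting_rto_rpo: int
    plan_last_reviewed: date
    actions_from_last_test: int
    actions_closed: int


def kpi_status(s: DrSnapshot, today: date) -> dict[str, bool]:
    """Evaluate each KPI against its target (True = target met)."""
    return {
        "plan coverage 100%": s.tier12_with_plan == s.tier12_services,
        "≥ 1 full test per year": s.full_tests_last_12m >= 1,
        "test success rate > 95%": (s.services_tested > 0 and
                                    s.services_meeting_rto_rpo / s.services_tested > 0.95),
        # "< 12 months" approximated as fewer than 365 days.
        "plan reviewed < 12 months ago": (today - s.plan_last_reviewed).days < 365,
        "action items closed > 90%": (s.actions_from_last_test == 0 or
                                      s.actions_closed / s.actions_from_last_test > 0.90),
    }
```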
Downloadable Resources
| Resource | Format |
|---|---|
| DR Asset Register | Excel |
| Disaster Recovery Plan | Word |