⚙️ Process & Workflows — Incident Management
Standard Incident Process
ITIL 4 Incident Management Workflow
1. 📥 Detection & Logging
2. 🏷️ Classification & Prioritisation (decision point)
3. 🔍 Initial Diagnosis (L1)
4. ⬆️ Escalation, if needed (decision point)
5. 🔬 Investigation & Diagnosis
6. ✅ Resolution & Recovery
7. 🔒 Closure
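The seven steps can be modelled as an ordered enumeration. The following is a minimal sketch, not a definitive implementation: the names `Step` and `workflow_path` are hypothetical, and it assumes that the only branch is whether escalation is needed (the "if needed" qualifier on step 4). The decision at step 2 is not modelled.

```python
from enum import Enum
from typing import List


class Step(Enum):
    """The seven workflow steps, in the order the page lists them."""
    DETECTION_AND_LOGGING = 1
    CLASSIFICATION_AND_PRIORITISATION = 2
    INITIAL_DIAGNOSIS_L1 = 3
    ESCALATION = 4
    INVESTIGATION_AND_DIAGNOSIS = 5
    RESOLUTION_AND_RECOVERY = 6
    CLOSURE = 7


def workflow_path(needs_escalation: bool) -> List[Step]:
    """Return the ordered steps an incident passes through.

    Assumption: escalation is the only optional step; every other
    step always runs.
    """
    path = list(Step)  # Enum iteration preserves definition order
    if not needs_escalation:
        path.remove(Step.ESCALATION)
    return path


if __name__ == "__main__":
    for step in workflow_path(needs_escalation=False):
        print(step.value, step.name)
```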
Major Incident (P1) Process
Activation Criteria
- Complete outage of a business-critical service
- 50 users impacted simultaneously
- Revenue-impacting service degradation
- Security breach with service impact
- Declared by Incident Manager or Service Owner
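The activation criteria above amount to a logical OR: any single criterion is enough to declare a major incident. The sketch below illustrates that check. The names `IncidentSignal` and `is_major_incident` are hypothetical, and treating the user threshold as "50 or more" is an assumption, since the list gives only the figure.

```python
from dataclasses import dataclass

# Assumption: the page gives only the figure 50; the ">=" comparison is a guess.
USER_IMPACT_THRESHOLD = 50


@dataclass
class IncidentSignal:
    """Facts known about an incident at classification time (hypothetical shape)."""
    critical_service_down: bool = False
    users_impacted: int = 0
    revenue_impacting: bool = False
    security_breach_with_service_impact: bool = False
    declared_by_manager_or_owner: bool = False


def is_major_incident(signal: IncidentSignal) -> bool:
    """Return True if any one activation criterion is met."""
    return any([
        signal.critical_service_down,
        signal.users_impacted >= USER_IMPACT_THRESHOLD,
        signal.revenue_impacting,
        signal.security_breach_with_service_impact,
        signal.declared_by_manager_or_owner,
    ])


if __name__ == "__main__":
    print(is_major_incident(IncidentSignal(users_impacted=120)))
    print(is_major_incident(IncidentSignal(users_impacted=3)))
```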
Bridge Call Protocol
| Timeline | Action |
|---|---|
| T+0 | Bridge opened; all technical resolvers join |
| T+15 | First executive update sent |
| T+30 | Status page / Slack updated |
| T+60 | Executive escalation if no resolution path |
| Every 30 min | Progress updates to stakeholders |
| Resolution | All-clear communication to users |
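The protocol table can be turned into concrete clock times once an incident is declared. This is a rough sketch under two assumptions that the table does not state: "T+N" means N minutes after the bridge opens, and the recurring 30-minute progress updates start at T+30. The function name `bridge_schedule` and the `horizon_min` parameter are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Fixed milestones from the protocol table, as (minutes after T+0, action).
MILESTONES = [
    (0, "Bridge opened; all technical resolvers join"),
    (15, "First executive update sent"),
    (30, "Status page / Slack updated"),
    (60, "Executive escalation if no resolution path"),
]
UPDATE_INTERVAL_MIN = 30  # "Every 30 min" row


def bridge_schedule(declared_at: datetime,
                    horizon_min: int = 120) -> List[Tuple[datetime, str]]:
    """List the expected actions from T+0 up to T+horizon_min, in time order."""
    events = [(declared_at + timedelta(minutes=m), action)
              for m, action in MILESTONES if m <= horizon_min]
    # Assumption: recurring stakeholder updates begin at T+30.
    t = UPDATE_INTERVAL_MIN
    while t <= horizon_min:
        events.append((declared_at + timedelta(minutes=t),
                       "Progress update to stakeholders"))
        t += UPDATE_INTERVAL_MIN
    # Sorting by time only keeps milestones ahead of same-minute updates.
    return sorted(events, key=lambda event: event[0])


if __name__ == "__main__":
    for when, action in bridge_schedule(datetime(2026, 5, 7, 9, 0)):
        print(when.strftime("%H:%M"), action)
```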
Communication Templates
Initial Major Incident Notification:
Subject: [P1 MAJOR INCIDENT] — {Service} Outage — IN{ticket_id}
We are currently experiencing a major incident affecting {service}.
Impact: {description of impact}
Users affected: {scope}
Incident Manager: {name}
Bridge: {link or dial-in}
Next update: {time}
Resolution Communication:
Subject: [RESOLVED] — {Service} — IN{ticket_id}
The {service} incident has been resolved.
Resolution time: {time}
Duration: {duration}
Resolution summary: {brief description}
A Post-Incident Review will be scheduled within 48 hours.
Post-Incident Review (PIR)
A PIR is mandatory for all P1 incidents and for P2 incidents that took more than 2 hours to resolve.
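The rule above can be expressed as a small predicate. This is an illustrative sketch; the function name `pir_required` and the string form of the priority ("P1", "P2") are assumptions.

```python
def pir_required(priority: str, resolution_hours: float) -> bool:
    """Return True when a Post-Incident Review is mandatory.

    Rule from the text: every P1, plus any P2 that took more than
    2 hours to resolve.
    """
    p = priority.strip().upper()
    if p == "P1":
        return True
    return p == "P2" and resolution_hours > 2


if __name__ == "__main__":
    print(pir_required("P1", 0.5))
    print(pir_required("P2", 3.4))
    print(pir_required("P3", 12.0))
```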
PIR Agenda (60 minutes)
- Timeline reconstruction (10 min) — What happened, when?
- Impact assessment (5 min) — Users, revenue, SLA
- Root cause analysis (20 min) — 5-Whys, Fishbone, or fault tree
- Contributing factors (10 min) — Process gaps, tooling, communication
- Action items (10 min) — Preventive controls, knowledge articles, process updates
- Lessons learned (5 min) — What went well, what to improve
PIR Output Template
| Section | Content |
|---|---|
| Incident ID | IN000001 |
| Date / Duration | 2026-05-07, 3h 22min |
| Impact | 450 users, €12,000 estimated loss |
| Root Cause | Database connection pool exhausted |
| Contributing Factors | Missing alerting threshold, no runbook |
| Action Items | Alert rule created (owner: Ops, due: 2026-05-14) |
| Lessons Learned | Add connection pool metrics to dashboards |
AI-Assisted Incident Management (ITIL 5)
| Capability | Benefit |
|---|---|
| Auto-classification | ML classifies ticket in < 2 seconds vs. manual 3–5 min |
| Intelligent routing | Routes to correct team with 90%+ accuracy |
| Similar incident detection | Surfaces related open incidents for correlation |
| Suggested KB articles | Presents top 3 resolution articles at creation |
| Predictive SLA breach | Alerts manager 30 min before SLA breach |
| Auto-resolution (low complexity) | Password resets, account unlocks handled without agent |
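As one illustration of the "Predictive SLA breach" row, the sketch below flags an open ticket once its SLA deadline is 30 minutes away or less. It is a plain deadline check, not a prediction model, and the names `WARNING_WINDOW` and `should_alert` are hypothetical.

```python
from datetime import datetime, timedelta

WARNING_WINDOW = timedelta(minutes=30)  # "30 min before SLA breach"


def should_alert(now: datetime, sla_deadline: datetime,
                 resolved: bool = False) -> bool:
    """Return True if an unresolved ticket is within the warning window.

    Tickets already past their deadline are excluded here, since the
    row describes a pre-breach alert.
    """
    if resolved:
        return False
    remaining = sla_deadline - now
    return timedelta(0) <= remaining <= WARNING_WINDOW


if __name__ == "__main__":
    deadline = datetime(2026, 5, 7, 12, 0)
    print(should_alert(datetime(2026, 5, 7, 11, 40), deadline))
    print(should_alert(datetime(2026, 5, 7, 10, 0), deadline))
```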
KPIs
| Metric | Target |
|---|---|
| Mean Time to Detect (MTTD) | < 5 min (monitoring-detected) |
| Mean Time to Respond (MTTR, response) | P1: < 15 min, P2: < 1 hour |
| Mean Time to Resolve (MTTR, resolution) | P1: < 4 hours, P2: < 8 hours |
| First Contact Resolution | > 75% |
| SLA compliance | > 95% |
| CSAT score | > 4.2 / 5 |
| Repeat incidents (< 30 days) | < 5% |
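Several of these KPIs are simple averages or percentages over closed incidents. The sketch below shows one way to compute four of them. The `IncidentRecord` fields and the `kpi_summary` function are hypothetical and do not reflect the schema of any particular ticketing tool.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class IncidentRecord:
    """One closed incident (hypothetical fields)."""
    minutes_to_respond: float
    minutes_to_resolve: float
    resolved_on_first_contact: bool
    met_sla: bool


def kpi_summary(records: List[IncidentRecord]) -> Dict[str, float]:
    """Compute mean response/resolution times and two percentage KPIs."""
    if not records:
        raise ValueError("at least one incident record is required")
    n = len(records)
    return {
        "mean_time_to_respond_min": mean(r.minutes_to_respond for r in records),
        "mean_time_to_resolve_min": mean(r.minutes_to_resolve for r in records),
        "first_contact_resolution_pct":
            100 * sum(r.resolved_on_first_contact for r in records) / n,
        "sla_compliance_pct": 100 * sum(r.met_sla for r in records) / n,
    }


if __name__ == "__main__":
    sample = [
        IncidentRecord(10.0, 60.0, True, True),
        IncidentRecord(20.0, 120.0, True, True),
        IncidentRecord(30.0, 180.0, False, True),
        IncidentRecord(40.0, 240.0, True, True),
    ]
    for name, value in kpi_summary(sample).items():
        print(f"{name}: {value}")
```

The results can then be compared against the targets in the table, for example checking that `sla_compliance_pct` exceeds 95.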
Downloadable Resources
| Resource | Format | Download |
|---|---|---|
| Incident Report | Excel | ⬇ Download |
| Incident Procedure | Word | ⬇ Download |