The StatusCat blog
How to set up an on-call rotation
A practical guide to on-call rotations: schedules, escalation, acknowledgements, quiet hours and alert hygiene — so incidents reach the right person fast without exhausting your team.
On-call is where monitoring meets people. You can have perfect checks, but if the alert reaches no one — or reaches an exhausted engineer at 3 AM for something that could have waited — the system has failed. A good on-call rotation gets the right alert to the right person at the right time, and protects your team while doing it.
The building blocks
- A schedule. Who is responsible, and when. Rotate weekly (or daily for larger teams) so the load is shared and predictable.
- A primary and a backup. The primary responds first; the backup exists so a missed page doesn't become a missed incident.
- Escalation. If the primary doesn't acknowledge within a set window (5–10 minutes is typical), the alert automatically escalates to the backup, then to a manager or team lead. This is the single most important safety net.
- Acknowledgement. Responders acknowledge an alert to stop escalation and signal "I've got this."
- Quiet hours and severity. Not everything deserves a phone call at 3 AM. Route urgent, user-impacting alerts to a call or SMS; send everything else to a channel people read during the day.
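As a rough illustration of the schedule and primary/backup blocks, here is a minimal Python sketch of a weekly rotation. The function name `on_call_pair`, the responder names, and the rule that the backup is simply the next person in the list are all assumptions made for this example; they are not how StatusCat or any particular paging tool implements schedules.

```python
from datetime import date


def on_call_pair(day, responders, rotation_start, shift_days=7):
    """Return (primary, backup) for `day` in a simple round-robin rotation.

    Assumption for this sketch: the backup is the next responder in the
    list, so each person is backup the shift before they become primary.
    """
    if not responders:
        raise ValueError("need at least one responder")
    if day < rotation_start:
        raise ValueError("day is before the rotation started")
    # Number of whole shifts that have elapsed since the rotation began.
    shift_index = (day - rotation_start).days // shift_days
    primary = responders[shift_index % len(responders)]
    backup = responders[(shift_index + 1) % len(responders)]
    return primary, backup


# Hypothetical team and start date (2024-01-01 was a Monday).
team = ["ana", "ben", "chen"]
start = date(2024, 1, 1)
print(on_call_pair(date(2024, 1, 10), team, start))  # ('ben', 'chen')
```

Passing `shift_days=1` gives the daily rotation mentioned above for larger teams.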
A simple, sane setup
- Rotation: weekly, primary + secondary.
- Escalation: primary → (5 min no ack) → secondary → (10 min no ack) → team lead.
- Channels by severity:
  - Critical (site/API down): phone call or SMS, immediately.
  - Warning (elevated latency, cert expiring soon): Slack/Discord, business hours.
- Quiet hours: non-critical monitors don't page overnight.
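The setup above can be sketched in a few lines of Python. This is an illustration only: the names `ESCALATION_STEPS`, `current_target`, and `choose_channel` are hypothetical, the 22:00 to 07:00 quiet window is an assumed value, and the sketch assumes the 10-minute wait starts when the secondary is paged (so the team lead is reached 15 minutes after the alert fires).

```python
from datetime import time

# (minutes since the alert fired with no acknowledgement, who is paged).
# Assumption: 5 min to secondary, then a further 10 min to the team lead.
ESCALATION_STEPS = [
    (0, "primary"),
    (5, "secondary"),
    (15, "team lead"),
]


def current_target(minutes_unacked):
    """Return the role that should be paged after this long without an ack."""
    target = ESCALATION_STEPS[0][1]
    for threshold, role in ESCALATION_STEPS:
        if minutes_unacked >= threshold:
            target = role
    return target


def choose_channel(severity, now, quiet_start=time(22, 0), quiet_end=time(7, 0)):
    """Pick a delivery channel from severity and time of day."""
    if severity == "critical":
        return "phone_or_sms"  # critical alerts always page immediately
    # The assumed quiet window wraps past midnight, hence the `or`.
    in_quiet_hours = now >= quiet_start or now < quiet_end
    return "hold_until_morning" if in_quiet_hours else "chat"


print(current_target(7))                      # secondary
print(choose_channel("warning", time(3, 0)))  # hold_until_morning
```

Keeping the policy as plain data makes it easy to review and change the timings without touching the routing logic.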
Protecting your team from burnout
On-call burnout is real, and it usually comes from noise rather than from the number of real incidents. To keep it sustainable:
- Only page on things a human must act on now. Everything else is a daytime notification, not a page.
- Re-check before alerting so transient blips never wake anyone. (StatusCat does this automatically.)
- Tune relentlessly. Every false page is a bug in your alerting — fix it, don't tolerate it.
- Keep rotations fair and short, and make handoffs explicit.
- Track the load. If one person is getting paged far more than everyone else, something is wrong upstream.
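Two of these habits, re-checking before alerting and tracking the load, are easy to sketch in Python. Everything here is hypothetical: `should_alert`, `overloaded_responders`, the three-attempt default, and the 1.5x threshold are illustrative choices, not StatusCat's actual behaviour. A real re-check would also wait between attempts, which this sketch leaves out.

```python
from collections import Counter


def should_alert(check, attempts=3):
    """Alert only if `check` fails on every one of `attempts` consecutive tries.

    `check` is any zero-argument callable returning True when healthy.
    Evaluation stops at the first success, so a transient blip never pages.
    """
    return all(not check() for _ in range(attempts))


def overloaded_responders(page_log, ratio=1.5):
    """Return responders paged more than `ratio` times the per-person average.

    `page_log` holds one responder name per page sent.
    """
    counts = Counter(page_log)
    if not counts:
        return []
    average = sum(counts.values()) / len(counts)
    return sorted(name for name, n in counts.items() if n > ratio * average)


# Hypothetical month of pages: ana received 9, ben 2, chen 1.
log = ["ana"] * 9 + ["ben"] * 2 + ["chen"]
print(overloaded_responders(log))  # ['ana']
```

A flagged name is a prompt to look at which monitors are paging that person, not a verdict on the person.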
Setting it up in StatusCat
StatusCat has on-call built in, so you don't need a separate paging tool:
- Create a schedule and add your responders with their rotation.
- Define an escalation policy — primary, wait N minutes for acknowledgement, then escalate.
- Attach it to your monitors and choose channels by severity.
- Set quiet hours for non-critical monitors.
That's a complete, humane on-call setup. Pair it with well-tuned checks (see what uptime monitoring is and how to monitor an API) and your team hears about the incidents that matter, and only those. On-call is included free for up to 50 monitors.