The StatusCat blog

How to set up an on-call rotation

A practical guide to on-call rotations: schedules, escalation, acknowledgements, quiet hours and alert hygiene — so incidents reach the right person fast without exhausting your team.

StatusCat · Jun 16, 2026 · 3 min read

On-call is where monitoring meets people. You can have perfect checks, but if the alert reaches no one — or reaches an exhausted engineer at 3 AM for something that could have waited — the system has failed. A good on-call rotation gets the right alert to the right person at the right time, and protects your team while doing it.

The building blocks

A schedule. Who is responsible, and when. Rotate weekly (or daily for larger teams) so the load is shared and predictable.
A primary and a backup. The primary responds first; the backup exists so a missed page doesn't become a missed incident.
Escalation. If the primary doesn't acknowledge within, say, 5–10 minutes, the alert automatically escalates to the backup, then to a manager. This is the single most important safety net.
Acknowledgement. Responders acknowledge an alert to stop escalation and signal "I've got this."
Quiet hours and severity. Not everything deserves a phone call at 3 AM. Route urgent, user-impacting alerts to a call or SMS; send everything else to a channel people read during the day.
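
To make the schedule building block concrete, here is a minimal sketch of a weekly rotation in Python. It is only an illustration, not how StatusCat stores schedules: the responder names, the start date and the on_call_pair function are all hypothetical, and it assumes the backup is simply the next person in the list.

```python
from datetime import date


def on_call_pair(responders, rotation_start, today):
    """Return (primary, backup) for a weekly rotation.

    responders: ordered list of names (hypothetical example data).
    rotation_start: the date the first responder's week began.
    Assumes `today` is on or after `rotation_start`.
    """
    if len(responders) < 2:
        raise ValueError("need at least two responders for a primary and a backup")
    weeks_elapsed = (today - rotation_start).days // 7
    primary = responders[weeks_elapsed % len(responders)]
    # Assumption: the backup is whoever is primary next week.
    backup = responders[(weeks_elapsed + 1) % len(responders)]
    return primary, backup


team = ["ana", "ben", "chloe"]
start = date(2026, 6, 1)  # a Monday

print(on_call_pair(team, start, date(2026, 6, 3)))   # ('ana', 'ben')
print(on_call_pair(team, start, date(2026, 6, 16)))  # ('chloe', 'ana')
```

Making next week's primary this week's backup keeps the load predictable: everyone gets a lighter week as backup before taking over as primary.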

A simple, sane setup

Rotation: weekly, primary + secondary.
Escalation: primary → (5 min no ack) → secondary → (10 min no ack) → team lead.
Channels by severity:
- Critical (site/API down): phone call or SMS, immediately.
- Warning (elevated latency, cert expiring soon): Slack/Discord, business hours.
Quiet hours: non-critical monitors don't page overnight.
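
The setup above can be sketched as a small Python model. This is a hedged illustration rather than any tool's real configuration format. It assumes the 10-minute wait starts when the secondary is paged (so the team lead is reached after 15 minutes without an acknowledgement), and the 22:00 to 07:00 quiet window, the function names and the channel labels are hypothetical.

```python
from datetime import time

# (role, minutes to wait for an ack before escalating); None marks the last step.
ESCALATION_CHAIN = [
    ("primary", 5),
    ("secondary", 10),
    ("team lead", None),
]

# Assumed quiet window; pick whatever matches your team's hours.
QUIET_START = time(22, 0)
QUIET_END = time(7, 0)


def who_to_page(minutes_unacked):
    """Return the role that holds an alert unacknowledged for this long."""
    elapsed = 0
    for role, wait in ESCALATION_CHAIN:
        if wait is None or minutes_unacked < elapsed + wait:
            return role
        elapsed += wait
    return ESCALATION_CHAIN[-1][0]


def in_quiet_hours(now):
    """True if `now` falls in the quiet window, which wraps past midnight."""
    return now >= QUIET_START or now < QUIET_END


def route(severity, now):
    """Pick a channel: critical alerts always page, the rest wait for daytime."""
    if severity == "critical":
        return "phone/SMS"
    if in_quiet_hours(now):
        return "hold until quiet hours end"
    return "chat channel"


print(who_to_page(3))    # primary
print(who_to_page(7))    # secondary
print(who_to_page(15))   # team lead
print(route("critical", time(3, 0)))   # phone/SMS
print(route("warning", time(3, 0)))    # hold until quiet hours end
print(route("warning", time(14, 0)))   # chat channel
```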

Protecting your team from burnout

On-call burnout is real, and it usually comes from noisy alerts rather than from the number of real incidents. To keep it sustainable:

Only page on things a human must act on now. Everything else is a daytime notification, not a page.
Re-check before alerting so transient blips never wake anyone. (StatusCat does this automatically.)
Tune relentlessly. Every false page is a bug in your alerting — fix it, don't tolerate it.
Keep rotations fair and short, and make handoffs explicit.
Track the load. If one person is getting paged far more than everyone else, something is wrong upstream.
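
Two of the points above, re-checking before alerting and tracking the load, are easy to sketch in Python. This is an illustration under stated assumptions, not StatusCat's actual re-check logic: confirmed_down, overloaded, the three-attempt default and the "twice the team average" threshold are all hypothetical choices.

```python
from collections import Counter


def confirmed_down(check, attempts=3):
    """Return True only if `check()` fails on every one of `attempts` tries.

    `check` is any callable returning True when the service is healthy.
    A real monitor would also wait between tries; that is omitted here.
    """
    return all(not check() for _ in range(attempts))


def overloaded(pages, responders, factor=2.0):
    """Return responders paged more than `factor` times the team average.

    pages: one responder name per page sent (hypothetical log data).
    responders: everyone in the rotation, including people never paged.
    """
    if not responders:
        return []
    counts = Counter(pages)
    average = len(pages) / len(responders)
    return sorted(name for name in responders if counts[name] > factor * average)


# A transient blip: the first check fails, the re-check succeeds, nobody is paged.
results = iter([False, True, True])
print(confirmed_down(lambda: next(results)))  # False

# A real outage: every check fails.
print(confirmed_down(lambda: False))  # True

page_log = ["ana"] * 9 + ["ben"] * 2 + ["chloe"]
print(overloaded(page_log, ["ana", "ben", "chloe"]))  # ['ana']
```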

Setting it up in StatusCat

StatusCat has on-call built in, so you don't need a separate paging tool:

Create a schedule and add your responders with their rotation.
Define an escalation policy — primary, wait N minutes for acknowledgement, then escalate.
Attach it to your monitors and choose channels by severity.
Set quiet hours for non-critical monitors.

That's a complete, humane on-call setup. Pair it with well-tuned checks (see what uptime monitoring is and how to monitor an API) and your team hears about the incidents that matter, and only those. On-call is included free for 50 monitors.

Frequently asked questions

What is an on-call rotation?

An on-call rotation is a schedule that decides who is responsible for responding to incidents at any given time. It rotates among team members so no one person is always on the hook, and it usually includes escalation to a backup if the primary doesn't respond.

How does alert escalation work?

When an alert fires, it goes to the primary on-call responder. If they don't acknowledge it within a set time, it automatically escalates to the next person (a secondary or manager), so an incident isn't missed just because someone was asleep or away.

How do I avoid on-call burnout?

Reduce alert noise ruthlessly (only page on things that need a human now), use quiet hours for non-urgent alerts, keep rotations short and fair, and make sure people can hand off cleanly. Alert fatigue is the fastest way to lose trust in your monitoring.
