PageCalm

What to Do in the First 5 Minutes of an Outage


An alert fires. Your product is down. Customers are hitting errors. What you do in the next five minutes determines whether this incident is a blip or a trust disaster.

Most outage playbooks are written for teams of 20 with dedicated on-call rotations and incident commanders. If you're a small team — or a solo founder who is the on-call rotation — you need something simpler.

Here's the five-minute checklist.

Minute 0-1: Confirm It's Real

Not every alert is a real outage. Before you do anything public:

  • Check the alert source. Is it a single probe timeout, or multiple signals confirming a real problem? One failed health check might be a blip. Three failed health checks across two regions is real.
  • Try the product yourself. Open your app in a browser. Hit the endpoint. Log in. Sometimes the monitoring tool is wrong. Sometimes it's right but the problem has already resolved itself.
  • Check your deploy history. Did someone just push a change? A broken deploy is one of the most common causes of sudden outages, and the fix is often a rollback.

This takes 60 seconds. Do it before posting anything to your status page. The only thing worse than a slow status update is a status update you have to retract.
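The "multiple signals" rule above can be written down as a tiny check. This is a minimal sketch, not a real monitoring API: the `ProbeResult` type, its field names, and the thresholds (three failures across two regions, taken from the example above) are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """One health-check result. Field names are hypothetical."""
    region: str
    ok: bool


def is_confirmed_outage(results, min_failures=3, min_regions=2):
    """Return True when enough probes failed across enough distinct regions."""
    failed = [r for r in results if not r.ok]
    failed_regions = {r.region for r in failed}
    return len(failed) >= min_failures and len(failed_regions) >= min_regions


# A single failed probe: probably a blip, keep checking.
single_blip = [ProbeResult("us-east", False), ProbeResult("eu-west", True)]

# Three failures across two regions: treat it as real.
real_outage = [
    ProbeResult("us-east", False),
    ProbeResult("us-east", False),
    ProbeResult("eu-west", False),
]

print(is_confirmed_outage(single_blip))  # False
print(is_confirmed_outage(real_outage))  # True
```

A check like this only covers the first bullet. Trying the product yourself and looking at the deploy history still need a human.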

Minute 1-2: Post the First Update

You don't need to know the cause yet. You don't need a fix timeline. You just need to acknowledge the problem.

Post a status update:

"We're aware of issues affecting [component]. Our team is investigating. We'll provide an update within 15 minutes."

That's it. Three sentences. The goal isn't to explain what happened — it's to tell customers: we know, we're on it, we'll keep you posted.

What to get right:

  • Be specific about what's broken. "Issues affecting the API" is better than "experiencing issues." Customers need to know if the thing they use is the thing that's down.
  • Set a next-update time. "We'll update within 15 minutes" gives customers a reason to stop refreshing. It also commits you to providing another update, which is the point.
  • Mark the right components. If your API is down, don't mark your entire system as degraded. Mark the API as major outage and leave other components alone (unless they're also affected).

What to avoid:

  • Don't speculate on the cause. "We believe this is related to a database issue" will haunt you if it turns out to be something else.
  • Don't promise a resolution time. "We expect to resolve this within 30 minutes" is a contract. You don't have enough information yet.
  • Don't say "some users may be affected" if all users are affected. Understating the impact makes customers feel gaslit.
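If you'd rather not type the first update from scratch under pressure, the template and the "mark only what's affected" rule can be captured in a few lines. This is a sketch under stated assumptions: the function names and the status strings (`operational`, `major_outage`) are hypothetical, not any status page provider's API.

```python
def build_first_update(component, next_update_minutes=15):
    """Fill in the three-sentence acknowledgment for a named component."""
    if not component.strip():
        # "Experiencing issues" with no component is the vague update to avoid.
        raise ValueError("name the affected component")
    return (
        f"We're aware of issues affecting {component}. "
        "Our team is investigating. "
        f"We'll provide an update within {next_update_minutes} minutes."
    )


def mark_components(statuses, affected, new_status="major_outage"):
    """Return a copy of statuses with only the affected components changed."""
    unknown = set(affected) - set(statuses)
    if unknown:
        raise KeyError(f"unknown components: {sorted(unknown)}")
    return {
        name: (new_status if name in affected else status)
        for name, status in statuses.items()
    }


statuses = {"API": "operational", "Dashboard": "operational"}
print(build_first_update("the API"))
print(mark_components(statuses, ["API"]))
# {'API': 'major_outage', 'Dashboard': 'operational'}
```

The template deliberately has no slot for a cause or a resolution time, which matches the "what to avoid" list above.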

Minute 2-4: Start Investigating

Now — and only now — focus on the fix. The order matters because customers who saw your first update are now waiting calmly instead of flooding your support inbox.

The triage checklist:

  1. Recent changes. Check your deploy log. If something went out in the last hour, that's your top suspect. Can you roll it back?
  2. External dependencies. Check the status pages of services you depend on (database provider, DNS, CDN, payment processor). Sometimes the problem isn't yours.
  3. Error logs. What's the error? Is it one error repeating, or many different ones? A single repeated error usually means one root cause. Many different errors might mean something foundational (network, database) is down.
  4. Scale. Is it getting worse? Stable? Improving on its own? This informs whether you need to act urgently or can investigate methodically.

If you find the cause and can fix it quickly (rollback, config change, restart), do it. If the investigation is going to take longer, that's fine — you've already set expectations with your first update.
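Two of the triage steps above are mechanical enough to sketch in code: finding deploys from the last hour, and telling "one error repeating" from "many different errors." The record shape (a dict with an `at` timestamp) and the 80% dominance threshold are assumptions for illustration; tune them to your own logs.

```python
from collections import Counter
from datetime import datetime, timedelta


def recent_deploys(deploys, now, window=timedelta(hours=1)):
    """Return deploys from the last `window`, newest first (top suspects)."""
    recent = [d for d in deploys if timedelta(0) <= now - d["at"] <= window]
    return sorted(recent, key=lambda d: d["at"], reverse=True)


def error_pattern(messages, dominant_share=0.8):
    """Classify errors as 'single' (one message dominates), 'mixed', or 'none'."""
    if not messages:
        return "none"
    _, top_count = Counter(messages).most_common(1)[0]
    return "single" if top_count / len(messages) >= dominant_share else "mixed"


now = datetime(2024, 1, 1, 12, 0)
deploys = [
    {"id": "old-release", "at": datetime(2024, 1, 1, 9, 0)},
    {"id": "new-release", "at": datetime(2024, 1, 1, 11, 30)},
]
print([d["id"] for d in recent_deploys(deploys, now)])  # ['new-release']
print(error_pattern(["timeout"] * 9 + ["bad gateway"]))  # single
print(error_pattern(["timeout", "bad gateway", "dns", "auth"]))  # mixed
```

A "single" result points toward one root cause, while "mixed" suggests looking at something foundational such as the network or the database first.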

Minute 4-5: Decide What Happens Next

At this point you're either fixing the problem or still investigating. Either way, plan your communication cadence:

  • If you're fixing it: Post a second update: "We've identified the cause and are deploying a fix. We'll confirm resolution within [time]."
  • If you're still investigating: Plan your next update. You said 15 minutes, so in 15 minutes, post something — even if the update is "still investigating, no new information yet." Silence after promising an update is worse than a boring update.
  • If it resolved on its own: Post that too. "The issue has resolved. We're investigating the root cause to prevent recurrence." Don't just silently mark it operational — customers who saw the first update are waiting for closure.
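The three branches above map cleanly onto a lookup from incident state to follow-up message. A minimal sketch, assuming hypothetical state names (`fixing`, `investigating`, `self_resolved`); the message wording is taken from the bullets above.

```python
def follow_up_message(state, confirm_within=None):
    """Return the follow-up status message for the current incident state."""
    if state == "fixing":
        if not confirm_within:
            # Only commit to a confirmation window once you actually have one.
            raise ValueError("fixing updates need a confirm_within value")
        return (
            "We've identified the cause and are deploying a fix. "
            f"We'll confirm resolution within {confirm_within}."
        )
    if state == "investigating":
        return "Still investigating, no new information yet."
    if state == "self_resolved":
        return (
            "The issue has resolved. "
            "We're investigating the root cause to prevent recurrence."
        )
    raise ValueError(f"unknown state: {state!r}")


print(follow_up_message("fixing", confirm_within="20 minutes"))
print(follow_up_message("investigating"))
print(follow_up_message("self_resolved"))
```

Raising on an unknown state is a deliberate choice here: during an incident, a loud failure is easier to notice than an empty update.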

What Not to Do

A few things that make outages worse, especially under time pressure:

Don't skip the status page and go straight to fixing. The fastest path to reducing support load is telling customers you know. Every minute you spend silently fixing is a minute customers spend writing "is it just me?" emails.

Don't post to social media instead of your status page. Some customers will check social media, but most will check your status page. If you only have time for one, post to the status page, where subscribers get notified automatically.

Don't deploy a speculative fix without confidence. A failed fix that makes things worse creates a second incident on top of the first. If you're not sure the fix works, test it first or roll back to a known good state.

Don't forget to update. If you said "update in 15 minutes," update in 15 minutes. Set a timer on your phone. During an outage, time perception warps: what feels like five minutes can actually be twenty. A timer keeps you honest.
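If a phone timer isn't handy, the same promise can be tracked with a timestamp. This is a small sketch with hypothetical function names; it only computes when the next update is due and how late you are, and leaves the actual reminder to whatever tool you already use.

```python
from datetime import datetime, timedelta


def next_update_due(posted_at, promised_minutes=15):
    """Return the moment the promised follow-up update is due."""
    return posted_at + timedelta(minutes=promised_minutes)


def minutes_overdue(posted_at, now, promised_minutes=15):
    """Return whole minutes past the promised update time, or 0 if not late."""
    late = now - next_update_due(posted_at, promised_minutes)
    return max(0, int(late.total_seconds() // 60))


posted_at = datetime(2024, 1, 1, 12, 0)
print(next_update_due(posted_at))  # 2024-01-01 12:15:00
print(minutes_overdue(posted_at, datetime(2024, 1, 1, 12, 10)))  # 0
print(minutes_overdue(posted_at, datetime(2024, 1, 1, 12, 20)))  # 5
```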

After the Five Minutes

Once you're past the initial response, the outage follows a predictable lifecycle: investigating → identified → monitoring → resolved. Each phase has different communication needs, and we've written about those elsewhere.
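As a rough illustration, the lifecycle can be modeled as an ordered set of phases with a single "advance" step. This is a simplified sketch: the `Phase` enum and `advance` function are hypothetical, and a strictly linear order is an assumption, since real incidents sometimes skip or revisit phases.

```python
from enum import Enum


class Phase(Enum):
    INVESTIGATING = "investigating"
    IDENTIFIED = "identified"
    MONITORING = "monitoring"
    RESOLVED = "resolved"


# Enum members iterate in definition order, which matches the lifecycle.
ORDER = list(Phase)


def advance(phase):
    """Return the phase that follows `phase` in the linear lifecycle."""
    if phase is Phase.RESOLVED:
        raise ValueError("a resolved incident has no next phase")
    return ORDER[ORDER.index(phase) + 1]


phase = Phase.INVESTIGATING
while phase is not Phase.RESOLVED:
    phase = advance(phase)
    print(phase.value)
# identified
# monitoring
# resolved
```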

But the first five minutes are where trust is won or lost. A quick acknowledgment, a clear scope, and a promised next update — that's the entire playbook. Everything else is execution.


