PageCalm

Incident Severity Levels: When to Use Minor, Major, and Critical

Tags: incident communication, status pages, severity levels, best practices

When something breaks, one of the first decisions you make is how bad it is. Minor? Major? Critical? Most teams get this wrong — not because they don't understand their systems, but because they're thinking about severity from the wrong perspective.

Severity isn't about how panicked your engineering team feels. It's about how much your customers are affected.

The Three Levels

Minor

What it means to customers: Something isn't quite right, but they can still do their work.

Minor incidents cover degraded performance, cosmetic issues, or problems affecting a small subset of users. The product still works — it's just slower, glitchy, or slightly broken in a way that doesn't block anyone's workflow.

Examples:

  • Dashboard loads 3x slower than usual
  • Search results are delayed by a few seconds
  • A non-critical feature (like export to CSV) is failing
  • Intermittent errors affecting less than 5% of requests

What to communicate: A brief update acknowledging the issue. Don't over-dramatize it — match the energy of the impact.

"Some users may notice slower load times on the dashboard. We've identified the cause and are working on a fix."

Major

What it means to customers: A key part of the product is broken for a significant number of users.

Major is the most commonly used severity label, and the most commonly overused. This level means something important isn't working, and enough users are affected that your support inbox is going to feel it.

Examples:

  • API returning errors for a significant portion of requests
  • Login working intermittently
  • Webhook deliveries delayed by 10+ minutes
  • A core feature (like creating new records) is failing

What to communicate: Be specific about what's affected and what still works. Customers need to know if they should wait or find a workaround.

"We're experiencing issues with our REST API. Some requests are returning errors. The dashboard and webhooks are unaffected. Our team has identified the cause and is deploying a fix."

Critical

What it means to customers: The product is effectively down for everyone.

Critical incidents are total outages, data loss, or security events. Either nobody can use the product, or data has been lost or a significant security issue has been discovered. These should be rare: if you're classifying incidents as critical every week, something else is wrong.

Examples:

  • Complete outage — all requests failing
  • Data loss or corruption
  • Security breach affecting user data
  • Payment processing completely down

What to communicate: Acknowledge the severity directly. Don't sugarcoat it, but don't panic either. Set a frequent update cadence — every 15-30 minutes is a common target — and stick to it, even if the update is "still working on it."
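To make the cadence concrete, here is a minimal Python sketch that works out when the next public updates are due and whether one has been missed. The function names `update_times` and `is_update_overdue` and their parameters are hypothetical, not part of any status page tool; the 30-minute default is simply the upper end of the range mentioned above.

```python
from datetime import datetime, timedelta


def update_times(started_at: datetime, interval_minutes: int = 30, count: int = 4) -> list[datetime]:
    """Return the times at which the next `count` public updates are due."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    step = timedelta(minutes=interval_minutes)
    # First update is due one interval after the incident was opened.
    return [started_at + step * i for i in range(1, count + 1)]


def is_update_overdue(last_update_at: datetime, now: datetime, interval_minutes: int = 30) -> bool:
    """True when more than one full interval has passed since the last public update."""
    return now - last_update_at > timedelta(minutes=interval_minutes)
```

A reminder built on `is_update_overdue` is one way to keep the promise to customers even when the only news is "still working on it."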

"We are experiencing a complete service outage. All users are affected. Our engineering team is actively working to restore service. We will provide updates every 30 minutes until service is restored."

The Mistakes Teams Make

Everything Is "Major"

This is the most common problem. When every incident is major, the word loses meaning. Customers see "major" on a slow page load and "major" on a total outage and learn that your severity labels are meaningless.

If you find yourself defaulting to major every time, ask: "Can customers still do their core work?" If yes, it's probably minor. If the answer is "no, and it's everyone," it's probably critical.

Severity Based on Root Cause, Not Impact

A database failover sounds scary to your engineering team. But if the failover completed in 8 seconds and users saw a brief loading spinner, it's minor — regardless of how alarming the internal alert was.

Conversely, a "simple" configuration change that breaks login for all users is critical, even if the fix takes two minutes.

Severity should reflect what customers experience, not what's happening in your infrastructure.

Not Escalating (or De-escalating) During an Incident

Incidents evolve. Something that starts as minor — "elevated error rates on a single endpoint" — can escalate to major or critical as the scope becomes clear.

Don't lock yourself into the initial severity. If an incident gets worse, update the severity and post a new update explaining the change. Customers respect transparency about evolving situations far more than they respect consistency with an initial assessment that turned out to be wrong.

The reverse is also true: if a major incident turns out to affect fewer users than initially thought, de-escalate and say so.
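One way to keep severity changes honest is to record each one together with the reason you will share publicly. The following is a minimal sketch, assuming a hypothetical `Incident` class with the three levels from this article; none of these names come from a real library.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Ordered from least to most severe, matching the three levels in this article.
LEVELS = ["minor", "major", "critical"]


@dataclass
class Incident:
    title: str
    severity: str
    # Each entry: (timestamp, old severity, new severity, reason given to customers)
    history: list = field(default_factory=list)

    def change_severity(self, new_severity: str, reason: str) -> str:
        """Update severity, log the change, and report which direction it moved."""
        if new_severity not in LEVELS:
            raise ValueError(f"unknown severity: {new_severity}")
        old = self.severity
        if new_severity == old:
            return "unchanged"
        escalating = LEVELS.index(new_severity) > LEVELS.index(old)
        self.severity = new_severity
        self.history.append((datetime.now(timezone.utc), old, new_severity, reason))
        return "escalated" if escalating else "de-escalated"
```

Requiring a `reason` on every change nudges whoever updates the incident to write the explanation customers will want to read.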

Why Severity Matters for Customer Trust

Getting severity right isn't just operational hygiene. It directly affects how customers perceive your reliability.

Accurate severity builds calibration. When customers see your minor incidents are genuinely minor and your critical incidents are genuinely critical, they learn to trust your labels. The next time they see "minor incident" on your status page, they know they can keep working. That's a powerful signal — it means your status page is actually useful, not just noise.

Over-classifying erodes trust. If every hiccup is "major," customers either panic unnecessarily or learn to ignore your status page entirely. Both are bad.

Under-classifying erodes trust faster. If a total outage is labeled "minor," customers feel gaslit. They can see the product is down. They know it's not minor. Labeling it that way tells them you're either dishonest or disconnected from their experience.

A Simple Decision Framework

When an incident occurs, ask these three questions in order:

  1. Can most customers complete their core workflow?
    • Yes → Minor
    • No → Go to question 2

  2. Is a significant portion of the product broken, or is it everything?
    • Significant portion → Major
    • Everything / near-everything → Go to question 3

  3. Is there data loss, a security issue, or a total outage?
    • Yes → Critical
    • No → Major

That's it. Three questions, three outcomes. When in doubt, err toward the higher severity — customers forgive over-communication far more easily than under-communication.
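The three questions translate directly into a small function. This is a sketch only: the `Severity` enum, the `classify` function, and its parameter names are hypothetical, and the answers to each question still have to come from a human looking at customer impact.

```python
from enum import Enum


class Severity(Enum):
    MINOR = "minor"
    MAJOR = "major"
    CRITICAL = "critical"


def classify(
    core_workflow_ok: bool,
    everything_broken: bool,
    data_loss_security_or_outage: bool,
) -> Severity:
    """Apply the three questions in order and return the resulting severity."""
    # Question 1: can most customers complete their core workflow?
    if core_workflow_ok:
        return Severity.MINOR
    # Question 2: a significant portion of the product, or everything?
    if not everything_broken:
        return Severity.MAJOR
    # Question 3: data loss, a security issue, or a total outage?
    if data_loss_security_or_outage:
        return Severity.CRITICAL
    return Severity.MAJOR
```

For example, `classify(core_workflow_ok=False, everything_broken=True, data_loss_security_or_outage=True)` returns `Severity.CRITICAL`. Read literally, the flow only asks about data loss and security after the first two questions, so a team that wants any security event treated as critical may prefer to check that question first.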


