How Small Teams Should Coordinate During an Incident
Assigning roles is the first step of incident coordination: deciding who investigates and who communicates (I covered that in an earlier post). But roles only cover the static picture. Once the incident is live, coordination gets dynamic. Someone posts an update. Someone else has new information thirty seconds later. The engineer discovers the scope is bigger than anyone thought. The poster needs to step away. Two people both go to update the status page at the same time.
These are the real-time coordination problems that make incident communication harder than it looks. Here's how to handle them.
The duplicate-update problem
The most visible coordination failure is two people posting overlapping updates within seconds of each other. A customer refreshes the page and sees two distinct "Investigating" posts from different authors, each with slightly different wording. It reads as the team flailing — multiple people trying to help but no one actually steering.
This happens because the decision to post is individual. One person sees new information in the incident channel and posts. Another person sees the same information and also posts. Neither checked whether the other was about to do the same thing. The fix is a coordination signal that takes one second.
Pick a convention that works for your team. A quick message in the incident channel before posting — "posting update now" — is enough. It tells everyone else to hold for sixty seconds. If your status page tool shows who's actively editing an incident, that's even better. The mechanism matters less than the habit: signal before you post, so nobody's posting on top of you.
The same convention applies to editing an existing update. If the poster is refining language on a published post, a "tweaking the monitoring update" note in the channel prevents the engineer from opening the same incident and overwriting the changes.
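If you wanted to see the convention as logic rather than habit, here is a minimal sketch in Python. It is not a feature of any particular status page tool. The class name, the method name, and the sixty-second window are assumptions drawn from the "hold for sixty seconds" convention above.

```python
from datetime import datetime, timedelta

# Assumed hold window, taken from the "hold for sixty seconds" convention.
HOLD = timedelta(seconds=60)


class PostingSignal:
    """Tracks who last said "posting update now" and when (hypothetical helper)."""

    def __init__(self):
        self.holder = None
        self.claimed_at = None

    def try_claim(self, person, now):
        """Return True if `person` may post, False if someone else signalled within HOLD."""
        held_by_other = (
            self.holder is not None
            and self.holder != person
            and now - self.claimed_at < HOLD
        )
        if held_by_other:
            return False
        # Either nobody holds the signal, it expired, or the same person is re-signalling.
        self.holder = person
        self.claimed_at = now
        return True


signal = PostingSignal()
t0 = datetime(2024, 1, 1, 15, 0, 0)
signal.try_claim("poster", t0)                             # poster signals first
signal.try_claim("engineer", t0 + timedelta(seconds=5))    # refused: poster is mid-post
```

In practice the channel message does this job on its own. The sketch only shows how little state the habit needs: one name and one timestamp.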
The information gap between investigator and poster
In the Slack-relay pattern — engineer investigates and drops one-liners into a shared channel, poster translates them into customer-facing updates — the coordination works until the flow gets uneven. The engineer is deep in a debugging session and goes silent for twenty minutes. The poster sees nothing new in the channel and hesitates to post a "still investigating" update because maybe the engineer is about to surface with a major discovery.
This hesitation is the gap where updates die. The poster is waiting for the engineer. The engineer assumes the poster will post on schedule regardless. Neither checks that assumption with the other, so nothing gets posted.
The fix is to decouple the poster's cadence from the engineer's output. The poster's job is to post on the agreed schedule whether or not new information has arrived. A "still investigating, no new findings" update is a valid update. It shows customers the team is still working. The poster doesn't need the engineer's permission to post it — the pre-agreed schedule is the permission.
The engineer's job is to drop facts into the channel whenever they surface. The poster's job is to maintain the cadence. These two jobs run independently, and that independence is what keeps the page updated when the investigation hits a quiet patch.
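The decoupling can be sketched as a small function: whether to post depends only on the clock, and the engineer's notes only change what the post says. This is an illustrative sketch. The function name, its parameters, and the placeholder wording are assumptions, not a recommended template.

```python
from datetime import datetime, timedelta


def next_update(last_post_at, interval, new_facts, now):
    """Return the text to post now, or None if the next update is not due yet.

    The decision to post depends only on the schedule. `new_facts` (one-liners
    from the incident channel) only affects the wording.
    """
    if now - last_post_at < interval:
        return None
    if new_facts:
        return "Update: " + " ".join(new_facts)
    # A quiet investigation still gets a post on schedule.
    return "We are still investigating. No new findings since the last update."


last = datetime(2024, 1, 1, 14, 0)
due = last + timedelta(minutes=15)
print(next_update(last, timedelta(minutes=15), [], due))
```

Nothing in the function asks whether the engineer has spoken recently. A silent channel produces a "still investigating" post rather than no post.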
The handoff
Incidents that last more than a few hours need a handoff. The person posting updates goes to sleep, or takes a break, or gets pulled into the investigation. If the handoff isn't explicit, the customer-facing communication pauses — and the pause reads as the team losing focus.
A handoff has three parts:
The outgoing poster leaves a state summary. Who has the incident open, what was the last update, what time is the next update due, and what's the current status of the investigation. This doesn't need to be a formal document — a few lines in the incident channel are enough. "I'm handing off. Last update was a monitoring post at 2:15. Next update due by 2:45. Engineer says fix is deploying now but not confirmed yet."
The incoming poster acknowledges the handoff. A quick "got it, I'm on updates until 5" confirms the baton was caught. No acknowledgement from the incoming poster means the outgoing poster doesn't actually know the handoff succeeded — and might stick around longer than they should, burning out while holding a role nobody asked them to hold.
The incoming poster posts immediately. A quick "we're continuing to monitor the fix deployment, next update by 2:45" tells customers that someone new is on the job, and it confirms to the outgoing poster that the handoff is real. The incoming poster doesn't have to wait for something new to happen to post — the handoff itself is the reason to post.
A handoff that misses any of these three steps looks like the team losing continuity. The customer sees an update gap between the outgoing poster's last update and the incoming poster's first one. A sloppy handoff creates exactly the communication gap it's supposed to prevent.
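The three parts above can be sketched as a small record with a completeness check. This is a hypothetical structure for illustration. The field names and the people "Ana" and "Ben" are made up, and a few lines typed in the incident channel do the same job.

```python
from dataclasses import dataclass


@dataclass
class Handoff:
    outgoing: str
    incoming: str
    last_update: str          # e.g. "monitoring post at 2:15"
    next_update_due: str      # e.g. "2:45"
    investigation: str        # current state, in the engineer's words
    acknowledged: bool = False
    first_post_made: bool = False

    def summary(self):
        """The few lines the outgoing poster drops in the incident channel."""
        return (
            f"{self.outgoing} handing off to {self.incoming}. "
            f"Last update: {self.last_update}. "
            f"Next update due by {self.next_update_due}. "
            f"Investigation: {self.investigation}"
        )

    def missing_steps(self):
        """List which of the three handoff steps have not happened yet."""
        missing = []
        if not all([self.last_update, self.next_update_due, self.investigation]):
            missing.append("state summary")
        if not self.acknowledged:
            missing.append("acknowledgement")
        if not self.first_post_made:
            missing.append("first post by incoming poster")
        return missing


handoff = Handoff("Ana", "Ben", "monitoring post at 2:15", "2:45",
                  "fix is deploying, not confirmed yet")
print(handoff.summary())
print(handoff.missing_steps())
```

The outgoing poster is free to leave only when `missing_steps()` comes back empty.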
When two people post with inconsistent information
A subtler coordination failure: two people post about the same incident at different points in the update cycle, and the earlier post was based on stale information. The poster posted "login is the only affected component" at 3:10. At 3:12, the engineer discovers the database is also degraded and drops a note in the channel. At 3:13, the poster edits the update. But at 3:15, a second person — the founder, or support, who saw the 3:10 version — posts a follow-up that references "the login-only incident." Now the page has two updates that contradict each other about the scope.
The fix is a coordination rule: whoever is designated as the poster for this incident owns the status page until the handoff. Nobody else posts. If someone who isn't the poster wants to contribute information, they route it through the poster — either by dropping it in the incident channel for the poster to pick up, or by messaging the poster directly. The rule isn't about gatekeeping. It's about having a single point of truth on the page.
For teams of two or three, the "nobody else posts" rule might seem heavy. But it's simpler to enforce than the alternative, a free-for-all where anyone can post and the page becomes inconsistent, and it maps cleanly to the role split from the earlier post on who should post. The designated poster is the designated poster. That designation means something.
The voice-consistency problem in practice
The earlier post covered the voice question — pick one voice for status page posts and have all posters match it. In practice, this is harder during a live incident than it sounds. The poster is writing fast. The engineer, filling in as poster during a handoff gap, defaults to the voice they actually have. The tone drifts.
The practical fix is to have the poster read the last two or three published updates before writing a new one. This takes fifteen seconds. It doesn't produce perfect consistency, but it's enough to keep the voice recognizable. A reader going from update to update won't feel tonal whiplash. If the previous updates used short, direct sentences and named the affected components concretely, the next update should too. If the previous updates avoided phrases like "elevated error rates" and instead used "customers are seeing errors when they try to sign in," the next update should use the same plain-language pattern.
The voice isn't something you define in a style guide (though that doesn't hurt). It's something you pick up by reading the last few posts and matching the cadence. The fifteen-second check is the mechanism.
The pre-incident setup
Most coordination problems are solved by setup that happens before the incident starts. Specifically:
Make sure everyone who might post has access. If the designated poster goes offline and the backup doesn't have status page access, the communication pauses while someone provisions an account. Time spent fixing access during an incident is time the status page isn't being updated.
Make sure everyone knows who the poster is and who the backup is. The decision from the earlier post — "Person A gathers facts, Person B writes and posts" — is the baseline. But it needs a fallback: "If Person B is unavailable, Person A does both." And it needs everyone on the team to know both names.
Set up the incident channel in advance. Whether it's a dedicated #incidents Slack channel or temporary per-incident channels, the channel should exist before the incident. Creating it during the incident wastes minutes. The channel name, the membership, and the convention for posting updates (plain text facts, not formatted) should all be pre-decided.
Define the update cadence before the incident. The earlier post on update cadence covers the phases. What matters for coordination is that the poster knows the cadence without having to negotiate it with the engineer mid-incident. A pre-agreed schedule ("first update within 5 minutes, then every 15–30 minutes, including 'no new information' updates") removes the need for the poster to ask "should I post now?" every fifteen minutes.
These decisions take ten minutes to make and write down. They save far more than ten minutes during every incident that follows.
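One way to write those decisions down is a small config plus a check you can run before the next incident. This is a sketch under assumptions: the key names (`poster`, `backup`, `status_page_access`, `incident_channel`, `cadence_minutes`) are invented for illustration and do not match any real tool's settings.

```python
def setup_gaps(config):
    """Return a list of pre-incident setup problems found in `config`."""
    gaps = []
    poster, backup = config.get("poster"), config.get("backup")
    access = set(config.get("status_page_access", []))
    if not poster:
        gaps.append("no designated poster")
    if not backup:
        gaps.append("no backup poster")
    # Everyone who might post needs status page access before the incident.
    for person in (poster, backup):
        if person and person not in access:
            gaps.append(f"{person} has no status page access")
    if not config.get("incident_channel"):
        gaps.append("no incident channel")
    if not config.get("cadence_minutes"):
        gaps.append("no update cadence")
    return gaps


team = {
    "poster": "Person B",
    "backup": "Person A",
    "status_page_access": ["Person B"],
    "incident_channel": "#incidents",
    "cadence_minutes": 15,
}
print(setup_gaps(team))
```

With this config the check reports that the backup cannot post, which is exactly the problem you want to find on a quiet afternoon rather than mid-incident.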
The principle
Coordination during an incident isn't about having a complex process. It's about removing the friction that causes gaps. A signal before posting. An explicit handoff. A single poster per incident. A fifteen-second voice check before writing. These are small habits that compound — each one removes a few seconds of friction, and those seconds add up to minutes during an incident, and those minutes are what customers spend refreshing a silent status page.
The role assignment you did before the incident was the foundation. The coordination habits are the execution. Both are needed, and both cost almost nothing to put in place before the next incident.