Response-time decay support: the queue-aging curve explained

Response-time decay is the curve you get when you plot the percentile distribution of first-response times across your ticket queue — p50, p75, p90, p99 — instead of reporting a single average. It exists because averages lie. A 2-hour mean first-response time can hide a p90 of 9 hours, and those long-tail tickets are the ones generating angry replies, CSAT downvotes, and churn risk on your B2B SaaS support team. The decay curve is the diagnostic that tells you which SLA breaches actually matter.

Key takeaways

Response-time decay plots first-response time at the p50, p75, p90, and p99 of your queue — the shape of that curve reveals long-tail damage that averages and SLA-met percentages cannot.
On a 5-15 agent B2B SaaS team, the p90 typically runs 3-5x the average first-response time, so the average understates the worst customer experiences by that margin.
A 2-hour SLA that reports 92% attainment can still hide a p95 of 4-6 hours (or a wall-clock p90 that long, when the SLA clock pauses outside business hours), and that tail is the cohort generating most of your negative CSAT.
Triaging the decay curve by topic, channel, and time-of-day exposes structural causes (shift gaps, complex topic routing) rather than agent-level blame.
Reporting the p90 alongside SLA attainment, rather than the average alone, is the single highest-leverage change a support ops manager can make to SLA reviews.

What response-time decay actually measures

Response-time decay is the rate at which response times get worse as you walk up the percentiles of your ticket queue. If you sort every ticket from this month by how long it waited for a first agent reply, the median (p50) sits in the middle, the p75 is the wait that three in four tickets beat, the p90 is the wait that 90% of tickets beat, and the p99 marks where your worst 1% begins. Plot those four points and you get the decay curve.

A healthy curve is shallow: p50 at 30 minutes, p90 at 90 minutes, p99 at 3 hours. A broken curve hockey-sticks: p50 at 30 minutes, p90 at 6 hours, p99 at 24 hours. Same median. Wildly different customer experience.
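As a rough illustration, the sketch below builds two ten-ticket queues from made-up wait times chosen to land on the percentiles quoted above, then prints the median, mean, p90, and p99 of each. The nearest_rank helper and the sample data are assumptions for illustration, not output from any real helpdesk.

```python
import statistics


def nearest_rank(sorted_values, pct):
    """Value at 1-based rank ceil(pct/100 * N) of an ascending list (integer pct)."""
    rank = (pct * len(sorted_values) + 99) // 100  # integer ceiling of pct*N/100
    return sorted_values[max(rank, 1) - 1]


# Made-up first-response times in minutes, ten tickets per queue.
healthy = [10, 15, 20, 25, 30, 30, 45, 60, 90, 180]
broken = [10, 15, 20, 25, 30, 30, 45, 60, 360, 1440]

for name, waits in (("healthy", healthy), ("broken", broken)):
    ordered = sorted(waits)
    print(
        name,
        "median:", statistics.median(ordered),
        "mean:", statistics.mean(ordered),
        "p90:", nearest_rank(ordered, 90),
        "p99:", nearest_rank(ordered, 99),
    )
```

Both queues share a 30-minute median. Only the tail percentiles, and to a lesser extent the mean, separate them.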

The term "decay" captures the asymmetry — you can't make the fast tickets meaningfully faster, but you can let the slow ones get arbitrarily worse. The curve quantifies how badly.

Why 2-hour SLA attainment lies

Most B2B SaaS support teams report SLA performance as a percentage: "92% of tickets met the 2-hour first-response target this month." That number is misleading in two specific ways.

First, it treats every breach equally. A ticket answered in 2 hours 5 minutes and a ticket answered in 14 hours both count as one breach. To the customer, those are entirely different events — one is invisible, the other ends in a Slack message to your sales rep.

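Second, it hides distribution shape. Attainment counts how many tickets landed inside the target and says nothing about how late the rest were. Measured on the same clock, 92% attainment pins the p90 under 2 hours, but everything beyond the 92nd percentile is unconstrained: one team's p95 can be 2:10 and another's 7 hours. If the SLA clock pauses outside business hours while customers wait in wall-clock time, even the p90 can run well past the target. Identical SLA reports, opposite CSAT outcomes.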

The fix is reporting two numbers together: SLA attainment AND the p90 (or p95) first-response time. The first tells you the policy is mostly working. The second tells you how bad it gets when it doesn't.
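A minimal sketch of that paired report, assuming you already have first-response waits in minutes and that a ticket answered exactly at the target counts as met. The function name sla_report, the worst_overrun_min field, and both sample teams are hypothetical.

```python
def nearest_rank(sorted_values, pct):
    """Value at 1-based rank ceil(pct/100 * N) of an ascending list (integer pct)."""
    rank = (pct * len(sorted_values) + 99) // 100
    return sorted_values[max(rank, 1) - 1]


def sla_report(waits_min, target_min=120):
    """Summarise attainment and tail percentiles for one reporting window."""
    waits = sorted(waits_min)
    if not waits:
        raise ValueError("no tickets in the reporting window")
    met = sum(1 for w in waits if w <= target_min)  # assumption: <= target counts as met
    return {
        "attainment": met / len(waits),
        "p90": nearest_rank(waits, 90),
        "p95": nearest_rank(waits, 95),
        "worst_overrun_min": max(waits[-1] - target_min, 0),
    }


# Two made-up teams: 23 of 25 tickets on time, but very different breaches.
on_time = [20 + 4 * i for i in range(23)]  # 20..108 minutes
report_a = sla_report(on_time + [125, 130])
report_b = sla_report(on_time + [420, 840])
print(report_a)
print(report_b)
```

Both teams report the same attainment, and the difference only shows up in the p95 and the worst overrun. Whichever tail percentile you pick, choose one above your attainment figure, otherwise it sits inside the target by construction.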

Reading the decay curve: what each percentile tells you

Different percentiles answer different operational questions.

Percentile	What it represents	What changes it
p50 (median)	Typical ticket experience	Agent capacity, macro coverage, routing speed
p75	The edge of "normal"	Volume spikes, topic complexity
p90	The angry-customer cohort	Shift coverage gaps, escalation handoffs
p99	Crisis tickets	Tickets that slipped a queue entirely

For a 5-15 agent B2B SaaS team, the p90 is the percentile that correlates most tightly with negative CSAT. It captures the experience of one in ten customers — large enough to matter, small enough to be invisible in averages.

The p99 is more diagnostic than operational. A bad p99 almost always means a process gap: tickets assigned to an off-shift agent, a topic routing rule pointing at a disabled group, an email alias dropping into the wrong inbox. You fix p99s by walking individual tickets, not by hiring agents.
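To get the list of tickets worth walking, one option is to pull everything at or above the p99 wait and sort it slowest first. This is a sketch under assumed field names (id, wait_min, group); your helpdesk export will differ.

```python
def tail_tickets(tickets, pct=99):
    """Return tickets at or above the pct-th percentile wait, slowest first."""
    if not tickets:
        return []
    ranked = sorted(tickets, key=lambda t: t["wait_min"])
    rank = (pct * len(ranked) + 99) // 100  # nearest-rank position, 1-based
    cutoff = ranked[max(rank, 1) - 1]["wait_min"]
    return [t for t in reversed(ranked) if t["wait_min"] >= cutoff]


# Hypothetical export: 100 tickets with waits of 1..100 minutes.
tickets = [{"id": i, "wait_min": i, "group": "support"} for i in range(1, 101)]
worst = tail_tickets(tickets)
for t in worst:
    print(t["id"], t["wait_min"], t["group"])
```

Printing the group (or status, or assignee) next to each wait is the point: the process gap usually shows up as the same value repeating down that column.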

How to build the curve from your helpdesk data

If your tool exposes per-ticket firstResponseAt and createdAt timestamps — most do — the calculation is straightforward.

Pull every ticket created in the reporting window with a non-null first-response timestamp.
Compute firstResponseAt - createdAt for each ticket, in minutes.
Sort the resulting array ascending.
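Read each percentile by nearest rank: with N sorted values, take the value at 1-based position ceil(p/100 * N), so p50 sits at ceil(0.50 * N), p75 at ceil(0.75 * N), p90 at ceil(0.90 * N), and p99 at ceil(0.99 * N). Subtract 1 if your array is zero-indexed.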
Plot the four values on a line chart (x-axis = percentile, y-axis = minutes).
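The steps above can be sketched in a few lines of Python. This assumes each ticket is a dict carrying createdAt and firstResponseAt as datetime objects, and it uses the nearest-rank percentile method. Your reporting tool may interpolate instead, so small differences are expected. The function names and sample data are illustrative.

```python
from datetime import datetime, timedelta


def nearest_rank(sorted_values, pct):
    """Value at 1-based rank ceil(pct/100 * N) of an ascending list (integer pct)."""
    rank = (pct * len(sorted_values) + 99) // 100
    return sorted_values[max(rank, 1) - 1]


def decay_curve(tickets, percentiles=(50, 75, 90, 99)):
    """Map each percentile label to a first-response wait in minutes."""
    waits = sorted(
        (t["firstResponseAt"] - t["createdAt"]).total_seconds() / 60
        for t in tickets
        if t.get("firstResponseAt") is not None  # skip tickets with no reply yet
    )
    if not waits:
        raise ValueError("no answered tickets in the reporting window")
    return {f"p{p}": nearest_rank(waits, p) for p in percentiles}


# Made-up sample: ten tickets answered after 1..10 minutes, one unanswered.
start = datetime(2024, 1, 1, 9, 0)
sample = [
    {"createdAt": start, "firstResponseAt": start + timedelta(minutes=m)}
    for m in range(1, 11)
]
sample.append({"createdAt": start, "firstResponseAt": None})

curve = decay_curve(sample)
print(curve)
```

The returned dict can feed whatever charting tool you already use. Note that dropping unanswered tickets flatters the tail, so it is worth checking how many were excluded before trusting the p99.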

Then repeat the same calculation sliced by:

Channel (email vs chat vs portal): chat curves should be compressed into seconds and minutes; email decay tells you about your async queue health
Topic: long tails on "Billing" or "API" usually point to specialist bottlenecks
Hour of creation: tickets created at 6pm local time in a team with no overnight coverage will dominate the p90
Priority: your Urgent decay curve should look fundamentally different from your Normal one
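One way to sketch the slicing is a small group-by that takes any key function, so the same code covers channel, topic, hour, and priority. The field names (channel, created_hour, wait_min) and the sample tickets are hypothetical.

```python
from collections import defaultdict


def nearest_rank(sorted_values, pct):
    """Value at 1-based rank ceil(pct/100 * N) of an ascending list (integer pct)."""
    rank = (pct * len(sorted_values) + 99) // 100
    return sorted_values[max(rank, 1) - 1]


def curves_by(tickets, key, percentiles=(50, 90)):
    """Compute one decay curve per group, where key(ticket) names the group."""
    groups = defaultdict(list)
    for t in tickets:
        groups[key(t)].append(t["wait_min"])
    result = {}
    for group, waits in groups.items():
        ordered = sorted(waits)
        result[group] = {f"p{p}": nearest_rank(ordered, p) for p in percentiles}
    return result


# Made-up tickets: fast chat, slower email, two late-day email stragglers.
tickets = [
    {"channel": "chat", "created_hour": 10, "wait_min": 1},
    {"channel": "chat", "created_hour": 10, "wait_min": 2},
    {"channel": "chat", "created_hour": 10, "wait_min": 3},
    {"channel": "chat", "created_hour": 10, "wait_min": 4},
    {"channel": "email", "created_hour": 10, "wait_min": 30},
    {"channel": "email", "created_hour": 10, "wait_min": 60},
    {"channel": "email", "created_hour": 18, "wait_min": 90},
    {"channel": "email", "created_hour": 18, "wait_min": 600},
]

by_channel = curves_by(tickets, key=lambda t: t["channel"])
by_hour = curves_by(tickets, key=lambda t: t["created_hour"])
print(by_channel)
print(by_hour)
```

Slices with only a handful of tickets produce noisy percentiles, so treat small groups as a prompt to look at the tickets rather than as a trend.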

Diagnosing what the curve is telling you

The shape of the gap between p50 and p90 maps to structural problems, not agent performance.

Steep gap (p90 is 5-10x p50): Coverage problem. Tickets created during a shift gap sit for hours while the daytime queue handles new arrivals fast. Fix: extend hours, add a follow-the-sun agent, or set explicit out-of-hours auto-responses that reset customer expectations.

Flat gap (p90 is 1.5-2x p50): Healthy. Your queue is uniformly fast and your SLA is honest.

Bimodal curve (two clusters): Routing problem. Some tickets get picked up immediately; others languish until someone notices them. Fix: review topic-to-group rules, check for tickets stuck in unassigned states, audit your automation triggers.

Long flat tail at p95-p99: Process gap. A small cohort of tickets is consistently 8+ hours late. These are almost always tickets that fell out of every view — wrong group, wrong status, suppressed sender. Walk them individually.
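The two ratio-based rules above are easy to encode as a first-pass check. In this sketch the thresholds (at most 2x is flat, at least 5x is steep) come from the descriptions above; the "intermediate" label for everything in between is an assumption of this example. Detecting a bimodal curve or a long p95-p99 tail needs the full distribution and is not attempted here.

```python
def classify_gap(p50_min, p90_min):
    """Label the p50-to-p90 gap using the ratio thresholds described above."""
    if p50_min <= 0:
        raise ValueError("p50 must be positive to compute a ratio")
    ratio = p90_min / p50_min
    if ratio <= 2:
        return "flat"  # healthy: queue is uniformly fast
    if ratio >= 5:
        return "steep"  # likely coverage problem
    return "intermediate"  # assumption: not classified in the text, worth watching


print(classify_gap(30, 45))
print(classify_gap(30, 90))
print(classify_gap(30, 360))
```

A label like this is a triage hint, not a diagnosis: it tells you which slice to open first.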

How Helptal fits in

Helptal stamps firstResponseAt on every ticket automatically and exposes the percentile distribution in response-time reports — so the decay curve isn't a SQL project, it's a tab. Pair that with SLA policies that respect business hours and a breach engine that fires events at p75 thresholds rather than only at the final deadline, and you can intervene on long-tail tickets before they hit p90. For teams importing historical data from Zendesk or Help Scout, response timestamps are preserved so the curve is queryable from day one.

Frequently asked questions

What is a good p90 first-response time for B2B SaaS support?

For a team with a 2-hour business-hours SLA, a healthy p90 sits around 1.5-2x your SLA target — roughly 3-4 hours. Above 5 hours, you're producing a meaningful cohort of dissatisfied customers each week even if your headline SLA attainment looks fine. The exact target depends on your customer contract terms and the criticality of your product, but the ratio between p50 and p90 matters more than the absolute number.

How is response-time decay different from average first-response time?

Average first-response time collapses your entire queue into one number, which masks the long tail. Response-time decay plots multiple percentiles (p50, p75, p90, p99), exposing the shape of the distribution. A team can have a great average and a terrible p90 simultaneously — and the p90 is what drives CSAT damage, not the average.

Should I report SLA attainment or percentile response times?

Report both. SLA attainment ("92% met the 2-hour target") tells executives the policy is mostly holding. The p90 first-response time tells them how bad the misses get. Reporting only attainment hides structural problems; reporting only percentiles loses the SLA narrative. The two metrics answer different questions and belong on the same dashboard.

Does response-time decay apply to live chat the same way?

The principle applies, but the scale is different. Chat decay curves should be measured in seconds and minutes, not hours, and the p90 matters even more because chat customers are actively waiting. A chat p90 over 90 seconds is usually a coverage problem — not enough agents in Online presence at that hour.

How often should I recalculate the curve?

Weekly is enough for trending. Monthly is the minimum for SLA reviews. Daily is overkill unless you're actively running a remediation project on a known long tail. The curve is most useful when sliced by topic and time-of-day, which only stabilises over a few hundred tickets — for a 10-agent team, that's usually a week.

This week, pull your last 30 days of tickets, sort by first-response time, and write down your p50, p90, and p99. If the p90 is more than 3x the p50, you have a long-tail problem your average is hiding — and you now know where to start. If you're evaluating tooling that surfaces this without a data export, Helptal's free plan includes per-ticket response timestamps and built-in percentile reporting.

Response-time decay: the queue-aging curve your SLA report is hiding

by Helptal Editorial
