Tools · Software engineering
SLO Definer
Describe a service — get user-centric SLIs, SLOs, and an error-budget policy.
How it works
Corpus-grounded (Google SRE via the software-engineering cluster). Chooses SLIs that reflect real user experience, sets SLOs (target + window), an error-budget policy (what happens when it's spent), and burn-rate alerting guidance — targets tied to user need, not 100%.
You bring
{ service, cluster? }
You get
{ service_summary, slis[]{sli, definition, measurement}, slos[]{objective, target, window}, error_budget_policy, alerting_notes[], riskiest_assumptions[], grounded_in, provenance }
Use it for
- →SWE-guide reader: set reliability targets users actually feel
- →Define the error-budget policy before the next incident
- →Alert on burn rate, not every blip
See it work
example outputService: a payments checkout API that authorizes card transactions for online merchants.
SLOs — payments checkout API
Service: authorizes card transactions for online merchants; user-facing impact is a shopper trying to pay. Grounded in Google SRE (software-engineering cluster). Targets tied to user experience, not 100%.
SLIs (what the user actually feels)
| SLI | Definition | Measurement |
|---|---|---|
| Availability | % of authorize requests that return a valid response (not 5xx / timeout) | Ratio of good : total at the load balancer |
| Latency | % of authorize requests served < 800ms | Server-side request-duration histogram, p-based |
| Correctness | % of authorizations with no double-charge / wrong-amount | Reconciliation against settlement ledger |
SLOs (target + window)
- Availability: 99.95% over a rolling 28-day window.
- Latency: 99% of requests < 800ms over 28 days.
- Correctness: 99.999% over 28 days (money is unforgiving).
Error-budget policy
The 99.95% availability target allows ~21 minutes/month of unavailability.
- Budget remaining: ship features normally.
- Budget < 25%: freeze risky launches; redirect to reliability work.
- Budget exhausted: change freeze except reliability/security fixes until the window recovers; blameless review required.
Alerting notes
- Alert on burn rate, not every blip: page on a fast burn (2% of budget in 1 hour) and a slow burn (10% in 6 hours).
- Multi-window alerts to cut false pages; correctness violations page immediately regardless of budget.
Riskiest assumptions
- That 800ms reflects the shopper's real checkout patience.
- That settlement reconciliation is timely enough to catch correctness misses fast.
Run it now
Define SLOs & SLIs
Turn a service into reliability targets — the SLIs that reflect user experience, SLOs (target + window), an error-budget policy, and alerting guidance.
Prefer code? Call it over the API or hand it to your AI agent via MCP — POST /api/bicycle/slo-definer · define_slos. API & agent access →