System Design Review
Describe a design — get a senior design review with risks + recommendations.
How it works
Corpus-grounded (software-engineering cluster). A staff-level review: strengths, risks (severity + mitigation), scalability bottlenecks, failure modes, the open questions the design hasn't answered, and prioritized recommendations.
You bring
{ design, cluster? }
You get
{ design_summary, strengths[], risks[]{risk, severity, mitigation}, scalability_notes[], failure_modes[], open_questions[], recommendations[], riskiest_assumptions[], grounded_in, provenance }
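As a rough illustration of the shape above, the response can be modeled with typed dictionaries. The field names come from the schema on this page; the Python types, the `"low"` severity value, and the helper `risks_by_severity` are assumptions made for this sketch, not part of the documented API.

```python
from typing import List, TypedDict


class Risk(TypedDict):
    risk: str
    severity: str  # "high" and "medium" appear in the example; other values are assumed
    mitigation: str


class DesignReview(TypedDict, total=False):
    design_summary: str
    strengths: List[str]
    risks: List[Risk]
    scalability_notes: List[str]
    failure_modes: List[str]
    open_questions: List[str]
    recommendations: List[str]
    riskiest_assumptions: List[str]
    grounded_in: str   # type assumed
    provenance: dict   # type assumed


def risks_by_severity(review, order=("high", "medium", "low")):
    """Return the review's risks sorted most severe first (hypothetical helper)."""
    rank = {name: i for i, name in enumerate(order)}
    # Unknown severity labels sort after all known ones.
    return sorted(review.get("risks", []),
                  key=lambda r: rank.get(r["severity"], len(order)))
```

Sorting by severity first is one way to turn the `risks[]` array into a triage list for a design-doc discussion.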
Use it for
- SWE-guide reader: a design-doc review before the build starts
- Find the failure modes + scalability bottlenecks early
- Get the open questions the design glosses over
See it work
Example output
Design: "a URL-shortener service expecting ~50M redirects/day with custom vanity aliases and click analytics."
System Design Review — URL Shortener (50M redirects/day)
Design under review: A URL-shortening service: a write path to mint short codes (random + custom vanity aliases), a high-volume redirect read path (~50M/day), and an async pipeline aggregating click analytics. Proposed stack: stateless API behind a load balancer, a relational primary store, and a cache in front of redirects.
Strengths
- Read/write split is correct: redirects vastly outnumber creates, and the design treats them as different paths.
- Caching the code→URL mapping is the single highest-leverage choice for redirect latency.
- Analytics is async — click logging is decoupled from the redirect hot path.
Risks
| Risk | Severity | Mitigation |
|---|---|---|
| Vanity-alias collisions / contention on a uniqueness check | high | Enforce uniqueness at the DB with a unique index; reserve a separate namespace for vanity vs. generated codes |
| Cache stampede when a viral link's cache entry expires | medium | Add jittered TTLs + request coalescing (single-flight) on cache miss |
| Hot-key skew (one link dominates traffic) | medium | Promote hot keys to an edge/CDN layer; track per-key request rates to detect them |
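The cache-stampede mitigation in the table (jittered TTLs plus single-flight on cache miss) might look roughly like the following in-process sketch. The class name `SingleFlightCache`, the `loader` callback (standing in for a lookup against the primary store), and the default TTL and jitter values are all assumptions for illustration; a real deployment would more likely apply the same idea in front of a shared cache.

```python
import random
import threading
import time


class SingleFlightCache:
    """In-process cache with jittered TTLs and per-key request coalescing."""

    def __init__(self, loader, base_ttl=300.0, jitter=0.2):
        self._loader = loader      # hypothetical: fetches the long URL for a code
        self._base_ttl = base_ttl  # seconds (assumed default)
        self._jitter = jitter      # +/- fraction applied to each TTL
        self._data = {}            # code -> (value, expires_at)
        self._inflight = {}        # code -> Event for a load in progress
        self._lock = threading.Lock()

    def _ttl(self):
        # Spread expiries so entries cached together do not expire together.
        return self._base_ttl * (1 + random.uniform(-self._jitter, self._jitter))

    def get(self, code):
        while True:
            with self._lock:
                entry = self._data.get(code)
                if entry is not None and entry[1] > time.monotonic():
                    return entry[0]
                event = self._inflight.get(code)
                leader = event is None
                if leader:
                    event = threading.Event()
                    self._inflight[code] = event
            if not leader:
                # Another caller is already loading this key: wait, then re-check.
                event.wait()
                continue
            try:
                value = self._loader(code)
                with self._lock:
                    self._data[code] = (value, time.monotonic() + self._ttl())
                return value
            finally:
                with self._lock:
                    del self._inflight[code]
                event.set()
```

Only the first caller for a missing key invokes the loader; concurrent callers for the same key block until it finishes and then read the cached value. If the loader raises, waiting callers retry and one of them becomes the new leader.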
Scalability notes
- The redirect path should terminate as close to the edge as possible; the origin should rarely be hit for popular links.
- Code generation must avoid a global counter bottleneck — prefer random base-62 with collision-retry, or a sharded/range-allocated scheme.
- Analytics writes will dominate row growth; partition or roll up early.
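The "random base-62 with collision-retry" option from the notes above can be sketched as follows. The `try_reserve` callback is a hypothetical stand-in for an atomic insert guarded by a unique index (it returns `False` when the code is already taken); the code length of 7 and the retry limit of 5 are assumed values, not recommendations from the review.

```python
import secrets
import string

ALPHABET = string.digits + string.ascii_letters  # 62 symbols: 0-9, a-z, A-Z


def random_code(length=7):
    """Return a random base-62 string; 7 characters give 62**7 (about 3.5 trillion) codes."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def mint_code(try_reserve, length=7, max_attempts=5):
    """Generate codes until one is reserved, with no shared counter involved.

    try_reserve(code) -> bool is assumed to be atomic, e.g. an INSERT that
    fails on a unique-index violation.
    """
    for _ in range(max_attempts):
        code = random_code(length)
        if try_reserve(code):
            return code
    raise RuntimeError("could not reserve a unique code; consider a longer length")
```

Because each attempt is independent, any API instance can mint codes without coordinating with the others; the only shared state is the uniqueness check in the store.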
Failure modes
- Cache cluster loss → full read load lands on the primary DB (likely overload). Blast radius: all redirects.
- Analytics pipeline backpressure must never block or fail a redirect.
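One way to keep analytics backpressure off the redirect path is a bounded, non-blocking enqueue that drops events when the buffer is full. The sketch below uses an in-memory `queue.Queue` purely for illustration, so it is not durable; `ClickRecorder`, `handle_redirect`, and the `lookup` callback are hypothetical names, and the status codes are an assumed convention.

```python
import queue


class ClickRecorder:
    """Bounded buffer for click events; never blocks the caller."""

    def __init__(self, maxsize=10_000):
        self._events = queue.Queue(maxsize=maxsize)
        self.dropped = 0  # approximate under concurrency; fine for a metric

    def record(self, code):
        try:
            self._events.put_nowait(code)
            return True
        except queue.Full:
            # Backpressure: shed the analytics event rather than stall the redirect.
            self.dropped += 1
            return False


def handle_redirect(code, lookup, recorder):
    """Resolve a short code; the analytics outcome never affects the response."""
    target = lookup(code)
    if target is None:
        return 404, None
    recorder.record(code)
    return 302, target
```

The redirect result depends only on the lookup. Counting dropped events makes the trade-off visible, so the loss can be monitored instead of silently ignored.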
Open questions
- What is the redirect latency SLO, and is eventual consistency on click counts acceptable?
- What is the retention/TTL policy for links and for raw analytics events?
- Is custom-domain (CNAME) support in scope? It changes the routing layer materially.
Recommendations (prioritized)
- Specify the redirect SLO and design the cache hit-rate target backward from it.
- Make the redirect path independent of analytics availability (fire-and-forget with a durable queue).
- Decide the code-generation scheme now — it's expensive to change later.
Riskiest assumptions
- That 50M/day is uniformly distributed (it won't be — plan for hot keys and spikes).
- That a single relational primary can absorb cache-loss read fallback.
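Both assumptions can be sanity-checked with back-of-the-envelope arithmetic. The sketch below converts the 50M/day figure into requests per second and shows how origin load depends on cache hit rate; the 10x peak-to-average factor and the 99% hit rate are illustrative assumptions, not figures from the design.

```python
SECONDS_PER_DAY = 86_400


def average_rps(requests_per_day):
    """Average requests per second if traffic were perfectly uniform."""
    return requests_per_day / SECONDS_PER_DAY


def origin_rps(edge_rps, cache_hit_rate):
    """Requests per second that miss the cache and reach the primary store."""
    return edge_rps * (1.0 - cache_hit_rate)


avg = average_rps(50_000_000)     # roughly 579 requests/second on average
peak = avg * 10                   # assumed 10x peak-to-average ratio
normal = origin_rps(peak, 0.99)   # assumed 99% hit rate at peak
outage = origin_rps(peak, 0.0)    # cache cluster lost: every request hits the DB

print(f"average: {avg:.0f} rps, assumed peak: {peak:.0f} rps")
print(f"origin load at peak: {normal:.0f} rps with cache, {outage:.0f} rps without")
```

Under these assumed numbers, losing the cache multiplies load on the primary by a factor of 100, which is why the cache-loss fallback deserves its own capacity test.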
Grounded in the software-engineering cluster (scalability, failure-mode, and SRE review practice).
Run it now
Prefer code? Call it over the API (POST /api/bicycle/system-design-review) or hand it to your AI agent via MCP (review_system_design).
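A minimal sketch of preparing the API call with Python's standard library. Only the path and the `design` / `cluster` field names come from this page; the base URL is a placeholder, and the JSON content type, the absence of authentication, and the helper name `build_review_request` are assumptions.

```python
import json
import urllib.request

REVIEW_PATH = "/api/bicycle/system-design-review"


def build_review_request(base_url, design, cluster=None):
    """Build (but do not send) a POST request for a design review."""
    payload = {"design": design}
    if cluster is not None:
        payload["cluster"] = cluster  # optional, per the { design, cluster? } input shape
    return urllib.request.Request(
        base_url.rstrip("/") + REVIEW_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


# Placeholder host; substitute the real service origin and any required credentials.
request = build_review_request(
    "https://example.invalid",
    "a URL-shortener service expecting ~50M redirects/day",
)
```

Passing the prepared request to `urllib.request.urlopen` would send it; the response is expected to follow the "You get" shape listed earlier on this page.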