peopleanalyst

System Design Review

Describe a design and get a senior-level design review with risks and recommendations.

How it works

Corpus-grounded (software-engineering cluster). A staff-level review: strengths, risks (severity + mitigation), scalability bottlenecks, failure modes, the open questions the design hasn't answered, and prioritized recommendations.

You bring

{ design, cluster? }

You get

{ design_summary, strengths[], risks[]{risk, severity, mitigation}, scalability_notes[], failure_modes[], open_questions[], recommendations[], riskiest_assumptions[], grounded_in, provenance }
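The input and output shapes above can be written down as types. This is a minimal sketch: the field names come from the schema listed here, but every field type (strings, lists of strings) is an assumption, and `missing_keys` is a hypothetical helper for sanity-checking a response, not part of the tool.

```python
from typing import Any, Dict, List, TypedDict


class Risk(TypedDict):
    risk: str
    severity: str      # assumed free text such as "high" or "medium"
    mitigation: str


class ReviewRequest(TypedDict, total=False):
    design: str        # the free-text design description
    cluster: str       # optional, per the "cluster?" marker in the schema


# Top-level keys listed under "You get". Value types are not documented,
# so this sketch only checks for presence.
EXPECTED_RESPONSE_KEYS = (
    "design_summary", "strengths", "risks", "scalability_notes",
    "failure_modes", "open_questions", "recommendations",
    "riskiest_assumptions", "grounded_in", "provenance",
)


def missing_keys(response: Dict[str, Any]) -> List[str]:
    """Return the expected top-level keys that are absent from a response."""
    return [key for key in EXPECTED_RESPONSE_KEYS if key not in response]
```

A caller could run `missing_keys` on a parsed response before relying on any particular field.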

See it work

example output

Design: "a URL-shortener service expecting ~50M redirects/day with custom vanity aliases and click analytics."

System Design Review — URL Shortener (50M redirects/day)

Design under review: A URL-shortening service: a write path to mint short codes (random + custom vanity aliases), a high-volume redirect read path (~50M/day), and an async pipeline aggregating click analytics. Proposed stack: stateless API behind a load balancer, a relational primary store, and a cache in front of redirects.

Strengths

  • Read/write split is correct: redirects vastly outnumber creates, and the design treats them as different paths.
  • Caching the code→URL mapping is the single highest-leverage choice for redirect latency.
  • Analytics is async — click logging is decoupled from the redirect hot path.

Risks

| Risk | Severity | Mitigation |
| --- | --- | --- |
| Vanity-alias collisions / contention on a uniqueness check | high | Enforce uniqueness at the DB with a unique index; reserve a separate namespace for vanity vs. generated codes |
| Cache stampede on a viral link expiry | medium | Add jittered TTLs + request coalescing (single-flight) on cache miss |
| Hot-key skew (one link dominates traffic) | medium | Promote hot keys to an edge/CDN layer; per-key rate awareness |
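The cache-stampede mitigation listed above (jittered TTLs plus single-flight on a miss) might look roughly like the following. This is an in-process sketch under stated assumptions: `SingleFlightCache`, its parameters, and the `loader` callback are hypothetical names, and a real deployment would more likely coalesce requests in front of a shared cache rather than inside one process.

```python
import random
import threading
import time
from typing import Callable, Dict, Tuple


class SingleFlightCache:
    """In-memory cache: jittered TTLs, one loader call per missing key."""

    def __init__(self, loader: Callable[[str], str],
                 base_ttl: float = 300.0, jitter: float = 0.2) -> None:
        self._loader = loader            # hypothetical origin lookup (e.g. the DB)
        self._base_ttl = base_ttl
        self._jitter = jitter
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, expiry)
        self._inflight: Dict[str, threading.Event] = {}
        self._lock = threading.Lock()

    def _ttl(self) -> float:
        # Spread expiries over +/- jitter so entries do not all expire together.
        return self._base_ttl * (1 + random.uniform(-self._jitter, self._jitter))

    def get(self, key: str) -> str:
        while True:
            with self._lock:
                entry = self._entries.get(key)
                if entry is not None and entry[1] > time.monotonic():
                    return entry[0]
                waiter = self._inflight.get(key)
                leader = waiter is None
                if leader:
                    # First caller to miss becomes the leader for this key.
                    waiter = threading.Event()
                    self._inflight[key] = waiter
            if not leader:
                # Someone else is loading; wait, then re-check the cache.
                waiter.wait()
                continue
            try:
                value = self._loader(key)
                with self._lock:
                    self._entries[key] = (value, time.monotonic() + self._ttl())
                return value
            finally:
                # Wake waiters even if the load failed; one of them retries.
                with self._lock:
                    del self._inflight[key]
                waiter.set()
```

With this shape, concurrent misses on the same viral link produce a single call to the origin while the other callers wait for the result.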

Scalability notes

  • The redirect path should terminate as close to the edge as possible; the origin should rarely be hit for popular links.
  • Code generation must avoid a global counter bottleneck — prefer random base-62 with collision-retry, or a sharded/range-allocated scheme.
  • Analytics writes will dominate row growth; partition or roll up early.
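The "random base-62 with collision-retry" option can be sketched as below, with uniqueness enforced by the database rather than by a check-then-insert in application code. The table layout, the 7-character length, the retry limit, and the function names are all assumptions for illustration; SQLite stands in for whatever relational store the design uses.

```python
import secrets
import sqlite3
import string
from typing import Callable

ALPHABET = string.digits + string.ascii_letters  # 62 characters


def random_code(length: int = 7) -> str:
    """Draw a random base-62 code; 7 characters gives 62**7 possible codes."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def open_store() -> sqlite3.Connection:
    """Create an in-memory table whose primary key enforces code uniqueness."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE links (code TEXT PRIMARY KEY, url TEXT NOT NULL)")
    return conn


def mint(conn: sqlite3.Connection, url: str, length: int = 7,
         max_attempts: int = 5,
         gen: Callable[[int], str] = random_code) -> str:
    """Insert a new code, retrying with a fresh draw on a uniqueness violation."""
    for _ in range(max_attempts):
        code = gen(length)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO links (code, url) VALUES (?, ?)", (code, url)
                )
            return code
        except sqlite3.IntegrityError:
            continue  # collision: the code already exists, draw again
    raise RuntimeError("could not mint a unique code")
```

Because the insert itself is the uniqueness check, two writers racing on the same code cannot both succeed, and no global counter is involved.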

Failure modes

  • Cache cluster loss → full read load lands on the primary DB (likely overload). Blast radius: all redirects.
  • Analytics pipeline backpressure must never block or fail a redirect.
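One way to keep analytics backpressure off the redirect path is a bounded, non-blocking enqueue that drops events when the queue is full. The sketch below is illustrative only: `ClickLogger` and `redirect` are hypothetical names, the in-memory `queue.Queue` is a stand-in for a durable queue, and the 302 status is an assumption about how redirects are served.

```python
import queue
from typing import Dict, Tuple


class ClickLogger:
    """Bounded click buffer that never blocks the caller."""

    def __init__(self, maxsize: int = 10_000) -> None:
        self._queue: "queue.Queue[Tuple[str, float]]" = queue.Queue(maxsize=maxsize)
        # Plain counter for the sketch; a real service would use a metric.
        self.dropped = 0

    def record(self, code: str, ts: float) -> None:
        try:
            self._queue.put_nowait((code, ts))  # returns immediately
        except queue.Full:
            self.dropped += 1  # shed the event rather than delay the redirect


def redirect(code: str, links: Dict[str, str],
             clicks: ClickLogger, now: float) -> Tuple[int, str]:
    """Resolve a short code; click logging cannot fail or block the response."""
    url = links.get(code)
    if url is None:
        return 404, ""
    clicks.record(code, now)
    return 302, url
```

Dropping under pressure trades analytics completeness for redirect availability; whether that trade is acceptable depends on the answer to the consistency question in the open questions.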

Open questions

  • What is the redirect latency SLO, and is eventual consistency on click counts acceptable?
  • What is the retention/TTL policy for links and for raw analytics events?
  • Is custom-domain (CNAME) support in scope? It changes the routing layer materially.

Recommendations (prioritized)

  1. Specify the redirect SLO and design the cache hit-rate target backward from it.
  2. Make the redirect path independent of analytics availability (enqueue click events fire-and-forget onto a durable queue).
  3. Decide the code-generation scheme now — it's expensive to change later.
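Working the cache hit-rate target backward from the SLO, as the first recommendation suggests, can be approximated with a two-tier mean-latency model: mean = h * cache + (1 - h) * origin. This is a deliberate simplification (real SLOs are usually stated as percentiles), and the latency figures in the usage note are made-up inputs, not measurements.

```python
def required_hit_rate(target_ms: float, cache_ms: float, origin_ms: float) -> float:
    """Smallest hit rate h such that h*cache + (1-h)*origin <= target (mean model)."""
    if cache_ms > target_ms:
        raise ValueError("target is unreachable: even a cache hit is too slow")
    if origin_ms <= target_ms:
        return 0.0  # the origin alone already meets the target
    return (origin_ms - target_ms) / (origin_ms - cache_ms)
```

For example, with an assumed 1 ms cache hit, 41 ms origin lookup, and a 5 ms mean target, `required_hit_rate(5, 1, 41)` gives (41 - 5) / (41 - 1) = 0.9.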

Riskiest assumptions

  • That 50M/day is uniformly distributed (it won't be — plan for hot keys and spikes).
  • That a single relational primary can absorb cache-loss read fallback.
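Both assumptions can be put into rough numbers. 50M redirects/day averages to about 579 requests per second, but the load that reaches the primary depends on the peak multiplier and the cache hit rate. The sketch below is back-of-envelope arithmetic; the peak factor and hit rates used in the usage note are assumptions, not figures from the design.

```python
SECONDS_PER_DAY = 86_400


def average_rps(requests_per_day: float) -> float:
    """Mean request rate if traffic were perfectly uniform over the day."""
    return requests_per_day / SECONDS_PER_DAY


def origin_rps(requests_per_day: float, peak_factor: float, hit_rate: float) -> float:
    """Requests per second reaching the primary at peak for a given hit rate."""
    return average_rps(requests_per_day) * peak_factor * (1.0 - hit_rate)
```

With an assumed 5x peak, `origin_rps(50_000_000, 5, 0.95)` is about 145 requests per second at the primary, while total cache loss (`hit_rate=0.0`) pushes that to roughly 2,900, a 20x jump that the primary would have to absorb.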

Grounded in the software-engineering cluster (scalability, failure-mode, and SRE review practice).

Run it now

Review a system design

Get a senior design review — strengths, risks (with severity + mitigation), scalability bottlenecks, failure modes, open questions, and prioritized recommendations.

Prefer code? Call it over the API (POST /api/bicycle/system-design-review) or hand it to your AI agent via MCP (review_system_design).
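A call to the endpoint might be assembled as follows. Only the path and the `design` / `cluster` field names come from this page; the base URL is a placeholder, and the JSON body encoding, the `Content-Type` header, and the absence of authentication are all assumptions. The sketch builds the request without sending it.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://example.invalid"  # placeholder host, not the real service
ENDPOINT = "/api/bicycle/system-design-review"


def build_review_request(design: str, cluster: Optional[str] = None,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the design description."""
    payload = {"design": design}
    if cluster is not None:
        payload["cluster"] = cluster  # optional field, per the input schema
    return urllib.request.Request(
        base_url + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),  # assumed JSON body
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; any API key or auth header the service requires is not documented here and would need to be added.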
