John Patrick Amata

System Design 01

The basics

Basics of System Design

The “what”

It's about taking a set of requirements and breaking it down into solvable sub-problems, then deciding the key components and their responsibilities, the boundaries of each component, and then the modules. Then think about how they interact. Consider all possible cases to give the user a good experience. Make the architecture fault tolerant and available, and think about the challenges in scaling it.

Approach

Go top down when you get a big problem statement:

  1. Grok the problem statement: what you're supposed to do, and the constraints
  2. Break the problem statement into (essential) components: determine the components or features big enough to be a microservice
    • note: a component does not mean a microservice. It can be a module inside a monolith, a background worker, a cache layer, a DB table group, or a separate service. Think in terms of breaking the problem into responsibilities
  3. Dissect a component and go deeper: define clearer boundaries, sub-components, databases, servers, etc.
  4. For each sub-component dive deeper into the following dimensions:
    1. database and caching
    2. scaling and fault tolerance, e.g. what if the DB crashes?
    3. async processing/delegation, e.g. should we generate a feed asynchronously?
    4. communication, e.g. how do these components/databases/services talk to each other? Are they using plain TCP or gRPC?
  5. Split into more components if needed: see if you can split out generic services, as doing so helps you notice any missing components
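To illustrate the note in step 2 (a component is a responsibility, not necessarily a microservice), here is a minimal Python sketch (3.9+). All names (`FeedBuilder`, `InProcessFeedBuilder`, `CachedFeedBuilder`, `render_home`) are hypothetical and only for illustration. The same responsibility is implemented once as a plain module and once with a cache layer in front, and the caller cannot tell the difference.

```python
from typing import Protocol


class FeedBuilder(Protocol):
    """One responsibility: produce the post IDs a user should see."""

    def build_feed(self, user_id: str) -> list[str]: ...


class InProcessFeedBuilder:
    """Option A: the component is just a module inside a monolith."""

    def __init__(self, posts_by_author: dict[str, list[str]],
                 follows: dict[str, list[str]]):
        self._posts_by_author = posts_by_author
        self._follows = follows

    def build_feed(self, user_id: str) -> list[str]:
        feed: list[str] = []
        for author in self._follows.get(user_id, []):
            feed.extend(self._posts_by_author.get(author, []))
        return feed


class CachedFeedBuilder:
    """Option B: same responsibility, with a cache layer in front."""

    def __init__(self, inner: FeedBuilder):
        self._inner = inner
        self._cache: dict[str, list[str]] = {}

    def build_feed(self, user_id: str) -> list[str]:
        # Only call the wrapped builder on a cache miss.
        if user_id not in self._cache:
            self._cache[user_id] = self._inner.build_feed(user_id)
        return self._cache[user_id]


def render_home(builder: FeedBuilder, user_id: str) -> str:
    # The caller depends on the responsibility, not on how it is deployed.
    return ", ".join(builder.build_feed(user_id))
```

Swapping one builder for the other (or, later, for a client that calls a separate service) does not change `render_home`, which is the point of drawing the boundary around a responsibility first.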

Specificity

I've mentioned the dimensions to dive deeper into for each sub-component, but be more specific. Think in terms of:

  • data model
  • API contracts
  • read/write patterns
  • consistency requirements
  • security/auth
  • observability
  • capacity estimates
  • cost and tradeoffs
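As a rough sketch of what "data model" and "API contract" can look like when written down, here is a small Python example for reading message history. The field names, the cursor-style pagination, and the default page size are all assumptions made for illustration, not a prescribed design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    # Data model: the fields every stored message carries (hypothetical).
    message_id: int
    conversation_id: str
    sender_id: str
    body: str


@dataclass(frozen=True)
class HistoryRequest:
    # API contract (input): page through a conversation, newest first.
    conversation_id: str
    before_id: Optional[int] = None  # cursor; None means start from the newest
    limit: int = 50


@dataclass(frozen=True)
class HistoryResponse:
    # API contract (output): one page plus the cursor for the next page.
    messages: list
    next_before_id: Optional[int]


def read_history(store: list, req: HistoryRequest) -> HistoryResponse:
    """In-memory stand-in for the read path; a real store would use an index."""
    limit = max(1, req.limit)
    rows = [
        m for m in store
        if m.conversation_id == req.conversation_id
        and (req.before_id is None or m.message_id < req.before_id)
    ]
    rows.sort(key=lambda m: m.message_id, reverse=True)
    page = rows[:limit]
    has_more = len(rows) > len(page)
    return HistoryResponse(page, page[-1].message_id if has_more else None)
```

Writing the contract out this way surfaces the read pattern (newest first, paginated) and makes it easier to ask the follow-up questions about consistency and capacity.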

Requirements

  1. Functional requirements
    • What the system does
  2. Non-functional
    • Latency
    • Throughput
    • Availability
    • Durability
    • Consistency
    • Security
    • Cost
  3. Scale
    • Users
    • RPS
    • Read/Write ratio
    • Storage growth
    • Geo notes

These matter because simply saying “build a social network” can mean a lot of things. It's much better if the statement also comes with specifics: the system will have 1M DAU (daily active users), users should have features like reading message history or viewing a profile, and it should tolerate crashes without losing messages.
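A number like 1M DAU can be turned into rough scale figures with back-of-envelope arithmetic. The Python sketch below does that. Only the 1M DAU figure comes from the example above. The requests per user, peak factor, read/write ratio, and bytes per write are placeholder assumptions you would replace with your own numbers.

```python
SECONDS_PER_DAY = 86_400


def estimate(dau, requests_per_user_per_day, peak_factor,
             reads_per_write, bytes_per_write):
    """Back-of-envelope scale numbers from a handful of assumptions."""
    total_requests = dau * requests_per_user_per_day
    avg_rps = total_requests / SECONDS_PER_DAY
    # With R reads per write, writes are 1 / (R + 1) of all requests.
    writes_per_day = total_requests / (reads_per_write + 1)
    return {
        "avg_rps": avg_rps,
        "peak_rps": avg_rps * peak_factor,
        "writes_per_day": writes_per_day,
        "storage_gb_per_year": writes_per_day * bytes_per_write * 365 / 1e9,
    }


numbers = estimate(
    dau=1_000_000,                 # from the requirement
    requests_per_user_per_day=20,  # assumption
    peak_factor=3,                 # assumption: peak traffic is 3x the average
    reads_per_write=9,             # assumption: 9 reads for every write
    bytes_per_write=1_000,         # assumption: about 1 KB per stored message
)
```

Under these assumptions the system averages roughly 231 requests per second, peaks near 694, and stores about 730 GB of new data per year. The exact values matter less than knowing which order of magnitude you are designing for.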

Evaluation

How do you know that a system is good enough and it's time to stop designing?

  1. You've broken down the system into components
  2. Each component has clear, exclusive responsibilities: the responsibility of one component shouldn't interfere with that of another
  3. For each component, you have figured out a good amount of technical detail for the following:
    1. database and caching
    2. scaling and fault tolerance
    3. async processing/delegation
    4. communication
  4. No component is a single point of failure (SPOF), and there is a plan for recovery in case of failure
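The first three checks can be treated as a mechanical review of your own notes. The following is a minimal sketch, assuming you record each component's responsibilities and the decisions made so far. The `Component` class, the `DIMENSIONS` labels, and `open_questions` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from itertools import combinations

# Short labels for the four dimensions listed above.
DIMENSIONS = ("storage", "scaling", "async", "communication")


@dataclass
class Component:
    name: str
    responsibilities: set
    decisions: dict = field(default_factory=dict)  # dimension -> decision


def open_questions(components):
    """Return what still blocks calling the design good enough."""
    issues = []
    # Check 2: no two components should own the same responsibility.
    for a, b in combinations(components, 2):
        for shared in sorted(a.responsibilities & b.responsibilities):
            issues.append(f"{a.name} and {b.name} both own '{shared}'")
    # Check 3: every component needs a decision for every dimension.
    for c in components:
        for dim in DIMENSIONS:
            if not c.decisions.get(dim):
                issues.append(f"{c.name}: no decision for {dim}")
    return issues
```

An empty list does not prove the design is right. It only says the obvious gaps are closed, which is a reasonable signal to stop designing and start building.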

SPOF

Critical user journeys shouldn't depend on an unrecoverable SPOF. And if they have to (because there are always SPOFs somewhere in real-life systems), it needs to be explicitly accepted and made known. For a startup prototype, one Postgres instance may be fine, but for a bank you'll need backups, failover, audit logs, alerting, etc.
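One way to make the SPOF rule checkable is to list what each critical user journey depends on and flag anything that runs as a single instance with neither a recovery plan nor an explicit sign-off. This Python sketch assumes a simple replica count is a good enough proxy for redundancy. `Node` and `unaccounted_spofs` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    replicas: int
    has_recovery_plan: bool = False  # e.g. backups plus a tested restore
    accepted_risk: bool = False      # explicitly accepted and made known


def unaccounted_spofs(journeys):
    """Map each journey to the single-instance nodes nobody has accounted for."""
    findings = {}
    for journey, nodes in journeys.items():
        bad = [
            n.name for n in nodes
            if n.replicas < 2 and not (n.has_recovery_plan or n.accepted_risk)
        ]
        if bad:
            findings[journey] = bad
    return findings


# A prototype with one Postgres instance: flagged until someone accepts the risk.
api = Node("api", replicas=3)
prototype_db = Node("postgres", replicas=1)
accepted_db = Node("postgres", replicas=1, accepted_risk=True)
```

Here `unaccounted_spofs({"send_message": [api, prototype_db]})` reports `postgres`, while the same journey with `accepted_db` reports nothing, which mirrors the difference between an unnoticed SPOF and one the team has knowingly taken on.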

Business > Product > Architecture > Ops

Force every design through the steps above. Basically, think first in terms of what we are solving, what we promise to the user, where technical issues/bottlenecks happen, and what the business can tolerate.

Take the “build a social network” example above:

Business

You must think of the business goal. Is the main goal retention and daily engagement? Or speed to launch? What are the risks we have, marketing, etc.?

Product

Architecture

Components, data models, APIs, and infra to make it work. Think of the auth service, profile service, DBs, CDNs, queues, and how to make notifications work.

Ops

How to recover, deploy, etc.