databases

Postgres Architecture

  1. Postmaster process (listener): listens on a port for client connections
  2. Backend processes
  3. Shared memory
  4. Background workers
  5. Auxiliary processes
  6. Autovacuum workers

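Postgres is an RDBMS based on multi-version concurrency control (MVCC), implemented with an append model: an update writes a new version of the row (a new tuple) instead of overwriting the old one. Everything runs as a separate process rather than as threads.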
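To make the append model concrete, here is a toy sketch in Python, not Postgres code. It borrows the names xmin/xmax from Postgres tuple headers, but everything else (the ToyHeap class, snapshot-as-a-single-xid visibility, every transaction committing instantly) is a simplifying assumption. An update marks the old version as superseded and appends a new one, so older snapshots still see the old value, and a vacuum step later drops versions no snapshot can see.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TupleVersion:
    key: str
    value: str
    xmin: int                   # transaction that created this version
    xmax: Optional[int] = None  # transaction that superseded it (None = live)


class ToyHeap:
    """Hypothetical append-only store; assumes every transaction commits at once."""

    def __init__(self) -> None:
        self.tuples: list[TupleVersion] = []
        self.next_xid = 1

    def begin(self) -> int:
        xid = self.next_xid
        self.next_xid += 1
        return xid

    def insert(self, xid: int, key: str, value: str) -> None:
        self.tuples.append(TupleVersion(key, value, xmin=xid))

    def update(self, xid: int, key: str, value: str) -> None:
        # Never overwrite: mark the live version as superseded, append a new one.
        for t in self.tuples:
            if t.key == key and t.xmax is None:
                t.xmax = xid
        self.tuples.append(TupleVersion(key, value, xmin=xid))

    def read(self, snapshot: int, key: str) -> Optional[str]:
        # Simplified visibility: created at or before the snapshot,
        # and not superseded as of the snapshot.
        for t in self.tuples:
            if t.key == key and t.xmin <= snapshot and (t.xmax is None or t.xmax > snapshot):
                return t.value
        return None

    def vacuum(self, oldest_snapshot: int) -> int:
        # A version superseded at or before the oldest snapshot still in use
        # is invisible to everyone: it is a dead tuple and can be removed.
        before = len(self.tuples)
        self.tuples = [t for t in self.tuples if t.xmax is None or t.xmax > oldest_snapshot]
        return before - len(self.tuples)
```

For example, a reader that started before an update keeps seeing the old value, and vacuum only reclaims the old tuple once no snapshot that old remains:

```python
heap = ToyHeap()
t1 = heap.begin()
heap.insert(t1, "a", "v1")
t2 = heap.begin()             # reader started before the update
t3 = heap.begin()
heap.update(t3, "a", "v2")
heap.read(t2, "a")            # "v1"
heap.read(t3, "a")            # "v2"
heap.vacuum(oldest_snapshot=2)  # 0: t2 can still see the old version
heap.vacuum(oldest_snapshot=3)  # 1: old version is now dead
```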

Postmaster

The process that listens for client connections. It is the parent of every other Postgres process.

Every client connection gets its own backend process, forked by the postmaster. The max_connections setting caps how many connections can exist at once.

Shared memory

Allocated with the mmap system call (on Linux). It holds the shared buffers (table and index pages read from disk), the WAL buffers, and other shared state such as lock tables, so that every process can access them.
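The sketch below shows why a process-per-connection design can still share memory: a region mapped before fork() stays shared between parent and child. It is a minimal Unix-only Python illustration, not how Postgres itself is written; the function name demo_shared_memory is hypothetical, and it relies on Python's anonymous mmap defaulting to MAP_SHARED on Unix.

```python
import mmap
import os


def demo_shared_memory() -> bytes:
    # Anonymous mapping created before fork(); on Unix Python's default
    # flags include MAP_SHARED, so the child's writes reach the parent.
    shm = mmap.mmap(-1, 4096)
    pid = os.fork()
    if pid == 0:
        # Child (playing the role of a backend): write into shared memory.
        shm[0:5] = b"hello"
        os._exit(0)  # exit immediately, skipping the parent's cleanup code
    # Parent (playing the role of the postmaster): wait, then read.
    os.waitpid(pid, 0)
    data = shm[0:5]
    shm.close()
    return data
```

Calling demo_shared_memory() returns b"hello", even though the write happened in a different process.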

Background workers

When the planner chooses a parallel plan, a backend process can offload part of the query execution to background workers (parallel workers) and then combine their results.
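A rough sketch of the shape of a parallel scan, under loud assumptions: threads stand in for the separate worker processes Postgres actually uses, the table is a Python list, and it is split into fixed chunks up front (Postgres hands out blocks to workers dynamically). The final loop plays the role of the Gather step, where the leader collects the workers' partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_chunk(rows, predicate):
    # Work done by one worker: filter its share of the table.
    return [r for r in rows if predicate(r)]


def parallel_seq_scan(table, predicate, workers=2):
    # Split the table into one chunk per worker (fixed split, for simplicity).
    chunks = [table[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(scan_chunk, chunks, [predicate] * workers)
        # "Gather": the leader combines the partial results.
        result = []
        for part in parts:
            result.extend(part)
    return result
```

Row order depends on how the table was split, just as a parallel plan without an ORDER BY gives no ordering guarantee, so sort the output before comparing it: sorted(parallel_seq_scan(list(range(10)), lambda x: x % 3 == 0)) gives [0, 3, 6, 9].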

Auxiliary Processes

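  • Background writer process
    • Writes dirty data pages from the shared buffers to the OS, so that backends rarely have to do it themselves. (Writing WAL is the WAL writer's job, not this one's.)
    • The OS does not write them to disk immediately; they go into its file system cache (page cache) first.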
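  • Checkpointer process
    • Writes all dirty pages to disk, forces them out of the OS cache (fsync), and then writes a checkpoint record to the WAL. The checkpoint marks a position in the WAL (an LSN) before which all changes are guaranteed to be in the data files on disk, i.e. the data is consistent as of that point.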
  • Logger process
  • Autovacuum launcher/workers
    • Reclaim space held by dead tuples: outdated versions of rows that no transaction can see anymore. Because every update writes a new tuple, old versions pile up until they are vacuumed.
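  • WAL archiver
    • Copies completed WAL segment files to an archive location (optionally compressed by the archive command). Used for point-in-time recovery and for replicas that restore from the archive.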
  • WAL receiver
    • Runs on replicas and receives the stream of WAL records from the primary's WAL sender.
  • WAL writer
    • Writes WAL records from the WAL buffers in shared memory to the WAL files and flushes them to disk.
  • WAL senders
    • Run on the primary, one per connected replica, and stream WAL records to it.
  • Startup process
    • Started by the postmaster when the server starts, before any connections are accepted.
    • After the last checkpoint, more WAL records may have been written and persisted while the matching data pages were not. After a crash, the startup process goes to the last checkpoint and replays (redoes) every WAL record after it, applying the changes to the data pages. This is what makes crash recovery work: the checkpoint tells us up to which point the data is consistent, and everything after that point must be recovered from the WAL.
    • Only once recovery has finished does the postmaster start accepting connections.
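The interplay between WAL, checkpoints and crash recovery described above can be sketched as a toy model. Everything here is an assumption for illustration: ToyDB is a hypothetical class, a "page" is one dict entry, every WAL record is assumed to be flushed the moment it is appended, and a checkpoint is modeled as flushing all buffers and remembering the current WAL position. Real Postgres tracks a redo point, full-page images and much more.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDB:
    wal: list = field(default_factory=list)      # durable WAL: (lsn, key, value)
    disk: dict = field(default_factory=dict)     # data files on disk
    buffers: dict = field(default_factory=dict)  # shared buffers, lost on crash
    checkpoint_lsn: int = 0                      # WAL position of last checkpoint

    def write(self, key, value):
        # Write-ahead rule: log the change first (assumed flushed immediately),
        # then dirty the page in memory only.
        lsn = len(self.wal) + 1
        self.wal.append((lsn, key, value))
        self.buffers[key] = value
        return lsn

    def checkpoint(self):
        # Flush every page to disk, then record how far the data files are current.
        self.disk.update(self.buffers)
        self.checkpoint_lsn = len(self.wal)

    def crash(self):
        # Everything in shared memory is gone; WAL and data files survive.
        self.buffers = {}

    def recover(self):
        # Recovery at startup: replay only the WAL records after the checkpoint.
        replayed = 0
        for lsn, key, value in self.wal:
            if lsn > self.checkpoint_lsn:
                self.disk[key] = value
                replayed += 1
        self.buffers = dict(self.disk)
        return replayed
```

A change made after the checkpoint is missing from the data files after the crash, and replaying the single WAL record after the checkpoint brings it back:

```python
db = ToyDB()
db.write("a", 1)
db.write("b", 2)
db.checkpoint()    # a=1, b=2 are now in the data files
db.write("a", 10)  # in the WAL, but the page is only dirty in memory
db.crash()
db.disk            # {'a': 1, 'b': 2}
db.recover()       # 1
db.disk            # {'a': 10, 'b': 2}
```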

Stuff

  • Copy on write (COW)