databases

Partitioning and Sharding

Scaling a database

On large tables, usually an index is used to speed up a read query. Sometimes, a full index scan can also be slow as the index grows. This will increase the read latency. There are multiple ways to tackle this before we get to sharding.

We are trying to solve read latency. Writes are still alright, since we are writing to the end of the table but still rebalancing the index might be slow.

Usually, using LSM tree for write heavy database is more efficient. LSM trees are optimised for writes.

  1. Horizontal partitioning (single server)
    1. We split the single table into multiple parts called partitions.
    2. This in result in multiple indexes for these partitions. Since these partitions are smaller, the index is smaller, where the full index scan is much faster than the entire table’s index.
    3. We have smaller tables, client can let us know which partition to hit, smaller index fast queries.
  2. Adding read replicas
    1. This is required if a single database node cannot handle so many connections made to it.
    2. In order to distribute the connections, we use read replication, where connections can be distributed/load balanced.
    3. Read replica will take on read requests only and sync with master server which accepts writes and critical reads (critical reads are those which require strong consistency.
    4. Master server will send updates to the replicas or backup servers.
  3. Multi master replication
    1. Database cluster contains multiple nodes with each node containing 100% of the data.
    2. Multiple masters accept writes and conflicts are resolved.
    3. This is commonly used, by deploying masters on different regions to cater to lower latency
  4. Sharding
    1. It is similar to horizontal partitioning but each partition lives in a different database node and not on the same server.
    2. Each shard (server) owns some amount of data. Data is partitioned across the shards. Each partition is mutually exclusive (sub-sets of complete data).

Resources

https://youtu.be/iHNovZUZM3A?si=lIsWovh8yrw1mRTp