Transactions in microservices

There are two common approaches to handling transactions that span multiple microservices:

  • Two-phase commit (2PC)
  • Saga

2PC

2PC runs in two phases:

  • Prepare - Each participating service reserves (locks) the resources it needs.
  • Commit - If all reservations are successful, commit the transaction; otherwise, roll it back.

A coordinator node drives both phases: it asks every participant to prepare, collects the results, and then tells all of them to commit or roll back.
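The two phases and the coordinator's role can be sketched as follows. This is a minimal in-memory illustration, not a production protocol: the `Participant` class, its `can_prepare` flag, and the `two_phase_commit` function are hypothetical names invented for this example, and real participants would be separate services reached over the network.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical in-memory participant (an assumption for illustration)."""
    name: str
    can_prepare: bool = True   # simulates whether the reservation succeeds
    state: str = "idle"        # idle -> prepared -> committed / aborted

    def prepare(self) -> bool:
        # Phase 1: reserve (lock) the resource and report success or failure.
        if self.can_prepare:
            self.state = "prepared"
            return True
        return False

    def commit(self) -> None:
        # Phase 2: make the reserved change permanent.
        self.state = "committed"

    def rollback(self) -> None:
        # Release the reservation; no change becomes visible.
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Coordinator: commit only if every participant prepared successfully."""
    votes = [p.prepare() for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    # A single failed reservation aborts the whole transaction.
    for p in participants:
        p.rollback()
    return False
```

For example, calling `two_phase_commit` with one participant whose `can_prepare` is `False` leaves every participant in the `aborted` state, which is the all-or-nothing behavior 2PC is meant to provide.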

Each resource reserved in the prepare phase has a timeout, usually a long one, and stays locked for that duration. If neither a commit nor a rollback arrives within the timeout, the resource is unlocked.
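The timeout behavior can be sketched as a lock that expires on its own. This is a simplified, single-process illustration: `ReservedResource`, `prepare`, and `finish` are hypothetical names, and the injectable `clock` parameter is an assumption added only so the example can be exercised without waiting.

```python
import time


class ReservedResource:
    """Hypothetical resource whose prepare-phase lock expires after a timeout."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock          # injectable clock, an assumption for testing
        self.locked_by = None       # transaction id currently holding the lock
        self.expires_at = 0.0

    def _expire_if_stale(self) -> None:
        # If neither commit nor rollback arrived in time, drop the lock.
        if self.locked_by is not None and self.clock() >= self.expires_at:
            self.locked_by = None

    def prepare(self, txn_id: str) -> bool:
        self._expire_if_stale()
        if self.locked_by is not None:
            return False            # another transaction holds the reservation
        self.locked_by = txn_id
        self.expires_at = self.clock() + self.timeout_s
        return True

    def finish(self, txn_id: str) -> bool:
        """Commit or rollback: succeeds only while the lock is still held."""
        self._expire_if_stale()
        if self.locked_by != txn_id:
            return False
        self.locked_by = None
        return True
```

In this sketch, a transaction that calls `finish` after its lock has expired gets `False` back, because the reservation it prepared is already gone or has been taken by another transaction.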

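The timeout might not work cleanly, because by preparing, the services have already promised to commit the transaction, and unlocking a resource unilaterally breaks that promise. 2PC is often called an anti-availability protocol, as it requires all participating services to be up and running.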

Google Spanner addresses this by running each participant as a Paxos group. This way, a participant stays available if some of its nodes die, as long as a majority of the group survives.

2PC is synchronous and strongly consistent.

Saga

A saga is a sequence of local transactions. Each local transaction updates data within one service and then publishes a message that triggers the next local transaction in the next service.

Choreography-based and orchestration-based sagas are the two most common ways to coordinate these steps and keep data consistent across services.

If a step fails, the saga emits compensating events that undo the steps already completed, rather than performing a rollback. Example: refunding a payment.

Sagas are asynchronous and eventually consistent, and they rely on compensating transactions.
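The idea of compensating instead of rolling back can be sketched as follows. This is a minimal illustration under strong assumptions: plain function calls in one process stand in for local transactions in separate services, and `run_saga` and the `Step` alias are hypothetical names, not part of any library.

```python
from typing import Callable

# Each step pairs a local transaction with the compensation that undoes it.
Step = tuple[Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> bool:
    """Run local transactions in order; on failure, compensate in reverse."""
    completed: list[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # Undo only the steps that already succeeded, newest first.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True
```

If the steps are "create order", "charge payment", and "reserve stock", and the third one raises, the sketch refunds the payment and then cancels the order. The failed step itself is not compensated, since it never completed.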

Orchestrated Saga

Managed by a central saga orchestrator. The orchestrator emits events to the participating services, telling each one which local transaction to run. Easier to implement, but the orchestrator is a single point of failure.

Choreography-based Saga

Services communicate with one another through the events they emit. Each service emits an event after its local transaction, and the other services act depending on whether it is a success or a failure event.

Scales easily, as there is no central bottleneck as in an orchestrated saga, but it is more complicated.

Services can communicate in one of two ways:

  1. Command channels: The publisher sends a message directly to the next service once it completes its operation and commits the local transaction. The drawback is that the publisher needs to know the location of the next service.
  2. Pub/Sub mechanism: The publisher publishes an event, and all interested consumers pick up the event and act on it (either to commit their local transaction or to compensate). The disadvantage is that there is a single point of failure, i.e., the message broker.
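A choreography-based saga over the pub/sub mechanism can be sketched as follows. Everything here is an assumption made for illustration: `Broker` is a hypothetical in-memory stand-in for a real message broker, `wire_services` and the topic names are invented, and the `payment_ok` flag simply simulates whether the payment step succeeds.

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """Hypothetical in-memory stand-in for a message broker."""

    def __init__(self) -> None:
        self.subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.history: list[str] = []   # topics in the order they were published

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        self.history.append(topic)
        for handler in list(self.subscribers[topic]):
            handler(event)


def wire_services(broker: Broker, payment_ok: bool) -> dict:
    """Connect two hypothetical services purely through events."""
    orders: dict[str, str] = {}

    def create_order(order_id: str) -> None:
        orders[order_id] = "pending"                 # local transaction
        broker.publish("order_created", {"order_id": order_id})

    def on_order_created(event: dict) -> None:       # payment service
        topic = "payment_succeeded" if payment_ok else "payment_failed"
        broker.publish(topic, event)

    def on_payment_succeeded(event: dict) -> None:   # order service commits
        orders[event["order_id"]] = "confirmed"

    def on_payment_failed(event: dict) -> None:      # order service compensates
        orders[event["order_id"]] = "cancelled"

    broker.subscribe("order_created", on_order_created)
    broker.subscribe("payment_succeeded", on_payment_succeeded)
    broker.subscribe("payment_failed", on_payment_failed)
    return {"create_order": create_order, "orders": orders}
```

Neither service holds a reference to the other; each only knows topic names. The trade-off is visible in the sketch as well: every message passes through the single `Broker` object, so nothing moves if it is unavailable.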