Postgres: Inserts vs Updates
The general notion of thought is that an update in postgres is slower than an insert and hence, inserts are preferred over updates. But there is a slight misconception to this notion and lets address the difference between inserts and updates.
Writes process (inserts and updates)
The following tasks are executed when a new write has to be performed (inserts and updates, an update in postgres is also an insert into the WAL and DB due to concurrency control - MVCC):
- Data is written to WAL
- WAL is flushed to disk (with checkpointing)
- WAL data is written to data pages on disk
At this stage, there’s no significant difference in performance between inserts and updates since updates are essentially inserts into the WAL.
However, updates often generate more WAL because they involve marking the old row as dead (due to concurrency control - MVCC) and inserting a new version. This can lead to more WAL bloat compared to inserts.
Read Complexity
- Inserts: An index scan and fetching a single row from disk (done by loading the page the row is stored in).
- Updates: Since PostgreSQL follows MVCC (Multi-Version Concurrency Control), an update doesn’t overwrite the row but creates a new version. The old version is still present until VACUUM cleans it up, which means a SELECT might have to scan multiple versions to find the correct one. This is what makes updates slower.
- Vacuum is a process in Postgres that cleans up older version of rows. It is performed automatically and hence also called
AutoVacuum
process. - If
autovacuum
hasn’t cleaned up old versions, a query on an updated row may need to filter through multiple versions, increasing CPU and I/O load. - A similar problem of
count(*)
is also due to high number of updates. More details mentioned in concurrency control - MVCC.
- Vacuum is a process in Postgres that cleans up older version of rows. It is performed automatically and hence also called
Conclusion
From the above, its clear that the update statement in itself isn’t slow but reading an updated row can be slower compared to reading a new unique entry into the database.