Disk-based database architecture

A database's primary storage: disk

The DBMS moves data between non-volatile storage (disk) and volatile storage (memory) to operate on it or modify it.

The goal is to let the DBMS manage databases that exceed the amount of memory available.
Reading/writing to disk is expensive, so it must be managed carefully to avoid large stalls and performance degradation.
Random access on disk is usually much slower than sequential access, so the DBMS will want to maximize sequential access.

Storage

On disk, a database is a collection of files. Files have metadata. Each file is broken down into pages. Pages are like blocks: a way to divide the data in the file at fixed offsets.
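A minimal sketch of what "fixed offsets" means in practice, assuming a 4096-byte page size and hypothetical helper names (`page_offset`, `read_page`, `write_page`) that are not from any particular DBMS: page N starts at byte N * PAGE_SIZE, so a page can be located with a single seek.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; real systems use various sizes


def page_offset(page_id: int, page_size: int = PAGE_SIZE) -> int:
    """Return the byte offset at which a page starts inside its file."""
    return page_id * page_size


def read_page(path: str, page_id: int, page_size: int = PAGE_SIZE) -> bytes:
    """Read exactly one page from the file at `path`."""
    with open(path, "rb") as f:
        f.seek(page_offset(page_id, page_size))
        return f.read(page_size)


def write_page(path: str, page_id: int, data: bytes,
               page_size: int = PAGE_SIZE) -> None:
    """Overwrite one page in place; the file must already exist."""
    if len(data) != page_size:
        raise ValueError("a page must be exactly page_size bytes")
    with open(path, "r+b") as f:
        f.seek(page_offset(page_id, page_size))
        f.write(data)
```

Because every page has the same size, no lookup table is needed to find a page: its id alone determines where it lives in the file.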

The buffer pool is memory managed by the DBMS, into which pages from disk are loaded for operations.
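A rough sketch of the buffer pool idea, under stated assumptions: the class name `BufferPool`, the `read_page` callback, and the least-recently-used eviction policy are all illustrative choices, not something these notes prescribe. It also ignores dirty pages and pinning, which a real buffer pool has to handle.

```python
from collections import OrderedDict
from typing import Callable


class BufferPool:
    """Keeps up to `capacity` pages in memory (hypothetical, simplified)."""

    def __init__(self, capacity: int, read_page: Callable[[int], bytes]):
        self.capacity = capacity
        self.read_page = read_page   # fetches a page from disk by page id
        self.frames = OrderedDict()  # page_id -> page bytes, oldest use first
        self.disk_reads = 0          # counts how often we had to go to disk

    def get_page(self, page_id: int) -> bytes:
        if page_id in self.frames:
            # Hit: the page is already in memory; mark it most recently used.
            self.frames.move_to_end(page_id)
            return self.frames[page_id]
        if len(self.frames) >= self.capacity:
            # Pool is full: evict the least recently used page (assumed policy).
            self.frames.popitem(last=False)
        # Miss: load the page from disk into a free frame.
        self.disk_reads += 1
        data = self.read_page(page_id)
        self.frames[page_id] = data
        return data
```

Repeated requests for the same page are served from memory, which is the point: the expensive disk read happens only on a miss.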

Storage manager

Responsible for maintaining a database’s files. It organizes the files as a collection of pages.

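Tracks the data read from and written to pages
Tracks the available space

Some storage managers schedule their own disk reads and writes; others leave scheduling to the OS and issue ordinary file reads, for example with fread().

It is responsible for reclaiming space when data is deleted, etc.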
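As an illustration of the bookkeeping described above, here is a small sketch that only tracks free bytes per page and makes reclaimed space reusable. The `StorageManager` class, its method names, and the 4096-byte page size are assumptions made for this example; a real storage manager also does the actual I/O.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class StorageManager:
    """Tracks free space per page (hypothetical, in-memory bookkeeping only)."""

    def __init__(self):
        self.free_bytes = {}   # page_id -> number of free bytes in that page
        self.next_page_id = 0

    def allocate_page(self) -> int:
        """Add a new, empty page to the file and return its id."""
        page_id = self.next_page_id
        self.next_page_id += 1
        self.free_bytes[page_id] = PAGE_SIZE
        return page_id

    def find_page_with_space(self, needed: int) -> int:
        """Return a page with at least `needed` free bytes, allocating if none."""
        for page_id, free in self.free_bytes.items():
            if free >= needed:
                return page_id
        return self.allocate_page()

    def record_insert(self, page_id: int, size: int) -> None:
        """Account for `size` bytes written to a page."""
        if size > self.free_bytes[page_id]:
            raise ValueError("not enough free space in page")
        self.free_bytes[page_id] -= size

    def record_delete(self, page_id: int, size: int) -> None:
        """Reclaim `size` bytes after data is deleted from a page."""
        self.free_bytes[page_id] += size
```

After `record_delete`, the same page is offered again by `find_page_with_space`, so deleted space is reused before the file grows.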

A page is a fixed-size block of data. Within a file, pages usually all have the same size, which lets the file be divided at fixed offsets. A page can hold tuples, metadata (such as timestamps and checksums), indexes, log records, etc.

The contents of a page are homogeneous: a page holds only one kind of data. For example, a page won’t mix tuples from an index with tuples from a table.

A page is self-contained: it must carry the information that states what it contains. This is mostly for disaster recovery. Because the description lives in the page itself, the DBMS can still interpret the page after a failure. If that description were kept in a separate metadata store and the store were lost, we would have no idea how to read the page.
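One way to picture a self-contained page is a small header at the start of the page that records what the page holds, plus a checksum. The layout below (a 4-byte magic value, a page type, a payload length, and a CRC32) is a made-up format for illustration, not the format of any real DBMS.

```python
import struct
import zlib

PAGE_SIZE = 4096                   # assumed page size in bytes
HEADER = struct.Struct("<4sHHI")   # magic, page type, payload length, CRC32
MAGIC = b"PAGE"                    # hypothetical marker for "this is a page"
PAGE_TYPE_TABLE = 1                # hypothetical type codes
PAGE_TYPE_INDEX = 2


def build_page(page_type: int, payload: bytes) -> bytes:
    """Return a full page whose header describes its own contents."""
    if len(payload) > PAGE_SIZE - HEADER.size:
        raise ValueError("payload does not fit in one page")
    header = HEADER.pack(MAGIC, page_type, len(payload), zlib.crc32(payload))
    body = header + payload
    return body + bytes(PAGE_SIZE - len(body))  # zero-pad to the page size


def inspect_page(page: bytes):
    """Recover (page_type, payload) using only the bytes of the page itself."""
    magic, page_type, length, checksum = HEADER.unpack_from(page, 0)
    if magic != MAGIC:
        raise ValueError("not a recognized page")
    payload = page[HEADER.size:HEADER.size + length]
    if zlib.crc32(payload) != checksum:
        raise ValueError("checksum mismatch: page is corrupted")
    return page_type, payload
```

`inspect_page` needs no external catalog: given nothing but the raw bytes, it can tell the page type and detect corruption.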

A page contains:

Headers
Tuple pointers
Free space maps
Visibility metadata
Possibly leftover deleted tuples, fragmentation, etc.
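To make a few of the items in this list concrete (a header, tuple pointers, free space, and leftover deleted tuples), here is a sketch of a slotted-page layout, a common way to organize a page. The layout is an assumption chosen for illustration: a header, then an array of slots (tuple pointers) growing from the front, with tuple data growing from the back. Visibility metadata is not modeled.

```python
import struct

PAGE_SIZE = 4096               # assumed page size in bytes
HEADER = struct.Struct("<HH")  # slot count, offset where tuple data begins
SLOT = struct.Struct("<HH")    # tuple offset, tuple length (0 means deleted)


class SlottedPage:
    """Hypothetical slotted page: slots grow forward, tuples grow backward."""

    def __init__(self):
        self.buf = bytearray(PAGE_SIZE)
        HEADER.pack_into(self.buf, 0, 0, PAGE_SIZE)  # no slots, no data yet

    def _header(self):
        return HEADER.unpack_from(self.buf, 0)

    def _slot_pos(self, slot_id: int) -> int:
        count, _ = self._header()
        if not 0 <= slot_id < count:
            raise IndexError("no such slot")
        return HEADER.size + slot_id * SLOT.size

    def free_space(self) -> int:
        """Bytes between the end of the slot array and the start of the data."""
        count, data_start = self._header()
        return data_start - (HEADER.size + count * SLOT.size)

    def insert(self, tup: bytes) -> int:
        """Store a tuple and return its slot id (its tuple pointer)."""
        count, data_start = self._header()
        if not tup or len(tup) + SLOT.size > self.free_space():
            raise ValueError("tuple is empty or does not fit")
        data_start -= len(tup)
        self.buf[data_start:data_start + len(tup)] = tup
        SLOT.pack_into(self.buf, HEADER.size + count * SLOT.size,
                       data_start, len(tup))
        HEADER.pack_into(self.buf, 0, count + 1, data_start)
        return count

    def read(self, slot_id: int):
        """Return the tuple bytes, or None if the slot was deleted."""
        offset, length = SLOT.unpack_from(self.buf, self._slot_pos(slot_id))
        return bytes(self.buf[offset:offset + length]) if length else None

    def delete(self, slot_id: int) -> None:
        """Mark a slot as deleted; the tuple bytes stay behind in the page."""
        pos = self._slot_pos(slot_id)
        offset, _ = SLOT.unpack_from(self.buf, pos)
        SLOT.pack_into(self.buf, pos, offset, 0)
```

Note that `delete` does not give the space back: the dead tuple's bytes remain in the page until something compacts it, which is the fragmentation the last list item refers to.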
