PersistX
Abstract
Abstract
Aim
To design and implement PersistX — a production-grade, ACID-compliant storage engine in modern C++17 that replicates the core internals of real-world database systems such as PostgreSQL and MySQL/InnoDB. The project aims to demystify how databases achieve durability, crash recovery, and efficient indexed data access by building each layer — from raw disk I/O and buffer management to write-ahead logging, ARIES-based recovery, and Red-Black Tree indexing — from the ground up through a strict, layered architecture with well-defined interface boundaries. Beyond being a functional transactional key-value store, PersistX also serves as an interactive educational tool, enabling users to explore, inspect, and experiment with database internals in real time through a rich command-line shell.
GMeet Link : https://meet.google.com/bpa-xgzp-kpi
Introduction
Most developers use databases as black boxes. This project opens that box. It answers questions that most database courses skip: where does a record actually live? Why is sequential I/O faster than random I/O? What happens when a program crashes mid-write? How can a search over a million records take just twenty comparisons instead of a million?
PersistX is a storage engine built entirely from scratch in modern C++17, modelled after the internal architecture of PostgreSQL, MySQL/InnoDB, and SQLite. It implements a strict, layered design — from slotted page storage and LRU-K buffer management, through write-ahead logging and ARIES crash recovery, up to B+ Tree indexing — where each component has one clear responsibility and communicates through well-defined interfaces. An interactive REPL shell ties the layers together, allowing users to issue transactions, simulate crashes, and inspect every level of the engine in real time.
Literature Survey and Technologies Used
Literature Survey
1. Page-Oriented Storage — Modern databases store all data in fixed-size pages (typically 4–8 KB), aligned to disk block boundaries. The slotted page layout, described in Ramakrishnan & Gehrke's Database Management Systems (2003), uses a slot directory and variable-length records to provide stable record identifiers that survive compaction — a design adopted by PersistX.
2. Buffer Pool Management (LRU-K) — O'Neil, O'Neil, and Weikum proposed the LRU-K replacement policy (SIGMOD 1993), which tracks the K-th most recent access to each page rather than just the latest. This avoids the "sequential flooding" problem of classic LRU, where a single table scan evicts hot pages. PersistX uses LRU-K with K=2 for its buffer pool.
3. Write-Ahead Logging — The WAL protocol guarantees durability by ensuring that log records reach stable storage before the corresponding data pages. Gray & Reuter's Transaction Processing (1993) formalises the steal/no-force policy — used by PostgreSQL, InnoDB, and PersistX — which maximises runtime performance while relying on log replay for crash recovery.
4. ARIES Recovery — Mohan et al. introduced ARIES (ACM TODS, 1992), a three-phase recovery protocol (Analysis → Redo → Undo) that is the industry standard for crash recovery. Its use of Compensation Log Records (CLRs) guarantees idempotent recovery even if the system crashes during recovery itself. PersistX implements the full ARIES protocol.
5. Educational Database Systems — The approach of building database internals from scratch is championed by CMU's 15-445 course (Pavlo, 2015–present) using the BusTub framework. PersistX follows this pedagogical tradition, implementing every layer in a single codebase without external database libraries.
Technogolies Used
| Technology | Version | Uses |
| C++17 | ISO/IEC 14882:2017 | Core implementation language; used for std::optional, std::string_view, structured bindings, if constexpr, and inline variables |
| CMake | >=3.20 | Cross-platform build system; manages compilation targets, dependency linking, and test executables |
| POSIX/WIN32 File I/O | ------ | Low-level page-aligned reads and writes to the database file (persistx.db) via std::fstream |
| MinGW-w64/GCC | >= 10.x | Compiler toolchain on Windows (also compatible with GCC/Clang on Linux) |
| std::filesystem | ------ | Directory creation, file iteration, path operations in DiskManager and WALManager. |
| Git | >= 2.x | Version control and source code management |
Methodology
The project follows a strict bottom-up layered architecture where each component has a single responsibility and knows nothing about the layers above it.
Phase 1 — Slotted Page Layout. The fundamental storage unit is a 4096-byte page with a 21-byte header, a forward-growing slot directory (6 bytes per slot), and backward-growing variable-length records. Deleted records are marked as tombstones to preserve RID = (page_id, slot_id) stability. All header access uses memcpy to avoid strict-aliasing undefined behaviour.
Phase 2 — Disk Manager. A single binary file (persistx.db) stores all pages at byte offset page_id × 4096, enabling O(1) random access. A mutex-protected allocate_page() provides monotonic page ID assignment. The Disk Manager transfers raw 4096-byte buffers without knowledge of page internals.
Phase 3 — Buffer Manager. An in-memory pool of page frames caches hot pages using an LRU replacement policy. Pin counts prevent eviction of pages in active use. Only dirty pages incur disk writes on eviction — clean pages are discarded for free.
Phase 4 — Write-Ahead Log Manager. Every modification is first serialised as a LogRecord (supporting eight types: BEGIN, UPDATE, COMMIT, ABORT, TXN_END, CLR, CHECKPOINT_BEGIN, CHECKPOINT_END) and flushed to a log file before the data page reaches disk. UPDATE records carry full before- and after-images, enabling both redo and undo. The Buffer Manager enforces this WAL rule on every dirty page flush.
Phase 5 — Transaction Manager. Provides ACID semantics via begin(), commit(), and abort(). Commit force-flushes the log for durability. Abort walks the undo chain via prev_lsn pointers, restoring before-images and writing CLR (Compensation Log Records) that make rollback crash-safe.
Phase 6 — Checkpoint Manager. Performs fuzzy checkpointing by snapshotting the Active Transaction Table and Dirty Page Table into a CHECKPOINT_END log record, bounding recovery time to log-since-last-checkpoint rather than the full log.
Phase 7 — ARIES Recovery Manager. Implements three-phase crash recovery: Analysis reconstructs the dirty page and active transaction tables from the last checkpoint; Redo replays all logged actions to restore the exact pre-crash state; Undo rolls back incomplete transactions using CLRs for idempotent re-recovery.
Phase 8 — B+ Tree Index. Disk-backed B+ Tree providing O(log N) point lookups and range scans. Leaf nodes hold up to 291 sorted (key, RID) entries linked via next-leaf pointers. Internal nodes hold up to 340 keys with child page pointers. Insertions trigger recursive node splits; deletions remove entries without merging (matching PostgreSQL's policy).
Phase 9 — Interactive Shell. A REPL wires the full stack together, exposing data operations (put, get, del, scan), transaction control (begin, commit, abort), system inspection (status, tree, buffer, wal), and crash simulation (crash, recover) — all rendered with ANSI colour and UTF-8 box-drawing for a polished terminal experience.

Results
All layers of the PersistX storage engine are implemented from scratch and verified through a comprehensive test suite of 12 sections spanning unit tests, integration tests, and stress tests — all passing.
|
Operation |
Implementation |
Complexity |
|
get(key) |
B+ Tree traversal (root → leaf) |
O(log₃₄₀ N) — 2–3 page reads for millions of records |
|
put(key, val) |
B+ Tree insert + data page write + WAL |
O(log₃₄₀ N) |
|
del(key) |
B+ Tree remove + tombstone marking + WAL |
O(log₃₄₀ N) |
|
scan(lo, hi) |
Leaf-level linked list traversal |
O(log₃₄₀ N + K) where K = result count |
|
Buffer cache hit |
Direct pointer to in-memory page frame |
O(1) |
|
LRU eviction |
Linked list removal via hashmap iterator |
O(1) |
|
WAL append |
Serialise record + memcpy to buffer |
O(1) amortised |
|
Transaction commit |
Force-flush log up to commit LSN |
O(log records) |
B+ Tree Indexing
With a fanout of 340 internal keys and 291 entries per leaf, a database of 1,000,000 records requires at most ⌈log₃₄₀(1,000,000)⌉ = 3 page reads for any point lookup. The test suite verifies correctness at scale: 10,000 sequential keys, 5,000 random-order keys, and mixed insert/delete/search workloads under load all pass. Range scans across split boundaries (3,000-key range scan across multiple leaves) return results in correct sorted order. Edge cases including INT64_MIN/INT64_MAX keys, duplicate rejection after splits, full delete-and-reinsert cycles, and index persistence across flush + reopen are all verified.

Buffer Pool Management
Data survives eviction and re-fetch: with a pool of just 2 frames, writing a sentinel value (0xDEADBEEF), forcing eviction via two more page allocations, and re-fetching confirms the data was flushed to disk and reloaded intact. Pinned pages are protected from eviction — the test confirms that when one frame is pinned and the pool is full, the eviction victim is always an unpinned frame.

Write-Ahead Logging and Transactions
Every data modification is logged with full before- and after-images before the data page reaches disk. The commit command force-flushes the COMMIT record to stable storage before returning, guaranteeing durability. The abort command walks the undo chain via prev_lsn pointers, restores every before-image, and writes CLR (Compensation Log Records) for each step — making rollback crash-safe. Abort also removes ghost B+ Tree entries for insert operations that were rolled back, preventing phantom keys from surviving in the index.

ARIES Crash Recovery
The three-phase ARIES protocol is verified through deliberate crash simulation using the shell's crash command (which calls _Exit(1) — immediate process termination without flushing buffers or running destructors). On restart, the engine automatically detects the WAL and runs recovery:
Analysis: reconstructs the Dirty Page Table and Active Transaction Table from the last checkpoint.
Redo: replays all logged modifications whose page LSN is behind the log record LSN, restoring the exact pre-crash physical state. Handles insert redo (via redo_insert at the exact slot), delete redo, and variable-length update redo (with fallback to delete + force-insert when sizes differ).
Undo: rolls back all transactions that were active at crash time. CLR undo_next_lsn pointers guarantee idempotent re-recovery even if the system crashes during undo itself.
After ARIES completes, the B+ Tree index is automatically rebuilt by scanning all recovered data pages, extracting keys from records, and reinserting them into a fresh tree.

Checkpointing
Fuzzy checkpoints serialise snapshots of the ATT and DPT into a CHECKPOINT_END log record. Recovery's Analysis phase starts from the last checkpoint rather than the beginning of the log, reducing recovery time from O(total WAL size) to O(WAL since last checkpoint).
Conclusion
PersistX successfully demonstrates that the internals of a production database engine can be built from scratch in modern C++17 with clear, layered abstractions. The project implements every core subsystem found in real-world databases like PostgreSQL and InnoDB: slotted page storage with stable RIDs, an LRU buffer pool, write-ahead logging with full before/after images, ACID transaction management with crash-safe abort via CLRs, fuzzy checkpointing, three-phase ARIES crash recovery, and a B+ Tree index delivering O(log N) lookups and range scans. A 77-test suite — including deliberate crash simulation via _Exit followed by automatic ARIES recovery — confirms that committed data is preserved, incomplete transactions are rolled back, and index consistency is maintained after arbitrary failures. The interactive shell further serves the project's educational goal by making every layer inspectable and experimentable in real time.
Future Scope
Concurrency Control. Implementing two-phase locking (2PL) or multi-version concurrency control (MVCC) to allow multiple concurrent transactions with full isolation guarantees.
SQL Query Layer. Adding a SQL parser, table catalog, and query planner on top of the existing B+ Tree and heap page infrastructure to transform PersistX from a key-value store into a relational database.
Secondary and Hash Indexes. Supporting non-unique secondary indexes and alternative structures like extendible hashing to enable richer query patterns and demonstrate access-method trade-offs.
Network Layer. Adding a TCP server with a wire protocol to allow remote client connections, turning the engine into a standalone database server.
Page Compression. Implementing transparent compression (e.g., LZ4) within the Disk Manager to reduce I/O — demonstrating the layered architecture's strength where changes in one layer are invisible to others.
WAL Replication. Shipping WAL records to a standby instance for basic high-availability replication and point-in-time recovery.
References
[1] C. Mohan, D. Haderle, B. Lindsay, H. Pirahesh, and P. Schwarz, "ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging," ACM Transactions on Database Systems, vol. 17, no. 1, pp. 94–162, 1992.
[2] R. Bayer and E. McCreight, "Organization and Maintenance of Large Ordered Indexes," Acta Informatica, vol. 1, no. 3, pp. 173–189, 1972.
[3] D. Comer, "The Ubiquitous B-Tree," ACM Computing Surveys, vol. 11, no. 2, pp. 121–137, 1979.
[4] E. J. O'Neil, P. E. O'Neil, and G. Weikum, "The LRU-K Page Replacement Algorithm for Database Disk Buffering," Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 297–306, 1993.
[5] J. Gray and A. Reuter, Transaction Processing: Concepts and Techniques. Morgan Kaufmann, 1993.
[6] R. Ramakrishnan and J. Gehrke, Database Management Systems, 3rd ed. McGraw-Hill, 2003.
[7] A. Silberschatz, H. F. Korth, and S. Sudarshan, Database System Concepts, 7th ed. McGraw-Hill, 2019.
[8] A. Pavlo, "CMU 15-445/645: Database Systems," Carnegie Mellon University, 2024.
[9] ISO/IEC 14882:2017, "Programming Languages — C++," International Organization for Standardization, 2017.
[10] PostgreSQL Global Development Group, "PostgreSQL Documentation — WAL Internals," 2024.
Mentors' GitHub Repo Link : PersistX
Mentees' GitHub Repo Link : PersistXLite
Mentees Details :
Moulik Bansal 251IT038
Ekansh Goel 251IT022
Krivit Sisodiya 251IT032
Shourya Gupta 251IT067
Yogesh 251CS169
Vyom
Report Information
Team Members
Team Members
Report Details
Created: May 17, 2026, 8:19 p.m.
Approved by: None
Approval date: None
Report Details
Created: May 17, 2026, 8:19 p.m.
Approved by: None
Approval date: None