A Theoretical Study of DBLog: Certified Virtual Cuts for a Snapshot-Equivalent Replay of Live Databases

Abstract

This paper is a theoretical follow-up to the 2020 DBLog paper, which described a change-data-capture (CDC) mechanism for backfilling a downstream system from a live source database and streaming its ongoing changes while the source keeps accepting writes. DBLog reads the table in primary-key range scans (chunks) interleaved with the source change log: watermarks locate each chunk in log order, chunk rows become refresh events, and CDC events repair stale chunk observations. The mechanism requires no table lock, no pause in writes, and no global read transaction, and it has since been adopted by Debezium and Apache Flink CDC. The 2020 paper described the mechanism operationally but did not formalize its correctness object. This paper formalizes that object: DBLog constructs a snapshot-equivalent replay certificate without requiring a single physical snapshot read. The central formal object is a certified virtual cut: a finite evidence bundle whose certified replay reaches the same per-key state as the source at a chosen frontier on a chosen key scope. A virtual cut is extensional: it is defined by replay equality at a frontier rather than by a physical snapshot read, and it asserts no single source timestamp across chunk rows. The paper proves per-key replay equality for every well-formed DBLog run at its frontier and scope, and proves that an accepted certificate, evaluated against faithful source observation, witnesses such a run and yields a virtual cut. It also proves two source-side algebra facts: on the same scope, a cut advances to a later frontier by appending the scope-filtered faithful CDC segment committed in between, and a cut restricts to sub-scopes. Each result is conditional on premises stated where it appears. Whole-table correctness, exactly-once delivery, sink-state convergence, and transfer to named deployments are not consequences of certificate acceptance alone. All proofs are machine-checked in Isabelle/HOL.
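The chunk and log interleaving summarized in the abstract can be illustrated with a small Python sketch. It is not the paper's formal model or the DBLog implementation. It assumes a single chunk, an initially empty downstream, and the watermark rule as commonly described for DBLog: a CDC event seen between the low and high watermarks drops the chunk row with the same key, and the surviving chunk rows are emitted as refresh events at the high watermark. The event encoding and the names `interleave_chunk`, `replay`, and `source_state` are hypothetical.

```python
# Hypothetical event encoding, for illustration only:
#   ("low",) / ("high",)  watermarks written into the change log
#   ("cdc", key, value)   a committed change; value None means delete
#   ("refresh", key, value) a chunk row emitted downstream

def interleave_chunk(log, chunk_rows):
    """Merge one chunk into the log stream using the watermark window."""
    out = []
    pending = None  # chunk rows still eligible for emission
    for ev in log:
        if ev[0] == "low":
            pending = dict(chunk_rows)
        elif ev[0] == "high":
            # Emit the chunk rows that no CDC event superseded in the window.
            for key, value in sorted(pending.items()):
                out.append(("refresh", key, value))
            pending = None
        else:
            if pending is not None:
                # A CDC event in the window repairs a possibly stale chunk row.
                pending.pop(ev[1], None)
            out.append(ev)
    return out

def replay(events):
    """Apply refresh and CDC events in order to an empty downstream state."""
    state = {}
    for _kind, key, value in events:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state

def source_state(initial, log):
    """Source state after applying every CDC event in the log."""
    state = dict(initial)
    for ev in log:
        if ev[0] == "cdc":
            if ev[2] is None:
                state.pop(ev[1], None)
            else:
                state[ev[1]] = ev[2]
    return state

initial = {1: "a", 2: "b", 3: "c"}
log = [
    ("low",),
    ("cdc", 2, "b2"),   # update committed inside the window
    ("cdc", 3, None),   # delete committed inside the window
    ("high",),
    ("cdc", 1, "a2"),   # update committed after the window
]
# Assumed chunk read: taken after the update of key 2 but before the delete
# of key 3, so the row for key 3 is stale by the time the window closes.
chunk_rows = {1: "a", 2: "b2", 3: "c"}

out = interleave_chunk(log, chunk_rows)
replayed = replay(out)
```

In this run the stale row for key 3 is never emitted, and `replayed` agrees with `source_state(initial, log)` on every key in the chunk. That agreement is the per-key replay equality the abstract refers to, obtained here without any single consistent read of the table.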
