Snapshot isolation in duckrun¶

How duckrun keeps a read-modify-write correct when more than one writer can touch the same Delta table, and how that maps onto Spark + Delta and a classic RDBMS.

The problem¶

A lakehouse has no transaction manager and no single-writer guarantee. Two pipelines, a double-fired job, or a notebook racing a scheduled run can all commit to the same table. The dangerous pattern is a read-modify-write:

1. read table X  (you see version V)
2. compute new rows from what you read
3. write X

If someone commits to X between step 1 and step 3, a naïve write at HEAD silently overwrites their change — a lost update, no error. duckrun's job is to turn that into a loud failure (CommitFailedError).

The guarantee¶

A read-modify-write on a single table behaves as if it ran under SNAPSHOT isolation: pinned to the version you read, and rejected if a conflicting change landed since.

It is a mental model, not a transaction manager — no BEGIN TRAN, no lock, no multi-table atomicity. The whole trick: pin the operation to the version you read, and let delta-rs's optimistic concurrency control (OCC) validate the commit against it.

Only the table being written is fenced. A table you merely read as an input is irrelevant — you didn't modify it, so there is no lost update to prevent. read_version is required on every fenced path; a brand-new table's first write is a plain create (write_delta), never fenced.

Where the references sit. SQL Server SNAPSHOT is the behavioral reference; Spark + Delta is the API and OCC reference. duckrun copies both surfaces, then — on the read-modify-write paths only — tightens the conflict window to the version you read, deliberately rejecting some races that Spark's commit-instant OCC would accept, so that no decision is ever committed on stale data. The fast, Spark-identical unfenced paths (plain append/overwrite) stay available; the extra strictness applies only when you opt in (a DeltaTable handle, a fenced writer mode, or the dbt path).

How it's enforced: two mechanisms¶

Both pin with DeltaTable.load_as_version(read_version). They differ in how strict the rejection is, and that difference is forced by the operation:

Operation	Mechanism	Fails when…
`merge`, `delete`, `update`	native OCC (`load_as_version` + the op)	a conflicting commit landed since `V` (same rows/files)
`append_if_unchanged`, `overwrite_if_unchanged`, `replaceWhere`	strict version CAS (`load_as_version` + `max_commit_retries=0`)	any commit landed since `V`

Why the asymmetry: delete/update/merge have a real read-set, so delta-rs detects genuine conflicts and rebases non-conflicting commits. An append/overwrite reads nothing from the target, so load_as_version alone is inert (no read-set to validate, delta-rs would just rebase onto HEAD) — max_commit_retries=0 is the only thing that makes it fail-loud. So a delete tolerates an unrelated concurrent append, while a fenced append fails on any movement. This is also exactly how SQL Server SNAPSHOT behaves: abort on a write-write conflict, not on every concurrent commit.

The `DeltaTable` handle is the snapshot scope¶

For the connection API the handle is the poor-man's BEGIN TRAN:

dt = DeltaTable.forName(conn, "orders")   # captures version V here
# ... time passes, another writer may commit ...
dt.delete("status = 'cancelled'")          # pinned to V; fails loud if orders moved since V

forName / forPath capture the version once, when the handle is taken.
merge / delete / update through that handle all pin to that captured V.
After a successful mutation the handle re-snapshots to the new HEAD, so a second mutation on the same handle only fails on a foreign write, never on its own previous write.

append/overwrite are unfenced by design (below) and are not part of the handle.

The dbt path¶

The dbt incremental materialization does this automatically every run: it captures the target version vB before the model runs, pins the model's {{ this }} read to it (delta_scan('…', version => vB) — the reason for the duckdb 1.5.4 floor), and pins the merge commit to vB. Read and write agree on one snapshot. Version-by-version proof through a real dbt run is in snapshot-pin.md.

Fenced vs unfenced writer modes¶

mode("append") and mode("overwrite") are unfenced by design, matching Spark's SaveMode: append never fails on a concurrent write; overwrite is last-writer-wins. Keeping them unfenced preserves the fast, high-concurrency append path. Each has a fenced compare-and-swap sibling:

Unfenced (Spark `SaveMode`)	Fenced sibling	Fails if the table moved since the read version
`mode("append")`	`mode("append_if_unchanged")`	yes (any movement)
`mode("overwrite")`	`mode("overwrite_if_unchanged")`	yes (any movement)

safeappend is a deprecated alias for append_if_unchanged (the dbt strategy accepts both).

Lazy reads. DuckDB relations are lazy: conn.table("x") / conn.sql(...) read nothing — the relation runs only when .write does. So the read, the dedup, and the commit all happen inside the one write call, and the fenced modes capture read_version at write time; the CAS fences that whole window. duckrun cannot pin an append's read to a version (delta-rs has no such API) and doesn't need to — the guarantee lives entirely in the commit. The only time read and write sit at genuinely different versions is when you materialize and act later; that is what the handle is for (it carries V explicitly).

vs Spark + Delta¶

duckrun deliberately mirrors the Delta-on-Spark surface — DeltaTable.forName/forPath, .merge(...).whenMatched*/whenNotMatched*, delete, update, SaveMode append/overwrite, replaceWhere — and runs on the same OCC model (delta-rs is the Rust port of Delta's commit protocol), so semantics carry over: append is non-conflicting, delete/update/merge are conflict-checked, SaveMode.Append/Overwrite are unfenced.

The one real difference is where the read version comes from:

	Spark + Delta	duckrun
`DeltaTable.merge/delete/update`	reads HEAD at execution; OCC checks only the commit instant	pinned to the version captured at handle construction (`vB`), so a read-modify-write split across statements is fenced
Fenced append (watermark insert)	no built-in; you write a `MERGE`	`append_if_unchanged` — a lighter version CAS
`replaceWhere`	overwrite option	single atomic commit + version CAS (`replace_where`)
`SaveMode.Append` / `Overwrite`	unfenced	unfenced (identical)
Multi-table transaction	❌	❌

In short: Spark/Delta fence the commit instant and leave the read-to-write gap to MERGE or job-level retry. duckrun makes the read version the thing you fence to — captured at the handle or at vB — and adds the lighter safeappend for the watermark case Spark has no built-in for.

vs an RDBMS, and what duckrun does not provide¶

System	Mechanism	Scope	Multi-table txn
duckrun	OCC on the Delta version log; explicit handle / per-op pin	single table	❌
delta-rs / Spark + Delta	OCC on the `_delta_log`	single table	❌
SQL Server	transactions + isolation levels (lock or MVCC)	multi-statement, multi-table	✅
Postgres	MVCC; `SERIALIZABLE` (SSI)	multi-statement, multi-table	✅

An RDBMS gets all of this "for free" because it owns its storage and runs a transaction manager — BEGIN TRAN … COMMIT under SNAPSHOT spans statements and tables. duckrun owns none of that; it is a library over a shared lakehouse. So the handle is the stand-in for BEGIN TRAN, deliberately single-table. What a lakehouse can't honestly provide, and duckrun therefore doesn't:

Multi-table transactions — Delta commits one table at a time.
Pessimistic locking / blocking — lakehouses are optimistic-only; writers fail-and-retry, never block.
Isolation across a materialized read — DuckDB relations are lazy (exactly like Spark DataFrames), so a conn.sql read that feeds a fenced .write is pinned at write time even when the read and the write are separate calls — read and commit land on one snapshot, and the CAS fences the window. The gap opens only if you materialize (.toPandas() / collect()) and write the result back later, so the read and the write genuinely sit at different versions. That is what the DeltaTable handle is for — it carries the read version explicitly. Spark + Delta behaves identically: a lazy DataFrame read→write is fenced at the commit; a collected-then-written result is not, and you reach for a MERGE or an explicit version check.