DuckDB snapshot pin โ what it fixes¶
On every incremental run, duckrun captures the target table's Delta version vB before the
model runs, then anchors both sides of the run to it:
- Read pin โ the model's
{{ this }}self-reference is registered asdelta_scan('<location>', version => vB)(the duckdb-deltaversion => Nparam, which is why duckdb is pinned to the 1.5.4 floor), so the model SQL sees the table as ofvB, not a drifted HEAD. - Write pin โ the merge target is opened at the same
vB, and delta-rs validates the commit over the window(vB, HEAD]. A foreign commit inside that window makes the merge refuse.
Together this is single-snapshot MERGE semantics: the read and the write agree on one version of the table. Lakehouses don't guarantee a single writer (a job double-fires, two pipelines touch the same table), so this matters in practice.
The snapshot-pin job in cores.yml proves it through a real
dbt run: it runs the same concurrent-writer race twice against an incremental MERGE model โ
once without the pin (the old behaviour: read/commit against HEAD) and once with it โ and walks
the table's Delta versions so you can see, version by version, where the unpinned path silently
loses data and the pinned path catches it. The full scenario is in
tests/integration_tests/snapshot_pin/; the assertions
that gate every change live in test_snapshot_pin.py. The latest scorecard is rendered live below.
๐ DuckDB snapshot pin โ version by version โ ¶
What this proves (through a real dbt run): the same concurrent-writer race is run twice against an incremental MERGE model โ the only difference is whether the merge is pinned to vB (the version captured at the start of the run) or left to read/commit against HEAD. The table's Delta versions tell the story: a writer commits id=1 = 999 (v1) while the model is mid-run.
WITHOUT the pin (merge vs HEAD โ old behaviour) โ merge anchored to v1 (HEAD)¶
| Delta version | rows (idโvalue) |
what produced it |
|---|---|---|
v0 |
1โ10, 2โ20, 3โ30, 4โ40, 5โ50, 6โ60, 7โ70, 8โ80, 9โ90, 10โ100 | dbt run #1 โ seed (ids 1..10) |
v1 |
1โ999, 2โ20, 3โ30, 4โ40, 5โ50, 6โ60, 7โ70, 8โ80, 9โ90, 10โ100 | concurrent writer: update id=1 -> 999 โ merge anchored HERE (HEAD) โ NO pin |
v2 |
1โ111, 2โ20, 3โ30, 4โ40, 5โ50, 6โ60, 7โ70, 8โ80, 9โ90, 10โ100 | dbt run #2 โ incremental MERGE (batch: update id=1 -> 111) |
Final state: 1โ111, 2โ20, 3โ30, 4โ40, 5โ50, 6โ60, 7โ70, 8โ80, 9โ90, 10โ100 โ โ WRONG โ merge committed v2 against HEAD; id=1 = 999 silently LOST (no error raised)
WITH the pin (duckrun today) โ merge pinned to vB = v0¶
| Delta version | rows (idโvalue) |
what produced it |
|---|---|---|
v0 |
1โ10, 2โ20, 3โ30, 4โ40, 5โ50, 6โ60, 7โ70, 8โ80, 9โ90, 10โ100 | dbt run #1 โ seed (ids 1..10) โ merge pinned HERE (vB) โ read + commit |
v1 |
1โ999, 2โ20, 3โ30, 4โ40, 5โ50, 6โ60, 7โ70, 8โ80, 9โ90, 10โ100 | concurrent writer: update id=1 -> 999 |
v2 |
โ never written | MERGE refused โ pinned to v0; HEAD is now v1, and the check (v0, v1] catches the concurrent v1 |
Final state: 1โ999, 2โ20, 3โ30, 4โ40, 5โ50, 6โ60, 7โ70, 8โ80, 9โ90, 10โ100 โ โ
RIGHT โ merge REFUSED โ no v2 written; run failed loudly; id=1 = 999 preserved
๐ด Final tables โ same code, same race, 1 row differs¶
| id | WITH pin | WITHOUT pin | difference |
|---|---|---|---|
1 |
999 | 111 | โฌ ๏ธ DIFFERENT |
2 |
20 | 20 | same |
3 |
30 | 30 | same |
4 |
40 | 40 | same |
5 |
50 | 50 | same |
6 |
60 | 60 | same |
7 |
70 | 70 | same |
8 |
80 | 80 | same |
9 |
90 | 90 | same |
10 |
100 | 100 | same |
Read it like this (here
vB = v0, and HEAD =v1, the concurrent writer's commit): without the pin,v2overwroteid=1 = 999with111โ the concurrent writer's change vanished and the run reported success. With the pin, the merge checks the range after its pinned version up to the current HEAD โ(v0, v1]โ sees the concurrentv1there, and refused: nov2, the run fails loudly, andid=1 = 999is still there. Re-run and it merges cleanly on top ofv1.โ The pin fixes a real correctness bug: silent data loss becomes a safe, loud failure.