Connection API (notebook)¶
Besides the dbt adapter, duckrun ships a storage-neutral, DataFrame-style duckrun.connect() for
interactive/notebook use (local, S3, GCS, ADLS, OneLake):
conn.sql(...)— DuckDB SQL over the discovered Delta tables, including time travel (delta_scan('…', version => N)). Reads pass straight through; raw DML (create table … as,insert,update,delete,alter add column,drop,merge) is applied to the Delta table via delta_rs (works local AND on OneLake) — see the DML matrix below.- a
DataFramewith a DataFrame-style.write…saveAsTable()— modesoverwrite/append/append_if_unchanged/overwrite_if_unchanged/ignore(the_if_unchangedmodes are the fenced, fail-loud siblings;safeappendis the deprecated alias forappend_if_unchanged), plusoption("replaceWhere", …)for an atomic slice overwrite — plusconn.readandconn.catalog. - a
DeltaTablehandle (DeltaTable.forName(conn, name)) mirroring theDeltaTableAPI:.merge(...),.delete(),.update(),.version(),.history(). - multiple catalogs:
connect()binds one lakehouse root (the primary catalog); attach more withconn.attach(path, name=…)and read/join across them by three-partcatalog.schema.tablename — see Multiple catalogs withconn.attachbelow.
connect() is read-only by default: every Delta write (saveAsTable / insertInto / save /
merge / insert / update / delete / replaceWhere) raises PermissionError, so an accidental
write can't mutate a shared lakehouse. Pass read_only=False to enable writes; reads and native
CREATE TEMP/CREATE VIEW scratch are always allowed.
For a method-by-method map of this surface against PySpark / Delta-on-Spark — what maps 1:1, what's duckrun-flavored, and what's deliberately out of scope (SQL-first, no Spark runtime — by design) — see Coverage vs the Spark / Delta API.
merge is snapshot-pinned by default — single-snapshot MERGE, with no extra arguments:
the target version is captured and the commit validates against it, so a concurrent writer fails the
commit loudly instead of silently interleaving. mode("append_if_unchanged") (alias: safeappend)
and mode("overwrite_if_unchanged") apply the same fail-loud compare-and-swap to a plain append /
full overwrite: they commit only if the table is unchanged since the version read, else raise
CommitFailedError. The full model and a cross-engine comparison are in
Snapshot isolation.
import duckrun
# writable session — read_only=False opts in (the default is read-only)
conn = duckrun.connect("abfss://ws@onelake.dfs.fabric.microsoft.com/lh.Lakehouse/Tables/dbo",
read_only=False)
conn.sql("select * from orders").write.mode("overwrite").saveAsTable("orders_copy")
conn.table("orders_copy").show()
from duckrun import DeltaTable
DeltaTable.forName(conn, "orders").delete("region = 'eu'") # delete / update / version
# overwrite just one slice, atomically
conn.sql("select * from corrections").write.option("replaceWhere", "region = 'eu'") \
.mode("overwrite").saveAsTable("orders")
src = conn.sql("select * from updates")
DeltaTable.forName(conn, "orders").merge(src, "target.id = source.id") \
.whenMatchedUpdateAll().whenNotMatchedInsertAll().execute() # pinned automatically
In-memory data with conn.createDataFrame¶
conn.createDataFrame(data, schema=None) turns in-memory data into a DataFrame on duckrun's own
connection — handy for seeding, demos, or persisting a small Python/pandas result to Delta. data
is a list of tuples/lists (a list of scalars becomes one column), a pandas DataFrame, or a
pyarrow Table/RecordBatchReader. schema is None (names inferred — _1, _2, … for tuples),
a list of column names, or a DDL string ("id int, name string"; name: type is also accepted).
Empty data needs a DDL schema. It's a plain DuckDB build — no Spark/PySpark dependency.
conn.createDataFrame([(1, "a"), (2, "b")], "id int, name string") \
.write.mode("overwrite").saveAsTable("seeded")
conn.createDataFrame(pandas_df).show()
conn.createDataFrame([], "id int, name string") # typed empty frame
Multiple catalogs with conn.attach¶
connect() binds one lakehouse root as the primary catalog; conn.attach(path, name=…) binds
more, so a single session can read and join across several lakehouses by three-part name
(catalog.schema.table). In Microsoft Fabric a Warehouse and a Lakehouse are the same thing
to duckrun — both are Delta in OneLake — except a Warehouse is locked to writes, so attach it
read_only=True and keep the lakehouse writable. On OneLake tokens are automatic (nothing to pass in
a Fabric notebook).
import duckrun
# primary catalog: a writable lakehouse. name= reads next to the others (else derived from the URL).
conn = duckrun.connect(
"abfss://ws@onelake.dfs.fabric.microsoft.com/lakehouse.Lakehouse/Tables",
read_only=False, name="lakehouse")
# attach a read-only Fabric Warehouse and a plain local folder as more catalogs
conn.attach("abfss://ws@onelake.dfs.fabric.microsoft.com/warehouse.Warehouse/Tables",
name="warehouse", schema="mart", read_only=True)
conn.attach("/data/reference", name="local")
conn.catalog.listCatalogs() # ['lakehouse', 'warehouse', 'local']
# join across all three in one query — facts from the warehouse, dim from the lakehouse,
# a lookup from local — and write the mart back to the (writable) lakehouse
conn.sql("""
SELECT d.state, sum(f.mw) AS total_mw
FROM warehouse.mart.fct_summary f
JOIN mart.dim_duid d ON d.duid = f.duid
LEFT JOIN local.dbo.fuel_factors lf ON lower(lf.fuel) = lower(d.fuel)
GROUP BY d.state
""").write.mode("overwrite").saveAsTable("mart_generation_by_state")
# the warehouse is read-only — a write into it is refused, not silently dropped
conn.sql("SELECT 1 AS x").write.saveAsTable("warehouse.mart.nope") # -> PermissionError
Naming: a table can be addressed 3-part (warehouse.mart.fct_summary, from anywhere), 2-part
(mart.fct_summary, in the current catalog) or bare (fct_summary, in the current catalog + schema).
conn.catalog.setCurrentCatalog(name) / setCurrentDatabase(db) move where unqualified names
resolve. name is derived from a friendly path and is mandatory for a GUID-only OneLake path; one
URL maps to one name (re-attaching either raises). read_only is per-catalog, independent of the
session — a read-only reference store sits safely next to a writable lakehouse. Raw conn.sql() DML
targets the current catalog; cross-catalog writes go through the DataFrame API
(df.write.saveAsTable("cat.schema.t")) or DeltaTable.forName(conn, "cat.schema.t").
See the full runnable walkthrough in
tests/integration_tests/multicatalog/demo_multicatalog.py
(published as a live report).
Raw SQL DML through conn.sql¶
conn.sql doesn't only read — raw SQL DML against a discovered (Delta-backed) table is intercepted
and applied via delta_rs only, then the view is refreshed, so it works identically on a local
path and on OneLake. The invariant: every CREATE TABLE is Delta-backed; only CREATE TEMP TABLE
and CREATE VIEW stay native DuckDB (ephemeral, session-local scratch). Forms that delta_rs can't
express are rejected up front with a pointer to the write API, rather than failing cryptically.
| Statement | What happens |
|---|---|
CREATE [OR REPLACE] TABLE x [IF NOT EXISTS] AS <query> |
Delta overwrite (<query> = a select, a WITH … select, or (select …)); IF NOT EXISTS over a live table is a no-op |
CREATE TABLE x (<col defs>) |
empty Delta table |
INSERT INTO x [(cols)] SELECT/VALUES … |
Delta append — columns matched by name, projected/cast onto the target schema, unsupplied columns filled with typed NULL |
[WITH …] INSERT INTO x SELECT … |
Delta append (the CTE is re-attached to the body) |
UPDATE x SET … [WHERE …] |
delta_rs update |
DELETE FROM x [WHERE …] |
delta_rs delete |
ALTER TABLE x ADD COLUMN … |
Delta overwrite, widening the schema |
DROP TABLE x |
tombstone — marks the table dropped (a one-column marker) without deleting data; files persist for a human to purge, a later create … as revives it |
MERGE INTO x [a] USING s [b] ON a.k = b.k WHEN … |
delta_rs upsert (same engine + snapshot pin as the DeltaTable.merge builder). Write it like standard SQL — the ON/WHEN clauses may use your own aliases or the table/relation names (the literal target/source also work); fully-unqualified columns (ON k = k) are ambiguous and unsupported. Supports UPDATE SET * / UPDATE SET col = <src>.col, INSERT *, WHEN NOT MATCHED BY SOURCE THEN DELETE, and per-clause AND predicates |
CREATE TEMP/TEMPORARY TABLE …, CREATE VIEW … |
native DuckDB — ephemeral, session-local; not a Delta artifact |
UPDATE … FROM, DELETE … USING |
rejected → rewrite as a correlated subquery, or use DeltaTable.forName(conn, name) |
| multiple statements in one call | rejected → one statement per conn.sql() |
Leading -- / /* … */ comments are fine. The exact behaviour is pinned, statement-by-statement,
in tests/connection_api/test_connection_api.py
(the cross-API write-correctness matrix + the TestSqlDml class).
The card below — every public method with a ✅ — is regenerated on every push by
the connection-card job in cores.yml from the Test* classes of
tests/connection_api/test_connection_api.py.
duckrun connection API — method scorecard¶
┌───────────────────────────┐
│ ✅ 84 passed ❌ 0 failed │
│ 84 methods · 100% passing │
└───────────────────────────┘
DataFrame API — 66/66 ✅¶
Methods that mirror the established DataFrame / Delta
DeltaTableAPI 1:1.
| Surface | Methods | Pass |
|---|---|---|
DuckSession |
sql, table, read, catalog |
4/4 ✅ |
Catalog |
listTables, listDatabases, currentDatabase, setCurrentDatabase, tableExists, tableExists_is_fresh, databaseExists, listColumns, refreshTable, createTable_ddl, createTable_from_struct, createTable_bad_schema, getTable, getDatabase, dropTempView, listCatalogs, currentCatalog, setCurrentCatalog |
18/18 ✅ |
DataFrame |
collect, count, columns, show, toPandas, toArrow, first, head, take, isEmpty, schema, printSchema |
12/12 ✅ |
DataFrameReader |
format/load, table, parquet, csv, json, schema_csv_ddl, schema_ddl_with_comma_type, schema_json_struct, schema_rejected_for_delta, versionAsOf, timestampAsOf_rejected |
11/11 ✅ |
DataFrameWriter |
saveAsTable, mode, option, insertInto, insertInto_requires_existing, partitionBy, format, save_by_path, save_modes, save_mode_error_when_exists |
10/10 ✅ |
DeltaTable |
forName, forPath, convertToDelta, merge, version, history, delete, update, optimize, vacuum, restoreToVersion |
11/11 ✅ |
duckrun-specific helpers — 18/18 ✅¶
Conveniences with no DataFrame-API equivalent (session plumbing + two shortcuts).
| Method | Surface | Pass |
|---|---|---|
connect |
DuckSession |
✅ |
createDataFrame |
DuckSession |
✅ |
refresh |
DuckSession |
✅ |
connection |
DuckSession |
✅ |
stop |
DuckSession |
✅ |
table_path |
DuckSession |
✅ |
attach |
DuckSession |
✅ |
__getattr__ |
DataFrame |
✅ |
SELECT (passthrough) |
sql() |
✅ |
version-pinned read |
sql() |
✅ |
create table as |
sql() |
✅ |
insert…select |
sql() |
✅ |
insert…values |
sql() |
✅ |
update |
sql() |
✅ |
delete |
sql() |
✅ |
alter add column |
sql() |
✅ |
drop (tombstone) |
sql() |
✅ |
merge |
sql() |
✅ |