Skip to content

Connection API (notebook)

Besides the dbt adapter, duckrun ships a storage-neutral, DataFrame-style duckrun.connect() for interactive/notebook use (local, S3, GCS, ADLS, OneLake):

  • conn.sql(...) — DuckDB SQL over the discovered Delta tables, including time travel (delta_scan('…', version => N)). Reads pass straight through; raw DML (create table … as, insert, update, delete, alter add column, drop, merge) is applied to the Delta table via delta_rs (works local AND on OneLake) — see the DML matrix below.
  • a DataFrame with a DataFrame-style .write…saveAsTable() — modes overwrite / append / append_if_unchanged / overwrite_if_unchanged / ignore (the _if_unchanged modes are the fenced, fail-loud siblings; safeappend is the deprecated alias for append_if_unchanged), plus option("replaceWhere", …) for an atomic slice overwrite — plus conn.read and conn.catalog.
  • a DeltaTable handle (DeltaTable.forName(conn, name)) mirroring the DeltaTable API: .merge(...), .delete(), .update(), .version(), .history().
  • multiple catalogs: connect() binds one lakehouse root (the primary catalog); attach more with conn.attach(path, name=…) and read/join across them by three-part catalog.schema.table name — see Multiple catalogs with conn.attach below.

connect() is read-only by default: every Delta write (saveAsTable / insertInto / save / merge / insert / update / delete / replaceWhere) raises PermissionError, so an accidental write can't mutate a shared lakehouse. Pass read_only=False to enable writes; reads and native CREATE TEMP/CREATE VIEW scratch are always allowed.

For a method-by-method map of this surface against PySpark / Delta-on-Spark — what maps 1:1, what's duckrun-flavored, and what's deliberately out of scope (SQL-first, no Spark runtime — by design) — see Coverage vs the Spark / Delta API.

merge is snapshot-pinned by default — single-snapshot MERGE, with no extra arguments: the target version is captured and the commit validates against it, so a concurrent writer fails the commit loudly instead of silently interleaving. mode("append_if_unchanged") (alias: safeappend) and mode("overwrite_if_unchanged") apply the same fail-loud compare-and-swap to a plain append / full overwrite: they commit only if the table is unchanged since the version read, else raise CommitFailedError. The full model and a cross-engine comparison are in Snapshot isolation.

import duckrun
# writable session — read_only=False opts in (the default is read-only)
conn = duckrun.connect("abfss://ws@onelake.dfs.fabric.microsoft.com/lh.Lakehouse/Tables/dbo",
                       read_only=False)
conn.sql("select * from orders").write.mode("overwrite").saveAsTable("orders_copy")
conn.table("orders_copy").show()

from duckrun import DeltaTable
DeltaTable.forName(conn, "orders").delete("region = 'eu'")   # delete / update / version

# overwrite just one slice, atomically
conn.sql("select * from corrections").write.option("replaceWhere", "region = 'eu'") \
    .mode("overwrite").saveAsTable("orders")

src = conn.sql("select * from updates")
DeltaTable.forName(conn, "orders").merge(src, "target.id = source.id") \
    .whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()   # pinned automatically

In-memory data with conn.createDataFrame

conn.createDataFrame(data, schema=None) turns in-memory data into a DataFrame on duckrun's own connection — handy for seeding, demos, or persisting a small Python/pandas result to Delta. data is a list of tuples/lists (a list of scalars becomes one column), a pandas DataFrame, or a pyarrow Table/RecordBatchReader. schema is None (names inferred — _1, _2, … for tuples), a list of column names, or a DDL string ("id int, name string"; name: type is also accepted). Empty data needs a DDL schema. It's a plain DuckDB build — no Spark/PySpark dependency.

conn.createDataFrame([(1, "a"), (2, "b")], "id int, name string") \
    .write.mode("overwrite").saveAsTable("seeded")

conn.createDataFrame(pandas_df).show()
conn.createDataFrame([], "id int, name string")   # typed empty frame

Multiple catalogs with conn.attach

connect() binds one lakehouse root as the primary catalog; conn.attach(path, name=…) binds more, so a single session can read and join across several lakehouses by three-part name (catalog.schema.table). In Microsoft Fabric a Warehouse and a Lakehouse are the same thing to duckrun — both are Delta in OneLake — except a Warehouse is locked to writes, so attach it read_only=True and keep the lakehouse writable. On OneLake tokens are automatic (nothing to pass in a Fabric notebook).

import duckrun

# primary catalog: a writable lakehouse. name= reads next to the others (else derived from the URL).
conn = duckrun.connect(
    "abfss://ws@onelake.dfs.fabric.microsoft.com/lakehouse.Lakehouse/Tables",
    read_only=False, name="lakehouse")

# attach a read-only Fabric Warehouse and a plain local folder as more catalogs
conn.attach("abfss://ws@onelake.dfs.fabric.microsoft.com/warehouse.Warehouse/Tables",
            name="warehouse", schema="mart", read_only=True)
conn.attach("/data/reference", name="local")

conn.catalog.listCatalogs()          # ['lakehouse', 'warehouse', 'local']

# join across all three in one query — facts from the warehouse, dim from the lakehouse,
# a lookup from local — and write the mart back to the (writable) lakehouse
conn.sql("""
    SELECT d.state, sum(f.mw) AS total_mw
    FROM warehouse.mart.fct_summary f
    JOIN mart.dim_duid d                ON d.duid = f.duid
    LEFT JOIN local.dbo.fuel_factors lf ON lower(lf.fuel) = lower(d.fuel)
    GROUP BY d.state
""").write.mode("overwrite").saveAsTable("mart_generation_by_state")

# the warehouse is read-only — a write into it is refused, not silently dropped
conn.sql("SELECT 1 AS x").write.saveAsTable("warehouse.mart.nope")   # -> PermissionError

Naming: a table can be addressed 3-part (warehouse.mart.fct_summary, from anywhere), 2-part (mart.fct_summary, in the current catalog) or bare (fct_summary, in the current catalog + schema). conn.catalog.setCurrentCatalog(name) / setCurrentDatabase(db) move where unqualified names resolve. name is derived from a friendly path and is mandatory for a GUID-only OneLake path; one URL maps to one name (re-attaching either raises). read_only is per-catalog, independent of the session — a read-only reference store sits safely next to a writable lakehouse. Raw conn.sql() DML targets the current catalog; cross-catalog writes go through the DataFrame API (df.write.saveAsTable("cat.schema.t")) or DeltaTable.forName(conn, "cat.schema.t").

See the full runnable walkthrough in tests/integration_tests/multicatalog/demo_multicatalog.py (published as a live report).

Raw SQL DML through conn.sql

conn.sql doesn't only read — raw SQL DML against a discovered (Delta-backed) table is intercepted and applied via delta_rs only, then the view is refreshed, so it works identically on a local path and on OneLake. The invariant: every CREATE TABLE is Delta-backed; only CREATE TEMP TABLE and CREATE VIEW stay native DuckDB (ephemeral, session-local scratch). Forms that delta_rs can't express are rejected up front with a pointer to the write API, rather than failing cryptically.

Statement What happens
CREATE [OR REPLACE] TABLE x [IF NOT EXISTS] AS <query> Delta overwrite (<query> = a select, a WITH … select, or (select …)); IF NOT EXISTS over a live table is a no-op
CREATE TABLE x (<col defs>) empty Delta table
INSERT INTO x [(cols)] SELECT/VALUES … Delta append — columns matched by name, projected/cast onto the target schema, unsupplied columns filled with typed NULL
[WITH …] INSERT INTO x SELECT … Delta append (the CTE is re-attached to the body)
UPDATE x SET … [WHERE …] delta_rs update
DELETE FROM x [WHERE …] delta_rs delete
ALTER TABLE x ADD COLUMN … Delta overwrite, widening the schema
DROP TABLE x tombstone — marks the table dropped (a one-column marker) without deleting data; files persist for a human to purge, a later create … as revives it
MERGE INTO x [a] USING s [b] ON a.k = b.k WHEN … delta_rs upsert (same engine + snapshot pin as the DeltaTable.merge builder). Write it like standard SQL — the ON/WHEN clauses may use your own aliases or the table/relation names (the literal target/source also work); fully-unqualified columns (ON k = k) are ambiguous and unsupported. Supports UPDATE SET * / UPDATE SET col = <src>.col, INSERT *, WHEN NOT MATCHED BY SOURCE THEN DELETE, and per-clause AND predicates
CREATE TEMP/TEMPORARY TABLE …, CREATE VIEW … native DuckDB — ephemeral, session-local; not a Delta artifact
UPDATE … FROM, DELETE … USING rejected → rewrite as a correlated subquery, or use DeltaTable.forName(conn, name)
multiple statements in one call rejected → one statement per conn.sql()

Leading -- / /* … */ comments are fine. The exact behaviour is pinned, statement-by-statement, in tests/connection_api/test_connection_api.py (the cross-API write-correctness matrix + the TestSqlDml class).

The card below — every public method with a ✅ — is regenerated on every push by the connection-card job in cores.yml from the Test* classes of tests/connection_api/test_connection_api.py.

duckrun connection API — method scorecard

┌───────────────────────────┐
│ ✅ 84 passed   ❌ 0 failed  │
│ 84 methods · 100% passing │
└───────────────────────────┘

DataFrame API — 66/66 ✅

Methods that mirror the established DataFrame / Delta DeltaTable API 1:1.

Surface Methods Pass
DuckSession sql, table, read, catalog 4/4 ✅
Catalog listTables, listDatabases, currentDatabase, setCurrentDatabase, tableExists, tableExists_is_fresh, databaseExists, listColumns, refreshTable, createTable_ddl, createTable_from_struct, createTable_bad_schema, getTable, getDatabase, dropTempView, listCatalogs, currentCatalog, setCurrentCatalog 18/18 ✅
DataFrame collect, count, columns, show, toPandas, toArrow, first, head, take, isEmpty, schema, printSchema 12/12 ✅
DataFrameReader format/load, table, parquet, csv, json, schema_csv_ddl, schema_ddl_with_comma_type, schema_json_struct, schema_rejected_for_delta, versionAsOf, timestampAsOf_rejected 11/11 ✅
DataFrameWriter saveAsTable, mode, option, insertInto, insertInto_requires_existing, partitionBy, format, save_by_path, save_modes, save_mode_error_when_exists 10/10 ✅
DeltaTable forName, forPath, convertToDelta, merge, version, history, delete, update, optimize, vacuum, restoreToVersion 11/11 ✅

duckrun-specific helpers — 18/18 ✅

Conveniences with no DataFrame-API equivalent (session plumbing + two shortcuts).

Method Surface Pass
connect DuckSession
createDataFrame DuckSession
refresh DuckSession
connection DuckSession
stop DuckSession
table_path DuckSession
attach DuckSession
__getattr__ DataFrame
SELECT (passthrough) sql()
version-pinned read sql()
create table as sql()
insert…select sql()
insert…values sql()
update sql()
delete sql()
alter add column sql()
drop (tombstone) sql()
merge sql()