
Daft Extensions Featuring daft-h3: Native Rust Performance, Community Owned
Daft now supports native extensions via Apache Arrow's C Data Interface. daft-h3 is the first community extension — 9 Rust-native H3 geospatial functions, 3–16x faster than Python UDFs.
by Daft Team
Developers can now extend Daft in any language with an Arrow implementation, making domain-specific functionality (geo, ML, …) easier to ship as a community package.
daft-h3 brings Uber H3 geospatial indexing to Daft as a Rust-native extension. Nine functions, pip install daft-h3, 3–16x faster than wrapping h3-py in a batch UDF. Built independently by longtime community contributor Garrett Weaver using Daft's extension authoring guide.
daft-h3 isn't alone. Other community extensions already in the wild include daft-html (HTML parsing and CSS selector extraction), daft-geo (extension-backed datatypes like Point2D and Point3D), and daft-lance (Lance-specific distributed operations). Today we're formalizing the pattern as community extensions — see discussion #6852 for the full announcement and ongoing community conversation.
Why Extensions
Daft is the data engine for AI, and AI workloads are inherently domain-specific. Geospatial indexing, HTML parsing, vector distance functions, model inference, file compaction — these are all valuable, but none of them belong in the core engine. Extensions give contributors a way to build and own this functionality while still benefiting from Daft's distributed execution model.
If you've used Daft's built-in AI functions — prompt, embed_text, classify_image — you've already used this pattern. Those functions are built on Daft's UDF machinery, with batching, concurrency controls, retries, and GPU resource hints all handled by the engine. The same machinery is available to anyone building an extension.
How Daft Extensions Work
There are two paths. Python UDF extensions are the fastest to ship — package @daft.func / @daft.cls decorators behind clean Python APIs, the way daft-lance does for Lance compaction and indexing. Native ABI extensions go lower-level for vectorized performance through a stable C ABI. daft-h3 is the latter.
Daft extensions use a stable C ABI based on the Arrow C Data Interface. The boundary between Daft and your extension is plain C structs — ArrowSchema and ArrowArray — so your extension is not coupled to any particular arrow-rs version. You pick your arrow-rs version, enable the matching feature flag on the daft-ext crate (arrow-56, arrow-57, or arrow-58), and get safe .into() conversions for free.
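For readers who have not worked with the Arrow C Data Interface, the two structs named above are small and fixed. The sketch below mirrors their layout, as published in the Arrow specification, using Python's ctypes. It only illustrates what crosses the boundary; it is not how daft-ext is implemented, and authors using the Rust SDK should not need to write this by hand. The column name "cell" is a made-up example.

```python
import ctypes


class ArrowSchema(ctypes.Structure):
    """Describes a column's type (layout from the Arrow C Data Interface spec)."""


class ArrowArray(ctypes.Structure):
    """Carries a column's buffers (layout from the Arrow C Data Interface spec)."""


# Fields are assigned after the class statements because both structs
# contain pointers to their own type.
ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),    # type as a format string, e.g. b"L" = uint64
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_char_p),  # binary-encoded key/value pairs, may be NULL
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowSchema))),
    ("private_data", ctypes.c_void_p),
]

ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowArray))),
    ("private_data", ctypes.c_void_p),
]

# Hypothetical descriptor for a uint64 column named "cell".
schema = ArrowSchema(format=b"L", name=b"cell")
print(schema.format, schema.name, schema.n_children)
```

Because both sides agree only on these field layouts, neither side needs to know which Arrow library the other was compiled against.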
The authoring pattern:
- Rust: Implement DaftScalarFunction: define name(), return_field() for type checking, and call() for evaluation. Register functions in an install() hook via #[daft_extension].
- Python: Write thin wrappers that call daft.get_function("your_fn", *args); no PyO3 needed. The Python layer gives you autocomplete, type hints, and docstrings.
- Ship: setuptools-rust with Binding::NoBinding compiles the cdylib. Publish to PyPI. Users load it with daft.load_extension(), which links the library to the default global session; no explicit Session object needed.
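The Python step of the pattern can be sketched as below. The only Daft call used is daft.get_function(name, *args), as quoted above. The wrapper name greet and its single argument are assumptions modeled on the greet function that examples/hello registers, so the real signature may differ. The import is deferred into the function body so the snippet can be imported on its own; a real package would more likely import daft at module top level.

```python
from typing import Any


def greet(name: Any) -> Any:
    """Return an expression that calls the native ``greet`` scalar function.

    ``name`` is expected to be a Daft expression, for example ``col("name")``.
    Both the wrapper name and its single argument are illustrative assumptions.
    """
    # Deferred import: keeps this sketch importable without Daft installed.
    import daft

    # Look up the function registered by the extension's install() hook and
    # apply it to the argument. No PyO3 bindings are involved.
    return daft.get_function("greet", name)
```

Once the compiled library has been loaded with daft.load_extension(), a caller would use the wrapper like any other expression helper, for example inside df.select(...). The docstring and type hints on the wrapper are what give users autocomplete and inline documentation.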
All data flows through Arrow arrays at the C ABI boundary. No per-row Python overhead. No serialization. The extension runs at native Rust speed inside Daft's query engine.
Rust has the most ergonomic SDK today via daft-ext. C++ works through the raw ABI — see examples/hello_cpp. The ABI is Arrow's C Data Interface, so any language with an Arrow implementation can ship an extension.
daft-h3: Community-Built, Rust-Native
daft-h3 is the first extension built entirely outside the Daft org. It wraps the H3 geospatial indexing system — the same hexagonal grid Uber uses for surge pricing and ETAs.
```python
import daft
import daft_h3
from daft import col

# Link the extension into the default global session.
daft.load_extension(daft_h3)

df = daft.from_pydict({"lat": [37.7749, 48.8566], "lng": [-122.4194, 2.3522]})

# Convert each coordinate pair to an H3 cell at resolution 7.
df = df.select(
    daft_h3.h3_latlng_to_cell(col("lat"), col("lng"), 7).alias("cell")
).collect()

# Render the cell index as its string form.
df = df.select(daft_h3.h3_cell_to_str(col("cell")).alias("hex")).collect()
df.show()
```

daft.load_extension() links the library into the default global session. If you need function isolation across sessions (e.g., testing), you can call sess.load_extension() on an explicit Session object, but for most scripts and notebooks the top-level call is all you need.
Nine functions cover the core H3 workflow: coordinate-to-cell conversion, cell properties, parent traversal, and grid distance. All cell-input functions accept both UInt64 and Utf8 columns — string H3 data works without explicit conversion.
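The UInt64 and Utf8 forms are two encodings of the same index: by H3 convention, the string form is the hexadecimal rendering of the 64-bit integer. The plain-Python sketch below shows that relationship with no Daft or h3 dependency. The helper names are hypothetical and are not part of daft-h3, and the sample index is just an example value.

```python
def h3_str_to_int(cell: str) -> int:
    """Parse the hexadecimal string form of an H3 index into an integer."""
    return int(cell, 16)


def h3_int_to_str(cell: int) -> str:
    """Render an integer H3 index as lowercase hexadecimal, with no prefix."""
    return format(cell, "x")


sample = "8928308280fffff"
as_int = h3_str_to_int(sample)

# The two forms round-trip without loss.
assert h3_int_to_str(as_int) == sample
print(as_int)
```

How daft-h3 handles the two column types internally is not shown here; the point is only that no information is lost between them.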
The performance gap matters. On 1M rows (Apple M-series):
| Function | daft-h3 | h3-py UDF | Speedup |
|---|---|---|---|
| latlng_to_cell | 416 ms | 1,407 ms | 3.4x |
| cell_to_lat | 242 ms | 1,011 ms | 4.2x |
| cell_parent | 75 ms | 1,072 ms | 14.4x |
| str_to_cell | 46 ms | 756 ms | 16.6x |
The speedup comes from bypassing Python entirely. A batch UDF still marshals data through Python — the native extension operates on Arrow arrays directly in Rust.
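The Speedup column is the UDF time divided by the native time. A minimal sketch recomputing it from the table follows. Because the timings shown are rounded to whole milliseconds, the recomputed ratios can differ slightly from the published column, which was presumably derived from unrounded measurements; that is an assumption, not something the benchmark states.

```python
# Timings in milliseconds, copied from the table above: (daft-h3, h3-py UDF).
timings_ms = {
    "latlng_to_cell": (416, 1407),
    "cell_to_lat": (242, 1011),
    "cell_parent": (75, 1072),
    "str_to_cell": (46, 756),
}


def speedup(native_ms: float, udf_ms: float) -> float:
    """How many times faster the native path is than the UDF path."""
    return udf_ms / native_ms


ratios = {name: speedup(native, udf) for name, (native, udf) in timings_ms.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```

The same helper works for timings from your own hardware and data, which is the more useful comparison when deciding whether a native extension is worth writing.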
The Model: You Build It, You Own It
The extension system is designed for community ownership. We encourage contributors to own and maintain extensions in their own repositories, which keeps development fast and close to the communities that need the functionality. Most extensions don't need to live under the daft-engine GitHub org. For projects with significant community usage, active maintenance, and clear ecosystem alignment, we're creating space in daft-engine to house community-led work over time.
- daft-ext provides the stability contract. The C ABI doesn't change when Daft upgrades its internal arrow-rs version. Your extension keeps working.
- You publish to PyPI. Naming convention: daft-<domain> (e.g., daft-h3).
- You maintain it. Bug reports, releases, and versioning are yours. Daft lists community extensions in the Community Extensions registry; open a PR against that page to add yours.
Daft now powers everything from cloud-scale infrastructure like DeltaCat to experimental projects like Archetype. Extensions are how we keep that growth healthy without funneling every idea through the governance and review process of the core repo.
Examples to Learn From
- examples/hello: minimal Rust extension registering a greet scalar function
- examples/dvector: pgvector-style vector distance functions
- examples/hello_cpp: pure C++ extension using Apache Arrow C++
The native extension authoring guide walks through the full Rust tutorial — project setup, implementation, Python wrappers, testing. There's even a Claude Code prompt at the bottom that scaffolds an entire extension from a function spec. For Python UDF extensions, start with the Python UDF docs.
If you've been wrapping a Rust library in a @daft.func.batch UDF and wishing it ran faster, the native path is the answer. Write the function once in Rust, ship it as pip install, and let it run at native speed inside the query engine. Have an idea? Open a discussion or join the conversation in #6852.

