February 18, 2026

Daft v0.7.3: OTEL for Flotilla, Nightly Builds, and Lance NN Search

Daft v0.7.3 adds distributed observability with df.metrics via OTEL, nightly builds, and native Lance vector search.

by Daft Team

Daft version 0.7.3 has been released. Here are the highlights, honorable mentions, and contributor spotlights from the latest update.

Observability for distributed runs

This release brings big updates to Daft's observability roadmap. Last year, we shipped the first milestone: a metrics framework for the native runner (Swordfish) and a basic dashboard to track query lifecycles. v0.7.3 lands the second part, extending that same infrastructure to distributed execution.

Concretely, that means three things.

First, with @Jay-ju's help [#6062, #6063], the Daft dashboard can now monitor query state across your Ray cluster.

Second, df.metrics now works for Flotilla runs [#6122]. After your distributed query materializes, you get a RecordBatch of overall execution stats — operator timings, row counts, bytes processed — attached directly to the result DataFrame. Same API you'd use on a local run, same data shape, just from a cluster.
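In practice it looks something like this. A minimal sketch, assuming an existing Ray cluster; the read path and column names are illustrative, and set_runner_ray is the usual way to opt into distributed execution:

import daft

daft.context.set_runner_ray()  # run distributed on Flotilla

df = daft.read_parquet("s3://my-bucket/events/")  # illustrative path
result = df.where(daft.col("status") == "ok").collect()

# Execution stats ride along with the materialized result:
# operator timings, row counts, bytes processed.
print(result.metrics)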

Third, Daft now officially supports the OpenTelemetry metrics protocol. Using the standard OTEL_EXPORTER_OTLP_* environment variables [#6148], you can route metrics to your existing OTEL backend (Prometheus, an OTEL Collector, ClickStack) without touching any code.
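Pointing it at a backend is just environment configuration. A sketch with an illustrative endpoint; exporting these in your shell before launching the job works just as well:

import os

# Standard OTLP exporter variables; the endpoint is illustrative.
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://localhost:4317"
os.environ["OTEL_EXPORTER_OTLP_PROTOCOL"] = "grpc"

import daft  # set the exporter variables before running queries
# then run queries as usual; metrics flow to your OTEL backend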

The design philosophy here is worth noting: observability hooks are available across the stack, depending on how much detail you need. By default, query metrics are immediately available on your resulting DataFrame. Not a separate service, not a sidecar you have to deploy; it's just another property on the object you're already working with. If you want to track metrics over time, the OTEL integration plugs into the standard tooling your infrastructure team already knows. And if you need more, the dashboard adds Daft-specific context to your runs.

Nightly builds, because we want you to break things faster

We now publish nightly builds at nightly.daft.ai [#6175]. pip install and you're running whatever landed on main yesterday.

Why does this matter? Because the feedback loop between "contributor lands a PR" and "real user tries it" was too long. If you found a bug, you had to wait for a release or build from source. If you wanted to validate a fix, same deal. Nightlies compress that loop to about 24 hours. Land a PR Monday afternoon, someone in Singapore is running it Tuesday morning.

It also keeps us honest. When main has to be installable every single day, you can't let broken stuff linger. The nightly pipeline is a forcing function for code quality — and an open invitation for the community to hold us to it.

pip install daft --pre --extra-index-url https://nightly.daft.ai

That's it. Live dangerously.

Lance vector search, natively in your DataFrame

We added Lance namespace read/write [#5980], nearest-neighbor vector search [#6025], and distance/similarity functions [#6098] — cosine similarity, Euclidean distance, the works — as native DataFrame expressions.

Think about what this means for a RAG pipeline: read your Lance dataset into Daft, run vector search, compute similarity, filter, join with metadata, write the results. One engine, distributed, optimized. The AI data stack doesn't need more tools. It needs fewer tools that do more.
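Here's a sketch of that pipeline shape. daft.read_lance is the entry point; the nearest-neighbor and distance call sites below are placeholders for the APIs from #6025 and #6098 (exact names and signatures may differ, so check the docs):

import daft
from daft import col

docs = daft.read_lance("s3://my-bucket/docs.lance")  # illustrative path
query = [0.12, -0.03, 0.88]  # your query embedding

# Placeholder call sites for the new vector search and distance APIs:
# hits = docs.nearest_neighbors(col("embedding"), query, k=10)              # #6025
# hits = hits.with_column("dist", cosine_distance(col("embedding"), query)) # #6098

# From there it's ordinary DataFrame work: filter, join with metadata, write.
# hits.where(col("dist") < 0.3).write_parquet("s3://my-bucket/results/")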

Credit here to @shaofengshi and @huleilei, who drove the Lance integration as community contributors.

Zero-copy arrays and smarter UDF serialization

from_vec is now zero-copy [#6172]. When Daft constructs arrays from Rust vectors — which happens constantly — it no longer copies the data. Just points to it.

Meanwhile, Actor UDFs now serialize only the columns they actually need [#5884]. If your UDF takes 2 columns from a 50-column DataFrame, we used to ship all 50. Now we ship 2. We also bumped Actor UDF timeouts from 10s to 60s [#6163], because it turns out calling an LLM endpoint takes longer than a hash join. Who knew.
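To make the column pruning concrete, here's a minimal sketch with hypothetical column names (plain function UDF shown for brevity; the serialization win above applies to the actor form):

import daft
from daft import col

df = daft.from_pydict({
    "price": [9.99, 4.50],
    "qty": [3, 7],
    "notes": ["a", "b"],  # stand-in for the other 48 columns
})

@daft.udf(return_dtype=daft.DataType.string())
def label(price, qty):
    # Only the two referenced columns get serialized to the worker;
    # unused columns like `notes` stay behind.
    return [f"{p} x {q}" for p, q in zip(price.to_pylist(), qty.to_pylist())]

df.with_column("label", label(col("price"), col("qty"))).show()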

These are the kind of changes that don't make for good demos but show up immediately in your pipeline's memory profile and wall-clock time.

Honorable mentions

v0.7.3 shipped a lot. Here's the rest of what you should know:

  • The arrow2 → arrow-rs migration keeps going — 20+ PRs in this release alone migrated core kernels to arrow-rs: hashing, aggregations, UTF8 operations, temporal methods, null checks, HLL sketches, comparison kernels, and more. We're consolidating onto the Apache-maintained Arrow implementation, and the scope of this effort deserves its own story. Stay tuned.

  • Iceberg snapshot properties — You can now set custom snapshot properties on Iceberg writes [#6139]. If you're tracking lineage, audit trails, or pipeline metadata in production Iceberg tables, this one's for you.

  • Unity OAuth + Gravitino connector — Two catalog integrations in one release. M2M OAuth for Databricks Unity Catalog [#5839], plus a new connector for Apache Gravitino [#6083]. Daft talks to wherever your data already lives.

  • Expression API expansion — uuid() [#5983], agg_concat with custom delimiters [#6099], list_contains [#6095], text embedding dimension specification [#6097], variance with degrees of freedom [#6105], and comparison operators for list and struct types [#6104]. Each one is a case where you used to need a UDF and now you don't. See the sketch after this list.

  • Bug fixes worth knowing about — Map columns now render as Python dicts instead of lists [#6198]. is_in accepts sets, tuples, and iterables, not just lists [#6115]. Filter pushdown works through anti-joins [#6150]. into_partitions() handles the case where input already matches the target partition count [#6061].
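As promised, a quick tour of a few of these. The is_in change is #6115; the commented lines are placeholder call sites, since the exact import paths for the new expressions may differ:

import daft
from daft import col

df = daft.from_pydict({"n": [1, 2, 3], "tags": [["a"], ["b", "c"], []]})

# is_in now accepts sets, tuples, and iterables, not just lists [#6115].
df = df.where(col("n").is_in({1, 3}))

# Placeholder call sites for the new expression helpers:
# df = df.with_column("id", uuid())                             # #5983
# df = df.with_column("has_a", col("tags").list_contains("a"))  # #6095

df.show()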

Contributor Spotlight

Since January 1st, 8 community contributors shipped 34 PRs.

Here's the rundown:

  • Aaron Ang (@aaron-ang) — 10 PRs. An absolute machine. Distance functions, similarity functions, .as_T cast methods, list_contains, string casing, agg_concat delimiter, embedding dimension spec. He basically built a new feature wing of the API by himself.

  • huleilei (@huleilei, ByteDance) — 9 PRs. Lance vector search, arrow2 migration work on is_in/get_lit, UDF v2 kwargs fix, image pipeline docs. Shipping across multiple areas of the codebase.

  • jay (@Jay-ju, ByteDance) — 4 PRs. Extended the Daft dashboard for distributed execution. Also added map_groups v2 UDF support and mcap reader improvements.

  • Zhenchao Wang (@plotor, ByteDance) — 3 PRs. Perf optimization for actor UDF serialization, build tooling improvements.

  • everySympathy (@everySympathy, ByteDance) — 3 PRs. Added the uuid() function, fixed into_partitions(), docs cleanup.

  • Shaofeng Shi (@shaofengshi, Datastrato) — 2 PRs. Landed the Gravitino connector optional dependency and Lance namespace read/write support.

  • gpathak128 (@gpathak128) — 2 PRs. First time contributing to Daft! JSON write: ignore null fields + timestamp support.

  • Killua7163 (@Killua7163) — 1 PR. Also a first-timer! Fixed mypy-boto3-glue in the AWS optional dependencies.

Daft continues to release powerful features thanks to our amazing contributors. It's this level of collective investment that makes Daft what it is.

Try 0.7.3 today

pip install daft==0.7.3


The full changelog is on GitHub. And if any of the work above made you think "I could contribute to that," you absolutely can. Grab a good first issue and come build with us.

Get updates, contribute code, or say hi: join us on GitHub Discussions or The Distributed Data Community Slack.