
Daft v0.7.4: Arrow-rs, OpenDAL, Flight Shuffle, and Better Metrics
Daft v0.7.4 completes its arrow-rs migration, adds Apache OpenDAL storage support, Flight shuffle for Flotilla, and a full observability stack.
by Daft Team
Over the past several months, the team has been migrating core Rust kernels from arrow2 to arrow-rs. With release 0.7.4, over two thousand arrow2 call sites have been meticulously converted, bringing to a close a colossal effort by the team. This release also brings a full observability stack, Apache OpenDAL support, and quality-of-life improvements across metrics, storage backends, and SQL compatibility.
The Great Arrow-rs Migration
The most significant change in v0.7.4 is Daft's migration from arrow2 to arrow-rs, the official Apache Arrow Rust implementation.
Why this matters: "arrow2 was a high-quality Rust implementation of Arrow, but it's no longer actively maintained. arrow-rs, on the other hand, is the Apache Software Foundation's official project, backed by a large contributor community, regular releases, and deep integration with the broader Rust data ecosystem (DataFusion, Ballista, and more)."
This release alone includes 25 of the 122 total migration PRs, spanning everything from core data types to kernel implementations:
- Arithmetic kernels, cast operations, and concat logic moved to arrow-rs
- Boolean bitmap access, array serialization, and filtering migrated
- Hash kernels, growable internals, and comparison operations ported
- The daft-arrow compatibility layer is being systematically removed
Breaking change: Interval arithmetic now uses arrow-rs (#6186). If you're using interval types directly, check the migration notes.
The arrow2-to-arrow-rs migration has been underway since December and is now effectively complete. Cory Grinstead (@universalmind303) led the charge, authoring over half of the 122 total migration PRs.
Here are the full numbers:
| Metric | Value |
|---|---|
| Total migration PRs | 122 |
| Total lines added | +20,934 |
| Total lines deleted | -17,916 |
| Total lines changed | 38,850 |
| Net change | +3,018 |
Per-contributor breakdown:
| Contributor | PRs | Lines Added | Lines Deleted | % of PRs |
|---|---|---|---|---|
| @universalmind303 | 62 | 13,071 | 8,624 | 50.8% |
| @desmondcheongzx | 16 | 1,784 | 1,771 | 13.1% |
| @srilman | 11 | 3,234 | 2,920 | 9.0% |
| @rohitkulshreshtha | 10 | 344 | 569 | 8.2% |
| @kevinzwang | 8 | 896 | 3,012 | 6.6% |
| @cckellogg | 8 | 291 | 133 | 6.6% |
| @huleilei | 3 | 107 | 60 | 2.5% |
| @colin-ho | 3 | 54 | 52 | 2.5% |
| @rchowell | 1 | 1,153 | 775 | 0.8% |
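The headline totals can be rederived from the per-contributor rows. The short Python sketch below does that arithmetic using only the figures copied from the table above; the variable names are illustrative and nothing here comes from Daft's tooling.

```python
# Per-contributor figures copied from the table above:
# (PRs, lines added, lines deleted)
contributors = {
    "@universalmind303": (62, 13_071, 8_624),
    "@desmondcheongzx": (16, 1_784, 1_771),
    "@srilman": (11, 3_234, 2_920),
    "@rohitkulshreshtha": (10, 344, 569),
    "@kevinzwang": (8, 896, 3_012),
    "@cckellogg": (8, 291, 133),
    "@huleilei": (3, 107, 60),
    "@colin-ho": (3, 54, 52),
    "@rchowell": (1, 1_153, 775),
}

total_prs = sum(prs for prs, _, _ in contributors.values())
lines_added = sum(added for _, added, _ in contributors.values())
lines_deleted = sum(deleted for _, _, deleted in contributors.values())

# Share of PRs per contributor, rounded to one decimal place as in the table.
pr_share = {
    name: round(100 * prs / total_prs, 1)
    for name, (prs, _, _) in contributors.items()
}

print(f"PRs: {total_prs}")
print(f"Lines changed: {lines_added + lines_deleted}")
print(f"Net change: {lines_added - lines_deleted:+}")
```

The sums agree with the summary table: 122 PRs, 38,850 lines changed, and a net change of +3,018.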
Better Metrics for Observability
v0.7.4 continues the observability push that started in v0.7.3. Standardized metric naming and split duration columns mean you can now correlate per-node execution times across pipeline stages without writing custom parsing: the metrics DataFrame gives you what you need directly.
New in v0.7.4:
- New Metrics documentation (#6253): comprehensive docs for Daft's metrics system
- Consolidated metric naming (#6236): standardized metric names with node.type attributes for cleaner dashboards
- Split duration metrics (#6235): separate columns in the metrics DataFrame for easier analysis
- Dashboard CLI improvements (#6234): split into start/stop subcommands for cleaner workflow
Combined with v0.7.3's OTEL export support, Flotilla metrics, and dashboard daemon mode, Daft now has a complete observability story: collect metrics, export to OpenTelemetry, and visualize in a built-in dashboard.
Apache OpenDAL: One API for Every Storage Backend
Daft now supports Apache OpenDAL compatible backends (#6177). OpenDAL provides a unified data access layer for dozens of storage services (S3, GCS, Azure Blob, HDFS, and many more) through a single API.
This means Daft can now read from and write to any storage backend that OpenDAL supports, without needing a dedicated connector for each one.
Flight Shuffle for Flotilla
Daft's distributed execution engine, Flotilla, gets a major upgrade with Flight shuffle support (#6123). Arrow Flight is a high-performance data transport protocol built on gRPC and Arrow IPC. It enables efficient, zero-copy data movement between nodes during shuffle operations.
This is a foundational piece for Flotilla's performance at scale, reducing serialization overhead during distributed data exchange.
More Highlights
- Tencent Cloud COS support (#6140): native support for Tencent Cloud Object Storage, contributed by @XuQianJin-Stars
- pyiceberg 0.11.0 (#6200): updated Iceberg integration to the latest pyiceberg release
- .as_T cast methods (#6100): convenient type casting methods like .as_int(), .as_str() on expressions
- SQL ORDER BY position (#6211): ORDER BY 1, 2 now works as expected in Daft SQL
- Time-interval sampling (#6088): enhanced sampling with comprehensive time-interval support for audio/video workflows
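For readers unfamiliar with positional ORDER BY, here is a minimal sketch of what the syntax means. It deliberately uses Python's built-in sqlite3 module rather than Daft, so it illustrates standard SQL semantics and not Daft's own API; the table, columns, and rows are made up for the example.

```python
import sqlite3

# In-memory database with a small, hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (city TEXT, year INTEGER, visits INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("Oslo", 2024, 7),
        ("Bergen", 2023, 3),
        ("Oslo", 2023, 5),
        ("Bergen", 2024, 9),
    ],
)

# ORDER BY 1, 2 sorts by the first, then the second column of the SELECT list.
by_position = conn.execute(
    "SELECT city, year, visits FROM events ORDER BY 1, 2"
).fetchall()

# The equivalent query with the columns spelled out by name.
by_name = conn.execute(
    "SELECT city, year, visits FROM events ORDER BY city, year"
).fetchall()

print(by_position == by_name)
conn.close()
```

The positions refer to the SELECT list, not to the table definition, so reordering the selected columns changes the sort.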
Community Contributions
This release wouldn't be possible without contributions from the community:
- @huleilei: time-interval sampling enhancements
- @XuQianJin-Stars: Tencent Cloud COS support
- @gweaverbiodev: pyiceberg 0.11.0 support
- @Lucas61000: SQL ORDER BY column position
- @plotor: dashboard daemon mode
- @gpathak128: JSON timestamp write support
- @aaron-ang: .as_T cast methods
Upgrade today to 0.7.4
uv add "daft>=0.7.4"
Or try the latest nightly:
uv pip install daft --pre --extra-index-url https://nightly.daft.ai
Check the full changelog for the complete list of changes.
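After upgrading, it can be handy to confirm which version actually ended up in your environment. The sketch below uses only Python's standard library and assumes the distribution is named daft, as in the install command above; the helper names are hypothetical and not part of Daft.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="daft"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def release_tuple(version_string):
    """Extract the leading numeric release, e.g. '0.7.5.dev1' -> (0, 7, 5)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, minimum="0.7.4"):
    # Simplification: pre-release suffixes such as 'rc1' are ignored,
    # so '0.7.4rc1' is treated the same as '0.7.4'.
    return release_tuple(found) >= release_tuple(minimum)


found = installed_version()
if found is None:
    print("daft is not installed in this environment")
else:
    print(f"daft {found} installed; at least 0.7.4: {meets_minimum(found)}")
```

For strict handling of pre-release and development versions, a full version parser would be a better fit than this regular expression.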

