
Daft v0.7.4: Arrow-rs, OpenDAL, Flight Shuffle, and Better Metrics
Another killer release featuring arrow-rs migration, Apache OpenDAL support, Flight shuffle, better metrics, and Tencent Cloud COS integration.
by Daft Team
Over the past several months, the team has been migrating core Rust kernels from arrow2 to arrow-rs. With release 0.7.4, over two thousand arrow2 call sites have been meticulously converted, bringing a colossal effort to a close. This release also brings a full observability stack, Apache OpenDAL support, and quality-of-life improvements across metrics, storage backends, and SQL compatibility.
The Great Arrow-rs Migration
The most significant change in v0.7.4 is Daft's migration from arrow2 to arrow-rs, the official Apache Arrow Rust implementation.
Why this matters: arrow2 was a high-quality Rust implementation of Arrow, but it's no longer actively maintained. arrow-rs, on the other hand, is the Apache Software Foundation's official project, backed by a large contributor community, regular releases, and deep integration with the broader Rust data ecosystem (DataFusion, Ballista, and more).
This release alone includes 25 of the 120+ total migration PRs, spanning everything from core data types to kernel implementations:
- Arithmetic kernels, cast operations, and concat logic moved to arrow-rs
- Boolean bitmap access, array serialization, and filtering migrated
- Hash kernels, growable internals, and comparison operations ported
- The `daft-arrow` compatibility layer is being systematically removed
Breaking change: Interval arithmetic now uses arrow-rs (#6186). If you're using interval types directly, check the migration notes.
The arrow2-to-arrow-rs migration has been underway since December and is now effectively complete. Cory Grinstead (@universalmind303) led the charge, authoring over half of the 122 total migration PRs.
Here are the full numbers:
| Metric | Value |
|---|---|
| Total migration PRs | 122 |
| Total lines added | +20,934 |
| Total lines deleted | -17,916 |
| Total lines changed | 38,850 |
| Net change | +3,018 |
Per-contributor breakdown:
| Contributor | PRs | Lines Added | Lines Deleted | % of PRs |
|---|---|---|---|---|
| Cory Grinstead (@universalmind303) | 62 | 13,071 | 8,624 | 50.8% |
| | 16 | 1,784 | 1,771 | 13.1% |
| | 11 | 3,234 | 2,920 | 9.0% |
| | 10 | 344 | 569 | 8.2% |
| | 8 | 896 | 3,012 | 6.6% |
| | 8 | 291 | 133 | 6.6% |
| | 3 | 107 | 60 | 2.5% |
| | 3 | 54 | 52 | 2.5% |
| | 1 | 1,153 | 775 | 0.8% |
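As a quick sanity check, the per-contributor rows can be summed to reproduce the headline numbers. The sketch below only re-derives figures already in the tables above; the variable names are illustrative.

```python
# Per-contributor rows from the table above: (PRs, lines added, lines deleted).
rows = [
    (62, 13_071, 8_624),
    (16, 1_784, 1_771),
    (11, 3_234, 2_920),
    (10, 344, 569),
    (8, 896, 3_012),
    (8, 291, 133),
    (3, 107, 60),
    (3, 54, 52),
    (1, 1_153, 775),
]

total_prs = sum(prs for prs, _, _ in rows)
total_added = sum(added for _, added, _ in rows)
total_deleted = sum(deleted for _, _, deleted in rows)

# "Total lines changed" is additions plus deletions; "net change" is their difference.
lines_changed = total_added + total_deleted
net_change = total_added - total_deleted

# Share of PRs per contributor, rounded to one decimal place as in the table.
pr_share = [round(100 * prs / total_prs, 1) for prs, _, _ in rows]

print(total_prs, total_added, total_deleted, lines_changed, net_change)
print(pr_share)
```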
Better Metrics for Observability
v0.7.4 continues the observability push that started in v0.7.3. Standardized metric naming and split duration columns mean you can now correlate per-node execution times across pipeline stages without writing custom parsing. The metrics DataFrame gives you what you need directly.
New in v0.7.4:
- New Metrics documentation (#6253): comprehensive docs for Daft's metrics system
- Consolidated metric naming (#6236): standardized metric names with `node.type` attributes for cleaner dashboards
- Split duration metrics (#6235): separate columns in the metrics DataFrame for easier analysis
- Dashboard CLI improvements (#6234): split into `start`/`stop` subcommands for a cleaner workflow
Combined with v0.7.3's OTEL export support, Flotilla metrics, and dashboard daemon mode, Daft now has a complete observability story: collect metrics, export to OpenTelemetry, and visualize in a built-in dashboard.
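To illustrate why a `node.type` attribute and a dedicated duration column remove the need for custom parsing, here is a minimal plain-Python sketch. The record layout, the `node_id` and `duration_us` field names, and the sample values are all hypothetical and are not taken from Daft's metrics schema; see the Metrics documentation for the real columns.

```python
from collections import defaultdict

# Hypothetical metric records: field names and values are illustrative only,
# not Daft's actual metrics schema.
records = [
    {"node_id": 1, "node.type": "scan", "duration_us": 1200},
    {"node_id": 2, "node.type": "filter", "duration_us": 300},
    {"node_id": 3, "node.type": "scan", "duration_us": 800},
    {"node_id": 4, "node.type": "aggregate", "duration_us": 500},
]


def total_duration_by_node_type(rows):
    """Sum durations per node type, reading the type from an attribute
    instead of parsing it out of a metric name."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["node.type"]] += row["duration_us"]
    return dict(totals)


totals = total_duration_by_node_type(records)
print(totals)
```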
Apache OpenDAL: One API for Every Storage Backend
Daft now supports Apache OpenDAL compatible backends (#6177). OpenDAL provides a unified data access layer for dozens of storage services (S3, GCS, Azure Blob, HDFS, and many more) through a single API.
This means Daft can now read from and write to any storage backend that OpenDAL supports, without needing a dedicated connector for each one.
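The idea of one API in front of many backends can be sketched in a few lines of plain Python. This is a conceptual toy only: `MemoryBackend` and `Storage` are hypothetical names invented for illustration, and neither OpenDAL's nor Daft's actual interface is shown here.

```python
from urllib.parse import urlparse


class MemoryBackend:
    """Toy in-memory object store standing in for a real service."""

    def __init__(self):
        self._objects = {}

    def write(self, path, data):
        self._objects[path] = bytes(data)

    def read(self, path):
        return self._objects[path]


class Storage:
    """Routes read/write calls to whichever backend is registered for the URL scheme."""

    def __init__(self):
        self._backends = {}

    def register(self, scheme, backend):
        self._backends[scheme] = backend

    def _resolve(self, url):
        parsed = urlparse(url)
        # A KeyError here means no backend was registered for this scheme.
        return self._backends[parsed.scheme], parsed.netloc + parsed.path

    def write(self, url, data):
        backend, path = self._resolve(url)
        backend.write(path, data)

    def read(self, url):
        backend, path = self._resolve(url)
        return backend.read(path)


storage = Storage()
storage.register("s3", MemoryBackend())
storage.register("gcs", MemoryBackend())

# The call sites are identical regardless of which backend serves the URL.
storage.write("s3://bucket/a.txt", b"hello")
storage.write("gcs://bucket/a.txt", b"world")
print(storage.read("s3://bucket/a.txt"), storage.read("gcs://bucket/a.txt"))
```

Adding a new service in this toy means registering one more backend object; callers do not change, which is the property a unified access layer is meant to provide.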
Flight Shuffle for Flotilla
Daft's distributed execution engine, Flotilla, gets a major upgrade with Flight shuffle support (#6123). Arrow Flight is a high-performance data transport protocol built on gRPC and Arrow IPC. It enables efficient, zero-copy data movement between nodes during shuffle operations.
This is a foundational piece for Flotilla's performance at scale, reducing serialization overhead during distributed data exchange.
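For readers new to shuffles, the sketch below shows the data movement pattern itself in plain Python: each upstream task splits its rows into one bucket per downstream partition, and each downstream partition gathers its bucket from every upstream task. It is not Flotilla's implementation and does not use Arrow Flight; the function names are hypothetical, and the in-memory gather stands in for the network transfer that a transport such as Flight would carry.

```python
import zlib


def partition_of(key, num_partitions):
    """Stable hash partitioning (crc32 avoids Python's per-process salted hash())."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def map_side(rows, num_partitions):
    """An upstream task splits its rows into one bucket per downstream partition."""
    buckets = [[] for _ in range(num_partitions)]
    for key, value in rows:
        buckets[partition_of(key, num_partitions)].append((key, value))
    return buckets


def shuffle(map_inputs, num_partitions):
    """Downstream partition p gathers bucket p from every upstream task.
    In a real cluster, this gather is the network transfer step."""
    map_outputs = [map_side(rows, num_partitions) for rows in map_inputs]
    return [
        [row for buckets in map_outputs for row in buckets[p]]
        for p in range(num_partitions)
    ]


map_inputs = [
    [("a", 1), ("b", 2), ("c", 3)],
    [("a", 4), ("c", 5), ("d", 6)],
]
partitions = shuffle(map_inputs, num_partitions=2)
for index, part in enumerate(partitions):
    print(index, part)
```

After the shuffle, all rows sharing a key sit in the same partition, which is what lets joins and group-bys run locally on each node.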
More Highlights
- Tencent Cloud COS support (#6140): native support for Tencent Cloud Object Storage, contributed by @XuQianJin-Stars
- pyiceberg 0.11.0 (#6200): updated Iceberg integration to the latest pyiceberg release
- `.as_T` cast methods (#6100): convenient type casting methods like `.as_int()`, `.as_str()` on expressions
- SQL ORDER BY position (#6211): `ORDER BY 1, 2` now works as expected in Daft SQL
- Time-interval sampling (#6088): enhanced sampling with comprehensive time-interval support for audio/video workflows
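For anyone unfamiliar with positional `ORDER BY`, the numbers refer to columns in the `SELECT` list, so `ORDER BY 1, 2` sorts by the first selected column and then the second. The sketch below demonstrates those semantics using Python's built-in `sqlite3` module as a stand-in, not Daft SQL; the `events` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (city TEXT, year INTEGER, visitors INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("Oslo", 2024, 10), ("Bergen", 2023, 7), ("Oslo", 2023, 4), ("Bergen", 2024, 9)],
)

# Positions refer to the SELECT list: 1 is city, 2 is year.
by_position = conn.execute(
    "SELECT city, year, visitors FROM events ORDER BY 1, 2"
).fetchall()

# The equivalent query with explicit column names.
by_name = conn.execute(
    "SELECT city, year, visitors FROM events ORDER BY city, year"
).fetchall()

print(by_position)
print(by_position == by_name)
conn.close()
```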
Community Contributions
This release wouldn't be possible without contributions from the community:
- @huleilei: time-interval sampling enhancements
- @XuQianJin-Stars: Tencent Cloud COS support
- @gweaverbiodev: `pyiceberg 0.11.0` support
- @Lucas61000: `SQL ORDER BY` column position
- @plotor: dashboard daemon mode
- @gpathak128: JSON timestamp write support
- @aaron-ang: `.as_T` cast methods
Upgrade to 0.7.4 today:
uv add "daft>=0.7.4"
Or try the latest nightly:
uv pip install daft --pre --extra-index-url https://nightly.daft.ai
Check the full changelog for the complete list of changes.