February 26, 2026

Daft v0.7.4: Arrow-rs, OpenDAL, Flight Shuffle, and Better Metrics

Another killer release featuring arrow-rs migration, Apache OpenDAL support, Flight shuffle, better metrics, and Tencent Cloud COS integration.

by Daft Team

Over the past several months, the team has been migrating Daft's core Rust kernels from arrow2 to arrow-rs. With release 0.7.4, over two thousand arrow2 call sites have been converted, bringing a colossal effort to a close. This release also brings a full observability stack, Apache OpenDAL support, and quality-of-life improvements across metrics, storage backends, and SQL compatibility.

The Great Arrow-rs Migration

The most significant change in v0.7.4 is Daft's migration from arrow2 to arrow-rs, the official Apache Arrow Rust implementation.

Why this matters: arrow2 was a high-quality Rust implementation of Arrow, but it's no longer actively maintained. arrow-rs, on the other hand, is the Apache Software Foundation's official project, backed by a large contributor community, regular releases, and deep integration with the broader Rust data ecosystem (DataFusion, Ballista, and more).

This release alone includes 25 of the 122 total migration PRs, spanning everything from core data types to kernel implementations:

  • Arithmetic kernels, cast operations, and concat logic moved to arrow-rs

  • Boolean bitmap access, array serialization, and filtering migrated

  • Hash kernels, growable internals, and comparison operations ported

  • The `daft-arrow` compatibility layer is being systematically removed

Breaking change: Interval arithmetic now uses arrow-rs (#6186). If you're using interval types directly, check the migration notes.

The arrow2-to-arrow-rs migration has been underway since December and is now effectively complete. Cory Grinstead (@universalmind303) led the charge, authoring over half of the 122 total migration PRs.


Here are the full numbers:

| Metric | Value |
| --- | --- |
| Total migration PRs | 122 |
| Total lines added | +20,934 |
| Total lines deleted | -17,916 |
| Total lines changed | 38,850 |
| Net change | +3,018 |

Per-contributor breakdown:

| Contributor | PRs | Lines Added | Lines Deleted | % of PRs |
| --- | --- | --- | --- | --- |
| @universalmind303 | 62 | 13,071 | 8,624 | 50.8% |
| @desmondcheongzx | 16 | 1,784 | 1,771 | 13.1% |
| @srilman | 11 | 3,234 | 2,920 | 9.0% |
| @rohitkulshreshtha | 10 | 344 | 569 | 8.2% |
| @kevinzwang | 8 | 896 | 3,012 | 6.6% |
| @cckellogg | 8 | 291 | 133 | 6.6% |
| @huleilei | 3 | 107 | 60 | 2.5% |
| @colin-ho | 3 | 54 | 52 | 2.5% |
| @rchowell | 1 | 1,153 | 775 | 0.8% |
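
For readers who want to sanity-check the figures, here is a small sketch that recomputes the totals and PR shares from the per-contributor rows. The numbers are copied from the breakdown above; the variable names are ours and purely illustrative.

```python
# Per-contributor figures copied from the breakdown above:
# (PRs, lines added, lines deleted).
contributors = {
    "universalmind303": (62, 13_071, 8_624),
    "desmondcheongzx": (16, 1_784, 1_771),
    "srilman": (11, 3_234, 2_920),
    "rohitkulshreshtha": (10, 344, 569),
    "kevinzwang": (8, 896, 3_012),
    "cckellogg": (8, 291, 133),
    "huleilei": (3, 107, 60),
    "colin-ho": (3, 54, 52),
    "rchowell": (1, 1_153, 775),
}

# Column totals across all contributors.
total_prs = sum(prs for prs, _, _ in contributors.values())
total_added = sum(added for _, added, _ in contributors.values())
total_deleted = sum(deleted for _, _, deleted in contributors.values())

# Share of PRs per contributor, rounded to one decimal place as in the table.
pr_share = {
    name: round(100 * prs / total_prs, 1)
    for name, (prs, _, _) in contributors.items()
}

print(total_prs, total_added, total_deleted)                # 122 20934 17916
print(total_added + total_deleted, total_added - total_deleted)  # 38850 3018
print(pr_share["universalmind303"])                         # 50.8
```

The per-contributor rows sum exactly to the headline totals, and 62 of 122 PRs works out to 50.8%, which is where "over half" comes from.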

Better Metrics for Observability

v0.7.4 continues the observability push that started in v0.7.3. Standardized metric naming and split duration columns mean you can now correlate per-node execution times across pipeline stages without writing custom parsing: the metrics DataFrame gives you what you need directly.

New in v0.7.4:

  • New Metrics documentation (#6253) — comprehensive docs for Daft's metrics system

  • Consolidated metric naming (#6236) — standardized metric names with `node.type` attributes for cleaner dashboards

  • Split duration metrics (#6235) — separate columns in the metrics DataFrame for easier analysis

  • Dashboard CLI improvements (#6234) — split into start/stop subcommands for cleaner workflow

Combined with v0.7.3's OTEL export support, Flotilla metrics, and dashboard daemon mode, Daft now has a complete observability story: collect metrics, export to OpenTelemetry, and visualize in a built-in dashboard.

Apache OpenDAL: One API for Every Storage Backend

Daft now supports Apache OpenDAL compatible backends (#6177). OpenDAL provides a unified data access layer for dozens of storage services (S3, GCS, Azure Blob, HDFS, and many more) through a single API.

This means Daft can now read from and write to any storage backend that OpenDAL supports, without needing a dedicated connector for each one.

Flight Shuffle for Flotilla

Daft's distributed execution engine, Flotilla, gets a major upgrade with Flight shuffle support (#6123). Arrow Flight is a high-performance data transport protocol built on gRPC and Arrow IPC. It enables efficient, zero-copy data movement between nodes during shuffle operations.

This is a foundational piece for Flotilla's performance at scale, reducing serialization overhead during distributed data exchange.

More Highlights

  • Tencent Cloud COS support (#6140) — native support for Tencent Cloud Object Storage, contributed by @XuQianJin-Stars

  • pyiceberg 0.11.0 (#6200) — updated Iceberg integration to the latest pyiceberg release

  • .as_T cast methods (#6100) — convenient type casting methods like .as_int(), .as_str() on expressions

  • SQL ORDER BY position (#6211) — ORDER BY 1, 2 now works as expected in Daft SQL

  • Time-interval sampling (#6088) — enhanced sampling with comprehensive time-interval support for audio/video workflows
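
If positional ORDER BY is new to you, the sketch below illustrates the standard SQL semantics: `ORDER BY 1, 2` sorts by the first and second columns of the SELECT list. It deliberately uses Python's built-in `sqlite3` module as a stand-in rather than Daft SQL, so it runs with no extra dependencies; the `events` table and its columns are made up for illustration.

```python
import sqlite3

# In-memory database with a small, hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (region TEXT, day INTEGER, clicks INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("eu", 2, 10), ("us", 1, 7), ("eu", 1, 3), ("us", 2, 5)],
)

# Positions refer to the SELECT list: 1 is region, 2 is day.
by_position = conn.execute(
    "SELECT region, day, clicks FROM events ORDER BY 1, 2"
).fetchall()

# The equivalent query with explicit column names.
by_name = conn.execute(
    "SELECT region, day, clicks FROM events ORDER BY region, day"
).fetchall()
conn.close()

print(by_position)
# [('eu', 1, 3), ('eu', 2, 10), ('us', 1, 7), ('us', 2, 5)]
print(by_position == by_name)
# True
```

Both queries return the same ordering, which is the behavior #6211 is described as bringing to Daft SQL.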

Community Contributions

This release wouldn't be possible without contributions from the community:

  • @huleilei: time-interval sampling enhancements

  • @XuQianJin-Stars: Tencent Cloud COS support

  • @gweaverbiodev: pyiceberg 0.11.0 support

  • @Lucas61000: SQL ORDER BY column position

  • @plotor: dashboard daemon mode

  • @gpathak128: JSON timestamp write support

  • @aaron-ang: .as_T cast methods

Upgrade today to 0.7.4

uv add "daft>=0.7.4"

Or try the latest nightly:

uv pip install daft --pre --extra-index-url https://nightly.daft.ai

Check the full changelog for the complete list of changes.
