Welcome to the Daft blog

Join us as we explore innovative ways to handle multimodal datasets, optimize performance, and simplify your data workflows.

Scaling As-of Joins
Product
May 13, 2026

How we built, broke, and re-built our ASOF joins — 6x faster, half the memory of pandas, and scaled to a distributed cluster.

daft.VideoFile: Seek Lazily, Get Frames
Engineering
May 8, 2026

daft.VideoFile decodes only the frames you need. Keyframes, time-sampled, or windowed seek, built for robotics datasets, dashcams, and moderation queues.

VLAs are dead, long live World Action Models - a summary of Jim Fan's Robotics End Game talk
Thought Leadership
May 8, 2026

Jim Fan argues robotics will follow the exact LLM playbook - and VLAs are already being replaced by World Action Models.

What is physical AI - is it more than just hype?
Thought Leadership
May 5, 2026

Physical AI has become a real trend, but is there substance behind it, or is it just hype?

Daft Extensions Featuring daft-h3: Native Rust Performance, Community Owned
Engineering
May 4, 2026

Daft now supports native extensions via Apache Arrow's C Data Interface. daft-h3 is the first community extension — 9 Rust-native H3 geospatial functions, 3–16x faster than Python UDFs.

Daft v0.7.10: 30 contributors, 41 new features, distributed asof joins
Engineering
May 2, 2026

30 contributors shipped Daft v0.7.10 — the most participation in any Daft release to date. The result: 41 new features and functions across distributed joins, duplicate detection, temporal arithmetic,

Audio transcription at scale with daft.AudioFile
Engineering
April 29, 2026

How to transcribe thousands of audio files with Whisper using daft.AudioFile — handling resampling, silence splitting, and worker-resident model loading without the boilerplate.

Image Embeddings: Tutorial & Examples
Engineering
April 27, 2026

Learn about the concept of image embeddings, their various use cases, and best practices for handling them in data processing workflows.

daft.File: Lazy Metadata Filters
Product
April 21, 2026

Filter millions of files by path, size, and content type before opening any of them. Cheap operations first, expensive operations on the survivors.
