Join us as we explore innovative ways to handle multimodal datasets, optimize performance, and simplify your data workflows.

How we built, broke, and re-built our ASOF joins — 6x faster, half the memory of pandas, and scaled to a distributed cluster.

daft.VideoFile decodes only the frames you need: keyframes, time-sampled frames, or a windowed seek. Built for robotics datasets, dashcams, and moderation queues.

Jim Fan argues that robotics will follow the exact LLM playbook, and that VLAs are already being replaced by World Action Models.

Physical AI has become a clear trend, but is there substance behind it, or is it just hype?

Daft now supports native extensions via Apache Arrow's C Data Interface. daft-h3 is the first community extension — 9 Rust-native H3 geospatial functions, 3–16x faster than Python UDFs.

30 contributors shipped Daft v0.7.10 — the most participation in any Daft release to date. The result: 41 new features and functions across distributed joins, duplicate detection, temporal arithmetic,

How to transcribe thousands of audio files with Whisper using daft.AudioFile — handling resampling, silence splitting, and worker-resident model loading without the boilerplate.

Learn about the concept of image embeddings, their various use cases, and best practices for handling them in data processing workflows.

Filter millions of files by path, size, and content type before opening any of them. Cheap operations first, expensive operations on the survivors.
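
The "cheap operations first" ordering can be sketched in plain Python. This is a minimal illustration using only the standard library, not Daft's API; the function names `cheap_filter` and `has_png_signature`, the suffix set, and the size limit are all hypothetical choices made for the example.

```python
import mimetypes
from pathlib import Path
from typing import Iterator


def cheap_filter(root: Path, suffixes: set[str], max_bytes: int) -> Iterator[Path]:
    """Yield files that pass path, content-type, and size checks without opening any.

    Hypothetical helper for illustration; checks run from cheapest to most expensive.
    """
    for path in sorted(root.rglob("*")):
        # 1. Path check: a string comparison on the already-listed name.
        if path.suffix.lower() not in suffixes:
            continue
        # 2. Content type guessed from the file name alone (no file contents read).
        guessed, _ = mimetypes.guess_type(path.name)
        if guessed is None or not guessed.startswith("image/"):
            continue
        # 3. Size check: metadata lookups only, the file is still not opened.
        if not path.is_file() or path.stat().st_size > max_bytes:
            continue
        yield path


def has_png_signature(path: Path) -> bool:
    """Expensive step: open the file and compare its first 8 bytes to the PNG signature."""
    with path.open("rb") as f:
        return f.read(8) == b"\x89PNG\r\n\x1a\n"
```

Only the survivors of `cheap_filter` are passed to `has_png_signature`, so the number of files actually opened is bounded by how many pass the metadata checks.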