Welcome to the Daft blog

Join us as we explore innovative ways to handle multimodal datasets, optimize performance, and simplify your data workflows.

Product

March 23, 2026

GPU Inference with @daft.cls

Run GPU models on millions of rows without OOM. Real patterns from ByteDance, Essential AI, and more.

Engineering

October 1, 2025

Benchmarks for Multimodal AI: Spark, Ray Data, and Daft

Multimodal AI workloads break traditional data engines. Daft ran 2-7x faster than Ray Data and 4-18x faster than Spark while finishing jobs reliably across audio, video, document, and image workloads.

Announcements

Engineering

October 1, 2025

Introducing Flotilla: Simplifying Multimodal Data Processing at Scale

Flotilla, Daft's new distributed engine, processes terabytes of multimodal data in a single query up to 18x faster than Spark and Ray Data, while running efficiently, reliably, and without manual tuning.

Engineering

September 30, 2025

Exploring Daft's Local Execution: The Swordfish Engine

Explore how Daft's Rust-powered engine executes DataFrame and SQL queries. Learn how Swordfish enables fast, streaming image processing at scale.

Engineering

September 24, 2025

After the First Run

Using Daft's observability tools to uncover performance pitfalls

Engineering

September 10, 2025

Making GPUs Zoom (Part 1)

How Daft is approaching large-scale model inference with advanced GPU optimizations for faster multimodal AI workloads

Engineering

September 3, 2025

End-to-End Distributed PDF Processing Pipeline

Build production-ready PDF processing pipelines with distributed computing, OCR, spatial analysis, and GPU embeddings

Engineering

August 26, 2025

How to Build Scalable, End-to-end Batch Inference Pipelines with Daft

Daft makes it easy to express batch inference pipelines end-to-end while seamlessly scaling them to handle massive workloads.

Case Studies

August 20, 2025

24 Trillion Tokens, 0 Crashes: How Essential AI Built Essential-Web v1.0 with Daft

Essential AI leveraged Daft's data engine to process a massive web-scale dataset for large language model (LLM) training.

Engineering

Video

August 13, 2025

Embedding Millions of Text Documents With Qwen3

Learn how to achieve near-100% GPU utilization processing millions of text documents with Qwen3 embeddings.
