Welcome to the Daft blog

Join us as we explore innovative ways to handle multimodal datasets, optimize performance, and simplify your data workflows.

Multimodal Structured Outputs: Evaluating VLM Image Understanding at Scale
Engineering
December 2, 2025

Leveraging ablation for contrastive image understanding evaluation in Daft

Processing 99% of U.S. Caselaw for Under $1 in the Common Pile
Engineering
Case Studies
December 2, 2025

How Teraflop AI processed 7 million court documents and 40 million pages spanning 365 years of U.S. caselaw for under a dollar using Daft.

Cutting LLM Batch Inference Time in Half: Dynamic Prefix Bucketing at Scale
Engineering
November 4, 2025

Learn how Dynamic Prefix Bucketing reduces LLM batch inference time, improves throughput, and unlocks faster multimodal processing at scale.

Benchmarks for Multimodal AI: Spark, Ray Data, and Daft
Engineering
October 1, 2025

Multimodal AI workloads break traditional data engines. Daft ran 2-7x faster than Ray Data and 4-18x faster than Spark while finishing jobs reliably across audio, video, document, and image workloads.

Introducing Flotilla: Simplifying Multimodal Data Processing at Scale
Announcements
Engineering
October 1, 2025

Flotilla, Daft's new distributed engine, processes terabytes of multimodal data in a single query up to 18x faster than Spark and Ray Data, while running efficiently, reliably, and without manual tuning.

Exploring Daft's Local Execution: The Swordfish Engine
Engineering
September 30, 2025

Explore how Daft's Rust-powered engine executes DataFrame and SQL queries. Learn how Swordfish enables fast, streaming image processing at scale.

After the First Run
Engineering
September 24, 2025

Using Daft's observability tools to uncover performance pitfalls

Making GPUs Zoom (Part 1)
Engineering
September 10, 2025

How Daft is approaching large-scale model inference with advanced GPU optimizations for faster multimodal AI workloads

End-to-End Distributed PDF Processing Pipeline
Engineering
September 3, 2025

Build production-ready PDF processing pipelines with distributed computing, OCR, spatial analysis, and GPU embeddings

Get updates, contribute code, or say hi.
Daft Engineering Blog
GitHub Discussions Forums
The Distributed Data Community Slack