Welcome to the Daft blog

Join us as we explore innovative ways to handle multimodal datasets, optimize performance, and simplify your data workflows.

Engineering
December 15, 2025

How We Use AI Coding Agents

Our engineering team's best practices for working with AI coding agents.

Case Studies
Engineering
December 11, 2025

How Sourcetable Built the World's First AI Spreadsheet with Daft

Sourcetable CTO Andy Grosser discusses their data infrastructure choices and why reliability and scale drove their architecture decisions.

Engineering
Case Studies
December 2, 2025

Processing 99% of U.S. Caselaw for Under $1 in the Common Pile

How Teraflop AI processed 7 million court documents and 40 million pages spanning 365 years of U.S. caselaw for under a dollar using Daft.

Engineering
December 2, 2025

Multimodal Structured Outputs: Evaluating VLM Image Understanding at Scale

Leveraging ablation for contrastive image understanding evaluation in Daft

Engineering
November 12, 2025

Agentic systems are just query engines for unstructured data

A systems engineer’s view of the new AI stack

Engineering
November 4, 2025

Cutting LLM Batch Inference Time in Half: Dynamic Prefix Bucketing at Scale

A new inference backend that maximizes batch inference throughput.

Engineering
October 1, 2025

Benchmarks for Multimodal AI Workloads

Spark, Ray Data, and Daft

Announcements
Engineering
October 1, 2025

Introducing Flotilla: Simplifying Multimodal Data Processing at Scale

Daft's new distributed engine

Engineering
September 30, 2025

Exploring Daft's Local Execution

The Swordfish Engine

Engineering
September 24, 2025

After the First Run

Using Daft’s observability tools to uncover performance pitfalls

Engineering
September 10, 2025

Making GPUs Zoom (Part 1)

A deep dive into GPU optimizations for production-scale multimodal data processing

Engineering
September 3, 2025

End-to-End Distributed PDF Processing Pipeline

OCR, Spatial Analysis & GPU Embeddings with Python

Get updates, contribute code, or say hi.
GitHub Discussions Forums
The Distributed Data Community Slack