Welcome to the Daft blog

Join us as we explore innovative ways to handle multimodal datasets, optimize performance, and simplify your data workflows.

How to Build Scalable, End-to-end Batch Inference Pipelines with Daft
Engineering
August 26, 2025

Daft makes it easy to express batch inference pipelines end-to-end and to scale them seamlessly to massive workloads.

Embedding Millions of Text Documents With Qwen3
Engineering
Video
August 13, 2025

Learn how to achieve near-100% GPU utilization processing millions of text documents with Qwen3 embeddings.

Processing 300K Images Without OOM
Engineering
August 6, 2025

A Streaming Solution

We cloned over 15,000 repos to find the best developers
Engineering
April 22, 2025

An adventure in AI and data engineering to analyze developers across GitHub.

High-Performance File System Support With DeepSeek 3FS
Engineering
March 18, 2025

Learn how Daft integrates with DeepSeek's 3FS, the file system used by DeepSeek's smallpond, to deliver faster file access and efficient data handling for modern workloads.

Introducing Daft-SQL for High-Performance Data Exploration
Engineering
October 23, 2024

A SQL API enabling users to interact with their data in a new but familiar way. Learn how Daft-SQL brings fast, scalable querying to multimodal workloads, helping teams explore large datasets efficiently with a distributed engine.

Reading Delta Lake with Daft
Engineering
April 10, 2024

Discover how Daft reads Delta Lake tables efficiently, giving teams fast access to large datasets and seamless integration into data workflows.

Adversarial file reading: from 10,000 small CSVs to massive Parquet files
Engineering
March 6, 2024

Learn how Daft keeps data ingestion fast at scale across adversarial file layouts, from thousands of small CSVs to massive Parquet files.

Working with the Apache Parquet file format
Engineering
July 12, 2023

This guide shows how Apache Parquet boosts read performance, lowers storage use, and supports efficient workflows for large analytical datasets.

Get updates, contribute code, or say hi.
Join us on the GitHub Discussions forums or in The Distributed Data Community Slack.