Welcome to the Daft blog

Join us as we explore innovative ways to handle multimodal datasets, optimize performance, and revolutionize your data workflows.

Announcements
June 24, 2025

Eventual Raises $30M to Build the Future of Data

Engineering
Video
August 13, 2025

Embedding Millions of Text Documents With Qwen3

Near-100% GPU Utilization

Engineering
August 6, 2025

Processing 300K Images Without OOM

A Streaming Solution

Announcements
July 28, 2025

Multimodal Data Processing Goes Global

Launching Daft Chinese Community in partnership with ByteDance

Engineering
April 22, 2025

We cloned over 15,000 repos to find the best developers

An adventure in AI and data engineering to analyze developers across GitHub

Engineering
March 18, 2025

DeepSeek smallpond, 3FS and data processing for AI

A closer look beyond the AI hype

Announcements
November 4, 2024

From v0.2 to v0.3: Harder, Better, Faster, Stronger

Join us on the journey from Daft v0.2 to v0.3!

Engineering
October 23, 2024

Introducing Daft-SQL

A SQL API enabling users to interact with their data in a new but familiar way!

Engineering
April 10, 2024

Reading Delta Lake with Daft

Announcing the launch of Daft's Delta Lake read support

Engineering
March 6, 2024

Adversarial file reading: from 10,000 small CSVs to massive Parquet files

How Daft optimizes the reading of real-world data which is often a mix of "many small files" and "few large files"

Announcements
December 13, 2023

Announcing Daft 0.2: 10x faster IO from S3

Reading data from S3 just got 10x faster!

Engineering
July 12, 2023

Working with the Apache Parquet file format

Quick notes written from 200 meters down the Parquet rabbit hole

Announcements
June 6, 2023

Introducing Daft: A High-Performance Distributed Dataframe Library for Multimodal Data
