
Daft 2025 Year in Review - Minor Releases, Major Evolution
by the Daft Team

As we close out 2025, we're excited to share what the Daft team has been building. This year marked a major evolution for Daft: from our package rename to the launch of our next-generation distributed engine, we shipped 56 releases and introduced features that fundamentally changed how teams process multimodal data at scale. As we start another rotation around the sun, we'd like to reflect on the progress made in 2025.
2025 By The Numbers
Community Growth
- 🌟 5,091 GitHub stars on Eventual-Inc/Daft
- 🍴 377 forks
- 👥 132 contributors
- 💬 386 active discussions in issues
Development Velocity
- 🚀 56 releases (averaging 4.7 per month!)
- 📝 1,637 pull requests merged
- 💻 1,280 commits
- 📦 18.1 million bytes of code (54.8% Rust, 26.9% Python)
We shipped releases at an incredible pace, with our busiest stretch coming between May and August when we pushed out 37 releases in just four months.
Three Major Milestones
1. The Great Rename: getdaft → daft (May 31)
With the v0.5.0 release, we simplified our package name from getdaft to just daft. No more typing pip install getdaft—now it's simply:
pip install daft
This change reflected our maturity as a project and made it easier for new users to discover and install Daft. The old package remains as a redirect for backward compatibility.
2. Flotilla: Our Next-Gen Distributed Engine (September 4)
The Daft v0.6.0 release introduced Flotilla, our new Ray-based distributed engine. This was the culmination of months of work across Q1 and Q2, and it delivered immediate benefits:
- Automatic upgrades: Existing users with DAFT_RUNNER=ray got Flotilla without changing any code
- Worker affinity with pre-shuffle merge: Smarter data locality for faster joins
- UDF filter separation: Improved query optimization for user-defined functions
- Better performance: Count pushdown for Parquet and smarter operator scheduling
Flotilla represented a fundamental reimagining of how Daft executes distributed workloads, setting the foundation for everything that came after.
3. v0.7.0: Dynamic Batching and Schema Evolution (December 16)
Our 0.7.0 release brought performance and flexibility improvements:
- Dynamic batching per operator: Adaptive batch sizes optimize memory and throughput
- Expanded providers for AI functions, spanning OpenAI, Google, Transformers, and an experimental implementation of native vLLM prefix caching for prompt()
- Lazy UDF workers: Lower overhead for sporadic UDF execution
This release also included a breaking change, removing Spark Connect support, as we doubled down on our vision for Daft's future.
Core Theme of 2025: AI & Multimodal
2025 was the year Daft became the go-to engine for multimodal AI workloads. We shipped a comprehensive suite of AI functions that make it trivial to embed, classify, and generate from your data:
Embedding Functions
- embed_text(): Text embeddings with support for sentence transformers, OpenAI, and more
- embed_image(): Image embeddings for visual similarity search
- Automatic dimension detection: No need to manually specify embedding dimensions
- LM Studio integration: Run embeddings on your local models
Check out the docs to see how easy it is to add embeddings to your pipelines.
Vision & Classification
- classify_image(): Direct image classification in your queries
- Image support in prompt(): Pass images directly to multimodal models
- Multiple image inputs: Process batches of images in a single prompt
LLM Integrations
- Google AI provider: Use Gemini and other Google models
- Chat completions API: Full chat interface for conversational AI
- Improved model API typing: Better IDE support and error messages
- Retry-after mechanism: Automatic rate limit handling
New daft.File Types
We introduced daft.File with specialized subclasses:
- daft.AudioFile: Process audio with subtype support
- daft.VideoFile: Handle video data at scale
- .to_tempfile() method: Seamlessly localize remote files for use cases like PDF processing
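To make the localization step concrete, here is a hypothetical sketch of what a `.to_tempfile()`-style helper does under the hood, using only the standard library: fetch the file's bytes and write them to a local temporary file, so tools that require a real filesystem path (many PDF and media parsers do) can read them. The `to_tempfile` function and its `read_bytes` parameter are illustrative names, not Daft's API.

```python
import tempfile

def to_tempfile(read_bytes, suffix=""):
    """Localize remote content to a named temp file and return its path.

    `read_bytes` is assumed to be a callable returning the file's bytes
    (e.g. a closure over an object-store download).
    """
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    tmp.write(read_bytes())
    tmp.flush()
    tmp.close()
    return tmp.name  # caller is responsible for deleting the temp file
```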
Faster UDFs: Write Less, Do More
We completely revamped how you write user-defined functions in Daft:
- @daft.func: Simple decorator for stateless functions
- @daft.func.batch: Explicit batching control for vectorized operations
- @daft.cls: Stateful UDFs that initialize once and run many times
- Generator support: Stream results for memory-efficient processing
- Retryable UDFs: Automatic retry on failure
These improvements made UDFs faster, more flexible, and easier to debug. Combined with our new structured logging and OpenTelemetry integration, you can now profile and optimize UDFs like never before.
What's Next?
This week the team came back to the office after the holiday break, and we're already planning what's next. While we can't share everything yet, here's what you can expect:
- Continued performance improvements to Flotilla
- More AI integrations and multimodal capabilities
- Better developer tools and debugging capabilities
We're incredibly grateful to our community, from the 132 contributors who submitted PRs to everyone who filed issues, asked questions, and shared feedback. You make Daft better every day. Thank you.
Get Started Now
Ready to try Daft? Here's how to get started:
pip install daft
- Documentation: https://docs.daft.ai
- Open Positions: https://www.eventual.ai/careers
From the entire Daft team, Happy New Year! 🎉
We can't wait to see what you build in 2026.