Join us as we explore innovative ways to handle multimodal datasets, optimize performance, and simplify your data workflows.

Learn multimodal embedding techniques for cross-modal search, recommendation systems, and content moderation applications.

Migrating ETL workloads from Spark means hitting gaps in date arithmetic — functions like `date_add`, `date_diff`, and epoch conversions that Spark users take for granted. Daft v0.7.9 closes that gap

Daft's query dashboard now shows you exactly where time is going. Slow operators light up red, completed nodes turn green, and arrows trace the data flow through your pipeline. No more guessing which

Datamule, Teraflop AI, and Eventual collaborated to release the SEC-EDGAR dataset containing 590 GB of data, spanning 8 million samples and 43 billion tokens from all major filings in the SEC EDGAR database.

Daft v0.7.7 fixes a parquet streaming regression that made aggregations 2-4x slower, adds `df.shuffle()` for ML data prep, and makes `coalesce` short-circuit per the SQL spec.

Daft natively reads and writes every major open lake format — Iceberg, Delta Lake, Hudi, and now Apache Paimon. Plus O(1) scalar columns, fingerprint-based plan caching in Swordfish, and production observability.

Row-wise, generator, async, and stateful UDFs — one notebook, one dataset, runnable side by side.

Run GPU models on millions of rows without OOM. Real patterns from ByteDance, Essential AI, and more.

Turn any Python class into a distributed operator. Hold models, connections, and clients across rows with one decorator.