Join us as we explore innovative ways to handle multimodal datasets, optimize performance, and simplify your data workflows.

Jim Fan argues that robotics will follow the exact LLM playbook, and that VLAs are already being replaced by World Action Models.

Daft's query dashboard now shows you exactly where time is going. Slow operators light up red, completed nodes turn green, and arrows trace the data flow through your pipeline. No more guessing which

Datamule, Teraflop AI, and Eventual collaborated to release the SEC-EDGAR dataset containing 590 GB of data, spanning 8 million samples and 43 billion tokens from all major filings in the SEC EDGAR database.

Daft v0.7.7 fixes a Parquet streaming regression that made aggregations 2-4x slower, adds df.shuffle() for ML data prep, and makes coalesce short-circuit per the SQL spec.
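
To make "short-circuit" concrete, here is a minimal plain-Python sketch of the idea: arguments are evaluated left to right, and evaluation stops at the first non-null value. It does not use Daft and is not Daft's implementation; `coalesce_lazy` and `arg` are hypothetical names invented for this illustration.

```python
from typing import Any, Callable, List, Optional

def coalesce_lazy(*thunks: Callable[[], Optional[Any]]) -> Optional[Any]:
    """Return the first non-None result, evaluating arguments left to right."""
    for thunk in thunks:
        value = thunk()
        if value is not None:
            return value  # short-circuit: remaining thunks are never called
    return None

# Record which arguments actually get evaluated.
calls: List[str] = []

def arg(name: str, value: Optional[Any]) -> Callable[[], Optional[Any]]:
    """Hypothetical helper: wrap a value in a thunk that logs its evaluation."""
    def thunk() -> Optional[Any]:
        calls.append(name)
        return value
    return thunk

result = coalesce_lazy(arg("a", None), arg("b", 7), arg("c", 9))
print(result, calls)
```

The third argument is never evaluated here, which matters when a later argument is expensive or has side effects.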

Daft natively reads and writes every major open lake format — Iceberg, Delta Lake, Hudi, and now Apache Paimon. Plus O(1) scalar columns, fingerprint-based plan caching in Swordfish, and production observability.

Row-wise, generator, async, and stateful UDFs — one notebook, one dataset, runnable side by side.

Run GPU models on millions of rows without OOM. Real patterns from ByteDance, Essential AI, and more.

Turn any Python class into a distributed operator. Hold models, connections, and clients across rows with one decorator.
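
The pattern behind this can be sketched in plain Python, assuming nothing about Daft's actual decorator: expensive setup runs once in `__init__`, and the resulting state is reused for every row. `Embedder` and `apply_stateful` are hypothetical names for this illustration, and the "model" is a stand-in computation.

```python
from typing import Any, Iterable, List

class Embedder:
    """Stand-in for an operator that holds an expensive resource (e.g. a model)."""

    init_count = 0  # tracks how many times setup ran, for demonstration only

    def __init__(self, dim: int = 4) -> None:
        # Imagine loading model weights or opening a connection here.
        Embedder.init_count += 1
        self.dim = dim

    def __call__(self, text: str) -> List[int]:
        # Placeholder computation standing in for model inference.
        return [len(text) + i for i in range(self.dim)]

def apply_stateful(cls: type, rows: Iterable[Any], **init_kwargs: Any) -> List[Any]:
    """Hypothetical driver: construct the operator once, then call it per row."""
    op = cls(**init_kwargs)
    return [op(row) for row in rows]

vectors = apply_stateful(Embedder, ["a", "bb", "ccc"], dim=2)
print(vectors, Embedder.init_count)
```

A distributed engine would typically create one such instance per worker rather than one overall, but the per-row cost of setup is avoided in the same way.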

Native Extensions via Stable C ABI, Live Query Dashboard, and 2-5x Faster Parquet Reads on Nested Types

Row-wise, async, generator, and batch UDFs in Daft — one decorator, zero boilerplate, local or distributed.