Join us as we explore innovative ways to handle multimodal datasets, optimize performance, and simplify your data workflows.

Datamule, Teraflop AI, and Eventual collaborated to release the SEC-EDGAR dataset, which contains 590 GB of data spanning 8 million samples and 43 billion tokens drawn from all major filings in the SEC EDGAR database.

Sourcetable CTO Andy Grosser discusses the company's data infrastructure choices and why reliability and scale drove its architecture decisions.

How Teraflop AI processed 7 million court documents and 40 million pages spanning 365 years of U.S. caselaw for under a dollar using Daft.

Essential AI used Daft's data engine to process a web-scale dataset for large language model (LLM) training.