Ingest and index unstructured/multimodal data for AI search and retrieval. From any source to any destination. Backed by a managed inference platform built to scale with your data volume.
[Diagram: Daft Cloud — a Data Source (S3/HTTP/Queue) streams into Managed Compute Workers, which call Managed Inference and write to a Data Sink (VectorDB/Postgres/S3/Parquet/Iceberg).]
Data plumbing and ETL work
Connect to your data storage for managed ingestion, scaling, and reliability. Stop wiring queues and GPUs together and focus instead on your models, prompts, and AI products.
Carefully tuning tokens/s
Optimized for high-throughput batch inference, not low-latency chat. Run embedding, vision, and LLM models on a managed inference service that autoscales with your data volume.
Right-sizing GPU clusters
Scale effortlessly with zero engineering ops. Workers and models autoscale with data volume, retries, and I/O backpressure. Deploy and experiment faster without worrying about infra.
Ad hoc notebooks and scripts
Ship fresh and clean data that your AI agents can rely on. Test new pipeline versions on real data, then deploy and backfill knowing exactly which version produced which outputs.
[Diagram: Daft Cloud — a Data Source (HTTP Endpoint) flows into managed workers handling file I/O, chunking, image processing, and resizing; the workers call Managed Inference (embeddings/summaries) and write to Data Sinks (vectors to TurboPuffer/pgvector, summaries and thumbnails to S3).]
Start by defining your data pipeline as a simple Python function using the Daft API.
Functions can take data sources and sinks as inputs! This makes them easily unit-testable in your repo, just like any normal Python function.
You can read files, preprocess data, perform inference using built-in models, and write the results to a sink — all in just a few lines.
def my_pipeline(src: daft.DataFrame, sink: daft.Catalog):
    # Apply preprocessing: file I/O, image decoding, chunking, PDF parsing,
    # or arbitrary Python functionality...
    df = src.with_column("text", parse_pdf(src["file"]))

    # Call models via the Daft Inference Platform
    df = df.with_column("embeddings", F.embed_text(df["text"]))
    df = df.with_column("label", F.classify_text(df["text"], ["math", "code", "prose"]))

    # Yield outputs and perform streaming writes to your sink
    table = sink.get_table("my_table")
    table.write(df)

Now, just like any other code you write, simply push your pipeline to GitHub. Daft Cloud has permission to read your Git repo, and uses Git to access your code and version your pipelines.
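Because the pipeline is just a Python function, you can also exercise it locally before pushing. Below is a minimal test sketch; the fake sink, file paths, and assertions are illustrative assumptions rather than Daft Cloud APIs (daft.from_pydict is Daft OSS), and in practice you would also stub the model calls so the test stays hermetic.

import daft

# Hypothetical in-memory stand-ins for a real sink, used only for local testing.
class FakeTable:
    def __init__(self):
        self.writes = []

    def write(self, df):
        # Record the DataFrame instead of writing to a real table.
        self.writes.append(df)

class FakeSink:
    def __init__(self):
        self.table = FakeTable()

    def get_table(self, name):
        return self.table

def test_my_pipeline():
    # Tiny in-memory source DataFrame standing in for a real data source.
    src = daft.from_pydict({"file": ["docs/a.pdf", "docs/b.pdf"]})
    sink = FakeSink()

    my_pipeline(src, sink)

    # The pipeline should have written exactly one batch to the sink.
    assert len(sink.table.writes) == 1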
1. Select the appropriate data sources and data sinks. Daft Cloud takes care of authentication and validates them before passing them into your Python function.
2. Choose how you want to run your pipeline. It can run in several different execution modes:
We are constantly adding models to Daft Cloud as we onboard partners and explore new use cases. The Daft Cloud team recommends models based on industry usage as well as throughput-to-performance ratios. We aim to share more specific numbers on price, throughput, and performance with our users as we complete more benchmarking and testing.
Note that for many data tasks (e.g. labelling, enrichment, grammar correction, summarization, etc.), it is often unnecessary to use the latest, most expensive reasoning models. It can be far more economical to choose a smaller, more efficient model, which also delivers higher throughput for your data pipelines.
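As a rough sketch of what that choice looks like in a pipeline, the snippet below swaps in smaller models for labelling and embedding; the model= keyword and the model identifier strings are illustrative assumptions, not documented Daft Cloud parameters.

# Illustrative only: the model= argument and identifiers below are assumptions.
# The point: labelling and embedding rarely need a frontier reasoning model.
df = df.with_column(
    "label",
    F.classify_text(df["text"], ["math", "code", "prose"], model="gpt-5-nano"),
)
df = df.with_column(
    "embeddings",
    F.embed_text(df["text"], model="qwen3-embedding-8b"),
)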
| Provider | Model | Description |
|---|---|---|
| OpenAI | GPT-5 mini | OpenAI's faster, cost-efficient version of GPT-5 for well-defined tasks. |
| Daft | Qwen3 Embedding 8B | High-performance 8B parameter embedding model optimized for large-scale semantic search and retrieval tasks. |
Language models:

| Provider | Model | Description |
|---|---|---|
| OpenAI | GPT-5 | OpenAI's best model for coding and agentic tasks across domains. |
| OpenAI | GPT-5 mini | OpenAI's faster, cost-efficient version of GPT-5 for well-defined tasks. |
| OpenAI | GPT-5 nano | OpenAI's fastest, most cost-efficient version of GPT-5. |
| OpenAI | GPT-4.1 | OpenAI's enhanced GPT-4 with improved instruction following and reasoning capabilities. |
| OpenAI | GPT-4.1 mini | OpenAI's compact GPT-4.1 optimized for speed and cost-efficiency. |
| OpenAI | GPT-4.1 nano | OpenAI's lightweight GPT-4.1 for high-volume, cost-sensitive tasks. |
| Anthropic | Claude Sonnet 4.5 | Anthropic's smartest model for complex agents and coding. |
| Anthropic | Claude Haiku 4.5 | Anthropic's fastest model with near-frontier intelligence. |
| Anthropic | Claude Opus 4.1 | Exceptional model for specialized reasoning tasks. |
Embedding models:

| Provider | Model | Description |
|---|---|---|
| Daft | Qwen3 Embedding 8B | High-performance 8B parameter embedding model optimized for large-scale semantic search and retrieval tasks. |
| OpenAI | text-embedding-3-large | Most capable OpenAI embedding model for semantic search, clustering, and recommendations. |
You'll likely benefit from Daft Cloud if you're:
We started with a Rust engine (Daft OSS) so that "run a model over 1M documents" doesn't require Spark or a data infra team.
Teams told us they don't want to:
Daft Cloud is that missing layer:
Daft Cloud sits in the middle and moves AI-derived vectors and structured outputs between these systems.