Serverless AI Ingestion.

Ingest and index unstructured/multimodal data for AI search and retrieval. From any source to any destination. Backed by a managed inference platform built to scale with your data volume.

                                                                                     
                        ╔════════ daft cloud ═════════╗                           
                        ║                              ║                           
                        ║     ┌───────────────────┐    ║                           
                        ║     │ Managed Inference │    ║                           
                        ║     └───────▲─┬─────────┘    ║  
                        ║             │ │              ║   ┏━ ≡ Data Sink  ━━━━━━━━┓
  ┏━ ≡ Data Source ━┓   ║  ┌──────────┴─▼────────────┐ ║   ┃  VectorDB/Postgres    ┃
  ┃  S3/HTTP/Queue  ┃─────►│ Managed Compute Workers ├────►┃  /S3/Parquet/Iceberg  ┃
  ┗━━━━━━━━━━━━━━━━━┛   ║  └─────────────────────────┘ ║   ┗━━━━━━━━━━━━━━━━━━━━━━━┛
                        ║                              ║                           
                        ╚══════════════════════════════╝                           
  • Build AI products, not data plumbing and ETL work

    Connect to your data storage for managed ingestion, scaling, and reliability. Stop wiring queues and GPUs together and focus instead on your models, prompts, and AI products.

  • Managed Inference, not carefully tuning tokens/s

    Optimized for high-throughput batch inference, not low-latency chat. Run embedding, vision, and LLM models on a managed inference service that autoscales with your data volume.

  • Serverless, not right-sizing GPU clusters

    Scale effortlessly with zero engineering ops. Workers and models autoscale with data volume, retries, and I/O backpressure. Deploy and experiment faster without worrying about infra.

  • Versioned with Git, not ad-hoc notebooks and scripts

    Ship fresh, clean data that your AI agents can rely on. Test new pipeline versions on real data, then deploy and backfill knowing exactly which version produced which outputs.


How it works

1. Define your pipeline in Python

Start by defining your data pipeline as a simple Python function using the Daft API.

Functions can take data sources and sinks as inputs! This makes them easily unit-testable in your repo, just like any normal Python function.

You can read files, preprocess data, perform inference using built-in models, and write the results to a sink — all in just a few lines.

import daft
from daft import functions as F  # built-in inference functions used below (assumed import path)


def my_pipeline(src: daft.DataFrame, sink: daft.Catalog):
    # Apply preprocessing: file I/O, image decoding, chunking, PDF parsing,
    # or arbitrary Python functionality (parse_pdf here is a user-defined helper)...
    df = src.with_column("text", parse_pdf(src["file"]))
    
    # Call models via the Daft Inference Platform
    df = df.with_column("embeddings", F.embed_text(df["text"]))
    df = df.with_column("label", F.classify_text(df["text"], ["math", "code", "prose"]))

    # Yield outputs and perform streaming writes to your sink
    table = sink.get_table("my_table")
    table.write(df)
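
Because the pipeline is a plain Python function, it can be exercised locally before it ever runs in Daft Cloud. The sketch below is illustrative only: it swaps in an in-memory DataFrame for the source and a hand-rolled stub for the sink. StubTable, StubSink, and test_my_pipeline are made-up names, and the inference calls inside my_pipeline may need to be stubbed or monkeypatched to run fully offline.

import daft

# Stub sink exposing the same get_table()/write() surface the pipeline expects.
# (Illustrative only; Daft Cloud injects real source and sink objects at run time.)
class StubTable:
    def __init__(self):
        self.rows = []

    def write(self, df: daft.DataFrame):
        self.rows.extend(df.to_pylist())

class StubSink:
    def __init__(self):
        self.table = StubTable()

    def get_table(self, name: str):
        return self.table

def test_my_pipeline():
    src = daft.from_pydict({"file": ["a.pdf", "b.pdf"]})  # tiny in-memory source
    sink = StubSink()
    my_pipeline(src, sink)
    assert len(sink.table.rows) == 2
    assert "embeddings" in sink.table.rows[0]
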
2. Run in Daft Cloud

Now, just as you would with any other code, push it to GitHub. Daft Cloud has permission to read your Git repo and uses Git to access your code and version your pipelines.

1. Select the appropriate data sources and data sinks. Daft Cloud takes care of authentication and ensures they are valid before passing them into your Python function.

2. Choose how you want to run your pipeline. Several execution modes are available:

  • Adhoc: Run manually, useful for one-time or experimental workloads
  • Scheduled: Cron-like, runs every hour/day
  • Incremental: Process only new or changed data (e.g. a new object created in AWS S3; see the conceptual sketch after this list)
  • Triggered: Runs the pipeline on batches of HTTP calls, based on a predetermined batch size and runtime SLO
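
To make the Incremental mode concrete, here is what incremental processing over S3 means conceptually: each run picks up only objects created after the previous run's checkpoint, then advances the checkpoint. This is a hand-rolled illustration using boto3 with made-up bucket and prefix names, not Daft Cloud code; Daft Cloud tracks these offsets for you.

import boto3
from datetime import datetime, timezone

def list_new_objects(bucket: str, prefix: str, last_checkpoint: datetime) -> list[str]:
    """Return keys of objects created after the previous run's checkpoint."""
    s3 = boto3.client("s3")
    new_keys = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["LastModified"] > last_checkpoint:
                new_keys.append(obj["Key"])
    return new_keys

# Each incremental run processes only what changed since the last checkpoint,
# then advances the checkpoint; Daft Cloud manages this state for you.
checkpoint = datetime(2025, 1, 1, tzinfo=timezone.utc)
keys = list_new_objects("my-bucket", "documents/", checkpoint)
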
3. Observe, debug, iterate

  • Per-run and per-record metrics (throughput, error rates, cost indicators). Daft exposes useful metrics to help you understand your pipelines: how fast throughput is, how much it is costing you, and what the potential bottlenecks are to scaling up even further.
  • Error captures with payload snippets and stack traces. Daft helps you debug your code without failing the entire pipeline. Bugs in your code are surfaced as stack traces with lineage to help you track down the bad code or data (sketched after this list).
  • Safe replays, backfills, and migrations to new model versions. Daft versions your pipelines and makes it easy to run replays, backfills, and migrations. This helps you experiment with new models and techniques without disturbing your old data.
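
The error-capture behavior can be pictured as a per-record try/except that routes failures to a side output instead of raising, so one bad record never fails the whole run. The sketch below is a generic Python illustration of that idea (safe_apply is a made-up helper), not Daft Cloud's internal implementation.

import traceback

def safe_apply(records, fn):
    """Apply fn to each record; collect failures with a payload snippet and stack trace."""
    results, errors = [], []
    for record in records:
        try:
            results.append(fn(record))
        except Exception:
            errors.append({
                "payload_snippet": str(record)[:200],   # truncated for readability
                "stack_trace": traceback.format_exc(),
            })
    return results, errors

# Good records flow on to the sink; errors are surfaced for debugging,
# with enough lineage (the payload snippet) to track down bad code or data.
ok, failed = safe_apply(["valid text", None], lambda t: t.upper())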

How Daft Cloud is built

  1.

    Ingestion & Orchestration Layer

    • Connects to data sources: S3, GCS, Supabase Storage, SQL databases, HTTP endpoints
    • Executes Python-defined pipelines for chunking, preprocessing, and transformation
    • Handles scheduling, autoscaling, backpressure, retries, and observability
    • Control plane: Tracks pipeline definitions, offsets, and run status, planning work units and assigning them to workers
    • Autoscaling workers: Pull work from a distributed queue and call into the Daft Inference Platform
    • Automatic backpressure: Applied based on GPU/CPU capacity, external LLM rate limits, and downstream sink throughput. You never touch a GPU dashboard or rate-limit config
  2.

    Daft Inference Platform (Managed)

    • Batch-first, not chatbot-first: Optimized for high-throughput jobs over large datasets with emphasis on cost and throughput, not single-digit millisecond latency
    • Managed hardware: Daft provisions and manages GPU pools with autoscaling based on queue depth and job requirements
    • Provider-aware: Integrations with LLM providers (e.g. OpenAI) with centralized global rate-limiting, request batching, and strategies to minimize 429s and maximize throughput
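
As a rough mental model of the provider-aware layer, the sketch below wraps batched requests in a token-bucket limiter, one standard way to keep the aggregate request rate under a provider's limit and avoid 429s. It is a generic illustration with assumed numbers (a 5 requests/sec budget), not the actual Daft Inference Platform implementation.

import time

class TokenBucket:
    """Simple global rate limiter: refills `rate` tokens per second up to `capacity`."""
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, time.monotonic()

    def acquire(self, n: float = 1.0):
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= n:
                self.tokens -= n
                return
            time.sleep((n - self.tokens) / self.rate)

def run_batches(items, batch_size, send_batch, limiter):
    # One token per request keeps throughput under the provider's limit,
    # while batching amortizes per-request overhead.
    for i in range(0, len(items), batch_size):
        limiter.acquire()
        send_batch(items[i:i + batch_size])

limiter = TokenBucket(rate=5.0, capacity=5.0)   # assumed: ~5 requests/sec budget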

Models

We are constantly adding models to Daft Cloud as we onboard partners and explore new use cases. The Daft Cloud team recommends models based on industry usage as well as throughput-to-performance ratios. We aim to share more specific numbers on price, throughput, and performance as we do more benchmarking and testing.

Note that for many data tasks (e.g. labelling, enrichment, grammar correction, summarization) it is often not necessary to use the latest and most expensive reasoning models. It can be far more economical to choose a smaller, more efficient model, which will also provide higher throughput for your data pipelines.

Language models:

  Provider    Model                    Description
  OpenAI      GPT-5                    OpenAI's best model for coding and agentic tasks across domains.
  OpenAI      GPT-5 mini               OpenAI's faster, cost-efficient version of GPT-5 for well-defined tasks.
  OpenAI      GPT-5 nano               OpenAI's fastest, most cost-efficient version of GPT-5.
  OpenAI      GPT-4.1                  OpenAI's enhanced GPT-4 with improved instruction following and reasoning capabilities.
  OpenAI      GPT-4.1 mini             OpenAI's compact GPT-4.1 optimized for speed and cost-efficiency.
  OpenAI      GPT-4.1 nano             OpenAI's lightweight GPT-4.1 for high-volume, cost-sensitive tasks.
  Anthropic   Claude Sonnet 4.5        Anthropic's smartest model for complex agents and coding.
  Anthropic   Claude Haiku 4.5         Anthropic's fastest model with near-frontier intelligence.
  Anthropic   Claude Opus 4.1          Exceptional model for specialized reasoning tasks.

Embedding models:

  Provider    Model                    Description
  Daft        Qwen3 Embedding 8B       High-performance 8B parameter embedding model optimized for large-scale semantic search and retrieval tasks.
  OpenAI      text-embedding-3-large   Most capable OpenAI embedding model for semantic search, clustering, and recommendations.

FAQ

You'll likely benefit from Daft Cloud if you're:

  • Shipping AI search / RAG / agentic retrieval on top of existing data systems
  • Building labelling / enrichment / moderation pipelines over large, evolving datasets
  • Already using S3/GCS, SQL, TurboPuffer, pgvector, BigTable, etc.
  • Looking to avoid building your own ingestion + inference platform team

Why it exists

We started with a Rust engine (Daft OSS) to make "run a model over 1M documents" not require Spark or a data infra team.

Teams told us they don't want to:

  • Stand up and maintain GPU clusters
  • Stitch together vLLM, queues, and schedulers
  • Manually juggle LLM provider rate limits
  • Handcraft ingestion glue for every new source and sink

Daft Cloud is that missing layer:

  • Data in: from S3/GCS/SQL/HTTP
  • Inference + transform: managed, batch-optimized
  • Data out: into TurboPuffer, pgvector, S3, BigTable, and whatever else you already use

Pricing

Pricing is based on model usage and data processing volume. Contact us for early access pricing details.

Sources and sinks

Sources (read from):
  • Object storage: S3, GCS, Supabase Storage, Cloudflare R2
  • Databases: Postgres, MySQL, other SQL via SELECT-based readers
  • HTTP / feeds: HTTP endpoints, JSON APIs, simple event feeds
Sinks (write to):
  • Vector stores: TurboPuffer, pgvector (Postgres), other VectorDBs
  • Object storage: S3 / GCS / Supabase Storage / generic S3-compatible
  • Databases: Postgres, MySQL, BigTable, other analytic/feature stores
  • Webhooks / HTTP: any JSON-shaped endpoint

Daft Cloud sits in the middle and moves AI-derived vectors and structured outputs between these systems.
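
To make that concrete, here is roughly what an embedding write into a pgvector sink looks like at the destination, using psycopg directly. The table name, column names, and connection string are made up, and this is not how Daft Cloud performs its writes internally; it only illustrates the shape of the output.

import psycopg  # psycopg 3; assumes Postgres with the pgvector extension available

# Hypothetical output rows from an embedding pipeline: (id, text, 3-dim embedding)
rows = [("doc-1", "chunk of parsed text", [0.1, 0.2, 0.3])]

with psycopg.connect("postgresql://localhost/mydb") as conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")  # may require elevated privileges
    cur.execute("""
        CREATE TABLE IF NOT EXISTS documents (
            id TEXT PRIMARY KEY,
            text TEXT,
            embedding vector(3)   -- dimension must match the embedding model
        )
    """)
    cur.executemany(
        "INSERT INTO documents (id, text, embedding) VALUES (%s, %s, %s::vector)",
        [(doc_id, text, str(vec)) for doc_id, text, vec in rows],
    )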

Providing fine-tuned model weights for open-source models is an upcoming feature on our roadmap. Running completely custom model architectures is also supported, but the Daft Cloud team cannot optimize the performance of these models for you as effectively.