November 14, 2025

Prompting with DataFrames: Massively Parallel LLM Generation is Here

Structure and scale LLM inference with prompt, a new function for Daft DataFrames.

by Everett Kleven

TLDR

We're introducing a new way to work with language models at scale. Daft's prompt function brings massively parallel text generation to DataFrames, making it trivial to run LLM operations over thousands or millions of rows with automatic batching and parallelization. Whether you're doing synthetic data generation, knowledge extraction, or batch tool-calling, you can now scale these workloads efficiently without wrestling with API rate limits or building complex batching logic.

`prompt`


import daft
from daft.functions import prompt

# read some sample data
df = daft.from_pydict({
    "input": [
        "What sound does a cat make?",
        "What sound does a dog make?",
        "What sound does a cow make?",
        "What does the fox say?",
    ]
})

# generate responses using a chat model
df = df.with_column("response", prompt(daft.col("input"), model="gpt-5.1"))

df.show()

With Daft’s prompt function, prompting becomes a first‑class DataFrame operation. Instead of building bespoke pipelines, you compose expressions: pass text, files, or images as columns; emit structured outputs as nested types; and let Daft take care of automatic batching, parallelization, retries, caching, and provider abstraction. Whether you’re generating synthetic data, extracting knowledge from PDFs, calling tools at scale, or steering reasoning models, you can do it across thousands—or millions—of rows using the same mental model you already use for analytics.

The Problem: Prompt Engineering Doesn't Scale

If you've worked with LLMs in production, you've likely faced these challenges:

• Manual batching logic - Writing custom code to batch requests, handle rate limits, and retry failed calls
• Inefficient resource utilization - GPUs sitting idle between batches or during data preprocessing
• Context management complexity - Passing images, PDFs, and structured data to models without a unified interface
• Data accessibility - The context you want to curate doesn't live only on your laptop; it also lives in cloud storage buckets, databases, and repositories
• Poor cache utilization - Missing opportunities to reuse computation when prompts share common prefixes

These inconveniences translate directly to higher costs and slower iteration cycles. A data scientist shouldn't need to become a distributed systems engineer just to run a model over a dataset.
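
To make the first bullet concrete, here is a minimal sketch of the kind of hand-rolled batching and retry code teams often end up writing. It uses only the standard library; `call_model` is a hypothetical stand-in for a real LLM API call, and the batch size, worker count, and backoff values are arbitrary assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_model(prompt_text: str) -> str:
    """Hypothetical stand-in for a real LLM API call."""
    return f"echo: {prompt_text}"


def call_with_retry(prompt_text: str, max_attempts: int = 3, base_delay: float = 0.01) -> str:
    """Call the model, retrying with exponential backoff on any exception."""
    for attempt in range(max_attempts):
        try:
            return call_model(prompt_text)
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")


def run_batched(prompts: list[str], batch_size: int = 2, max_workers: int = 4) -> list[str]:
    """Send prompts batch by batch, running each batch concurrently."""
    results: list[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for start in range(0, len(prompts), batch_size):
            batch = prompts[start:start + batch_size]
            # pool.map returns results in input order
            results.extend(pool.map(call_with_retry, batch))
    return results


answers = run_batched(["cat", "dog", "cow"])
print(answers)
```

Even this toy version leaves out rate limiting, caching, and distribution across machines, which is the bookkeeping the rest of this post hands over to Daft.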

The Solution: Prompt as a DataFrame Operation

Daft's prompt function treats prompting as a first-class DataFrame operation. This seemingly simple abstraction unlocks powerful capabilities:

1. Automatic Parallelization

Just like how you don't think about parallelizing a SELECT statement in SQL, you shouldn't have to think about parallelizing LLM calls. Daft handles this automatically:


import daft
from daft.functions import prompt

# Read any data source
df = daft.read_parquet("s3://my-bucket/customer-reviews/*.parquet")

# Generate sentiment analysis at scale
df = df.with_column(
    "sentiment",
    prompt(
        daft.col("review_text"),
        system_message="Classify the sentiment as positive, negative, or neutral.",
        model="gpt-5-mini"
    )
)

# Daft automatically batches and parallelizes across your cluster
df.write_parquet("s3://my-bucket/analyzed-reviews/")

2. Native Multimodal Support

Daft is purpose-built for multimodal AI workloads. Pass text, images, and files as a list of expressions with no manual message building required:


import daft
from daft.functions import prompt, download, decode_image

# Load images and text together
df = daft.from_glob_path("s3://my-bucket/product-images/*.jpg")
# Download the bytes at each path, then decode them into an image column
df = df.with_column("image", decode_image(download(daft.col("path"))))

# Prompt with both text and images
df = df.with_column(
    "description",
    prompt(
        messages=[
            daft.lit("Describe this product image in detail:"),
            daft.col("image")
        ],
        model="gpt-5"
    )
)

This works seamlessly with PDFs, Markdown files, HTML, CSV, and any other file type:


import daft
from daft.functions import prompt, file

df = daft.from_glob_path("hf://datasets/Eventual-Inc/sample-files/papers/*.pdf")
df = df.with_column("file", file(daft.col("path")))

df = df.with_column(
    "summary",
    prompt(
        messages=[daft.lit("Summarize this research paper:"), daft.col("file")],
        model="gpt-5-nano"
    )
)

3. Structured Outputs with Native Struct Support

Daft natively converts Pydantic models to DataFrame columns with nested struct types. This makes structured generation feel natural in a vectorized context:


import daft
from daft.functions import prompt, unnest
from pydantic import BaseModel, Field

# Each field becomes a member of the resulting struct column
class Anime(BaseModel):
    show: str = Field(description="The name of the anime show")
    character: str = Field(description="The name of the character")
    explanation: str = Field(description="Why the character says the quote")

df = daft.from_pydict({
    "quote": [
        "I am going to be the king of the pirates!",
        "I'm going to be the next Hokage!",
    ]
})

# Ask for an Anime struct per row, then expand it into top-level columns
df = df.with_column(
    "classification",
    prompt(
        daft.col("quote"),
        system_message="Classify the anime based on the quote.",
        return_format=Anime,
        model="gpt-5-nano",
    )
).select("quote", unnest(daft.col("classification")))

df.show(format="fancy", max_width=80)
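
As a plain-Python picture of what unnesting a struct column means, the sketch below flattens a nested record into top-level fields. It is only an illustration using the standard library, not how Daft implements `unnest`; the `unnest_struct` helper is hypothetical, the dataclass stands in for the Pydantic model, and the field values are hand-written placeholders rather than real model output.

```python
from dataclasses import asdict, dataclass


@dataclass
class Anime:
    show: str
    character: str
    explanation: str


def unnest_struct(row: dict, struct_key: str) -> dict:
    """Replace one nested struct field with its members as top-level fields."""
    flat = {key: value for key, value in row.items() if key != struct_key}
    flat.update(asdict(row[struct_key]))
    return flat


# Hand-written placeholder standing in for a structured model response
rows = [
    {
        "quote": "I am going to be the king of the pirates!",
        "classification": Anime(
            show="One Piece",
            character="Monkey D. Luffy",
            explanation="placeholder explanation",
        ),
    }
]

flat_rows = [unnest_struct(row, "classification") for row in rows]
print(list(flat_rows[0].keys()))
```

The nested `classification` field disappears and its three members sit next to `quote`, which is the shape the `select` call above produces as DataFrame columns.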

4. Flexible Template Composition

Build dynamic prompts using Daft's format function or user-defined functions:


import daft
from daft.functions import prompt, format

# Build a prompt expression from a literal language and a question column
def answer_in_language(language: str, column_name: str) -> daft.Expression:
    return format(
        "Answer the following question in {}: {}",
        daft.lit(language),
        daft.col(column_name)
    )

df = daft.from_pydict({
    "question": [
        "What is the capital of France?",
        "Who invented the telephone?",
    ]
})

df = df.with_column(
    "spanish_answer",
    prompt(
        answer_in_language("Spanish", "question"),
        model="gpt-5"
    )
)

Or use row-wise @daft.func() user-defined functions for more complex logic:


import daft
from daft.functions import prompt

# Row-wise UDF: called once per row with plain Python values
@daft.func
def build_prompt(context: str, question: str, max_words: int) -> str:
    return f"""
Context: {context}

Question: {question}

Answer in at most {max_words} words.
"""

# Assumes df has "context" and "question" columns
df = df.with_column(
    "answer",
    prompt(
        build_prompt(daft.col("context"), daft.col("question"), daft.lit(50)),
        model="gpt-5"
    )
)
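
For intuition about what the positional `{}` template in the `format` example computes, here is a plain-Python sketch that renders the same template once per row. The `render_template` helper is hypothetical and exists only for illustration; Daft evaluates the template as a column expression rather than a Python loop.

```python
def render_template(template: str, *columns: list[str]) -> list[str]:
    """Fill the positional {} slots of a template once per row."""
    return [template.format(*values) for values in zip(*columns)]


questions = [
    "What is the capital of France?",
    "Who invented the telephone?",
]
# A literal is the same value repeated for every row
languages = ["Spanish"] * len(questions)

prompts = render_template(
    "Answer the following question in {}: {}",
    languages,
    questions,
)
print(prompts[0])
```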

Swap Providers Without Changing Code

One of the most powerful aspects of Daft's AI functions is the provider abstraction. The same prompt call works across OpenAI, local models, vLLM servers, and more. Just change the provider:

OpenAI (Default)


import daft
from daft.functions import prompt

# OpenAI is the default - just set your API key
df = df.with_column("result", prompt(daft.col("input"), model="gpt-5"))

OpenAI-Compatible Providers (OpenRouter, etc.)


import os
import daft
from daft.functions import prompt

daft.set_provider(
    "openai",
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ.get("OPENROUTER_API_KEY")
)

df = df.with_column(
    "result",
    prompt(daft.col("input"), model="nvidia/nemotron-nano-9b-v2:free")
)

Local Models with LM Studio


df = df.with_column(
    "result",
    prompt(
        daft.col("input"),
        model="google/gemma-3-4b",
        provider="lm_studio"
    )
)

vLLM Online Serving


daft.set_provider(
    "openai",
    api_key="none",
    base_url="http://localhost:8000/v1"
)

df = df.with_column(
    "result",
    prompt(daft.col("input"), model="google/gemma-3-4b-it")
)

vLLM Offline Serving w/ Prefix Caching (Beta)

For batch inference workloads, our new vLLM Prefix Caching provider can cut inference time in half:


df = df.with_column(
    "result",
    prompt(
        daft.col("input"),
        provider="vllm-prefix-caching",
        model="Qwen/Qwen-8B"
    )
)

This provider implements Dynamic Prefix Bucketing and Streaming-Based Continuous Batching to maximize GPU utilization and cache hit rates.
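
To see why grouping by prefix helps cache hit rates, consider this deliberately simplified sketch. It buckets prompts by their first few words so that prompts sharing a prefix are processed back to back. This is not the provider's actual Dynamic Prefix Bucketing algorithm; the `bucket_by_prefix` helper, the word-based key, and the fixed prefix length are all assumptions made for illustration.

```python
from collections import defaultdict


def bucket_by_prefix(prompts: list[str], prefix_words: int = 4) -> dict[str, list[str]]:
    """Group prompts whose first `prefix_words` words are identical."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for text in prompts:
        key = " ".join(text.split()[:prefix_words])
        buckets[key].append(text)
    return dict(buckets)


prompts = [
    "Classify the sentiment of: great phone",
    "Summarize this research paper: attention mechanisms",
    "Classify the sentiment of: battery died",
    "Summarize this research paper: graph networks",
]

buckets = bucket_by_prefix(prompts)

# Process one bucket at a time so a cached prefix is reused before moving on
ordered = [text for key in sorted(buckets) for text in buckets[key]]
for text in ordered:
    print(text)
```

With the interleaved input order, each request would alternate between two prefixes; after bucketing, each shared prefix only needs to be computed once per group.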

Advanced Features

Tool Calling at Scale

Use OpenAI's built-in tools like web search, or define custom functions for agentic workflows:


import daft
from daft.functions import prompt

df = daft.from_pydict({
    "query": ["Buy one get one free burritos in SF right now."]
})

df = df.with_column(
    "search_results",
    prompt(
        daft.col("query"),
        model="gpt-5",
        tools=[{"type": "web_search"}]
    )
)

Reasoning Models

Control compute allocation with the `reasoning` parameter for GPT-5 and other reasoning-capable models:


df = df.with_column(
    "deep_analysis",
    prompt(
        daft.col("complex_question"),
        model="gpt-5.1",
        reasoning={"effort": "high"},
    )
)

Working with Different OpenAI APIs

Daft supports both the new Responses API (default for GPT-5) and the Chat Completions API:


# Responses API (default for GPT-5)
df = df.with_column(
    "result",
    prompt(
        daft.col("input"),
        model="gpt-5",
        max_output_tokens=200  # Note: max_output_tokens, not max_tokens
    )
)

# Chat Completions API (for GPT-4.1 and earlier)
df = df.with_column(
    "result",
    prompt(
        daft.col("input"),
        model="gpt-4.1-2025-04-14",
        max_tokens=200,  # Note: max_tokens
        temperature=0.7,
        use_chat_completions=True
    )
)
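
Because the token-limit option is named differently in the two APIs, one way to avoid mixing them up is to build the keyword arguments in a single place. The helper below is a hypothetical convenience, not part of Daft; it only encodes the two parameter names shown in the listing above.

```python
def token_limit_options(limit: int, use_chat_completions: bool = False) -> dict:
    """Return the token-limit keyword arguments for the chosen OpenAI API."""
    if use_chat_completions:
        # Chat Completions API uses max_tokens
        return {"max_tokens": limit, "use_chat_completions": True}
    # Responses API uses max_output_tokens
    return {"max_output_tokens": limit}


responses_options = token_limit_options(200)
chat_options = token_limit_options(200, use_chat_completions=True)
print(responses_options)
print(chat_options)
```

The result can then be unpacked into the call, for example `prompt(daft.col("input"), model="gpt-5", **responses_options)`.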

Structured Outputs with vLLM Online Serving

vLLM's guided decoding for classification and regex-constrained generation works seamlessly:


import daft
from daft.functions import prompt, format

# Classification with guided choice
df = df.with_column(
    "sentiment",
    prompt(
        daft.col("review"),
        model="google/gemma-3-4b-it",
        extra_body={"structured_outputs": {"choice": ["positive", "negative"]}}
    )
)

# Regex-constrained generation
df = df.with_column(
    "email",
    prompt(
        format("Generate an email for {}", daft.col("name")),
        model="google/gemma-3-4b-it",
        extra_body={"structured_outputs": {"regex": r"\w+@\w+\.com\n"}}
    )
)
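
If you want to double-check constrained outputs downstream, the same constraints can be applied with the standard library. The sketch below is a post-hoc validation step, not part of Daft or vLLM; the helper names and the candidate strings are hand-written examples, and the regex and choice list are copied from the listing above.

```python
import re

# Same pattern as in the regex-constrained example; note the trailing newline
EMAIL_PATTERN = re.compile(r"\w+@\w+\.com\n")
ALLOWED_SENTIMENTS = {"positive", "negative"}


def matches_email_constraint(text: str) -> bool:
    """True only if the whole string satisfies the pattern."""
    return EMAIL_PATTERN.fullmatch(text) is not None


def is_allowed_sentiment(text: str) -> bool:
    """True only if the output is exactly one of the guided choices."""
    return text in ALLOWED_SENTIMENTS


candidates = [
    "alice@example.com\n",     # matches
    "alice@example.com",       # missing the trailing newline
    "bob@mail.example.com\n",  # extra domain level is not allowed by the pattern
]
email_checks = [matches_email_constraint(text) for text in candidates]
sentiment_checks = [is_allowed_sentiment(text) for text in ["positive", "neutral"]]
print(email_checks, sentiment_checks)
```

The second candidate fails only because the pattern requires a trailing newline, which is worth remembering when comparing generated values against clean reference data.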

Conclusion

With massively parallel LLM generation now possible thanks to Daft's prompt function, you can scale LLM operations from a single row to millions without changing your code. Whether you're generating synthetic data, building document intelligence pipelines, or running batch inference, Daft provides the foundation for production-grade AI workloads.

The future of prompt engineering is declarative, scalable, and multimodal.

Try it out and let us know what you build!